In the Linux kernel, the following vulnerability has been resolved:
ceph: only d_add() negative dentries when they are unhashed
Ceph can call d_add(dentry, NULL) on a negative dentry that is already
present in the primary dcache hash.
In the current VFS that is not safe. d_add() goes through __d_add()
to __d_rehash(), which unconditionally reinserts dentry->d_hash into
the hlist_bl bucket. If the dentry is already hashed, reinserting the
same node can corrupt the bucket, including creating a self-loop.
Once that happens, __d_lookup() can spin forever in the hlist_bl walk,
typically looping only on the d_name.hash mismatch check and
eventually triggering RCU stall reports like this one:
rcu: INFO: rcu_sched self-detected stall on CPU
rcu: 87-....: (2100 ticks this GP) idle=3a4c/1/0x4000000000000000 softirq=25003319/25003319 fqs=829
rcu: (t=2101 jiffies g=79058445 q=698988 ncpus=192)
CPU: 87 UID: 2952868916 PID: 3933303 Comm: php-cgi8.3 Not tainted 6.18.17-i1-amd #950 NONE
Hardware name: Dell Inc. PowerEdge R7615/0G9DHV, BIOS 1.6.6 09/22/2023
RIP: 0010:__d_lookup+0x46/0xb0
Code: c1 e8 07 48 8d 04 c2 48 8b 00 49 89 fc 49 89 f5 48 89 c3 48 83 e3 fe 48 83 f8 01 77 0f eb 2d 0f 1f 44 00 00 48 8b 1b 48 85 db <74> 20 39 6b 18 75 f3 48 8d 7b 78 e8 ba 85 d0 00 4c 39 63 10 74 1f
RSP: 0018:ff745a70c8253898 EFLAGS: 00000282
RAX: ff26e470054cb208 RBX: ff26e470054cb208 RCX: 000000006e958966
RDX: ff26e48267340000 RSI: ff745a70c82539b0 RDI: ff26e458f74655c0
RBP: 000000006e958966 R08: 0000000000000180 R09: 9cd08d909b919a89
R10: ff26e458f74655c0 R11: 0000000000000000 R12: ff26e458f74655c0
R13: ff745a70c82539b0 R14: d0d0d0d0d0d0d0d0 R15: 2f2f2f2f2f2f2f2f
FS: 00007f5770896980(0000) GS:ff26e482c5d88000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f5764de50c0 CR3: 000000a72abb5001 CR4: 0000000000771ef0
PKRU: 55555554
Call Trace:
<TASK>
lookup_fast+0x9f/0x100
walk_component+0x1f/0x150
link_path_walk+0x20e/0x3d0
path_lookupat+0x68/0x180
filename_lookup+0xdc/0x1e0
vfs_statx+0x6c/0x140
vfs_fstatat+0x67/0xa0
__do_sys_newfstatat+0x24/0x60
do_syscall_64+0x6a/0x230
entry_SYSCALL_64_after_hwframe+0x76/0x7e
This is reachable with reused cached negative dentries. A Ceph lookup
or atomic_open can be handed a negative dentry that is already hashed,
and fs/ceph/dir.c then hits one of two paths that incorrectly assume
"negative" also means "unhashed":
- ceph_finish_lookup():
MDS reply is -ENOENT with no trace
-> d_add(dentry, NULL)
- ceph_lookup():
local ENOENT fast path for a complete directory with shared caps
-> d_add(dentry, NULL)
Both paths can therefore re-add an already-hashed negative dentry.
Ceph already uses the correct pattern elsewhere: ceph_fill_trace() only
calls d_add(dn, NULL) for a negative null-dentry reply when d_unhashed(dn)
is true.
Fix both fs/ceph/dir.c sites the same way: only call d_add() for a
negative dentry when it is actually unhashed. If the negative dentry
is already hashed, leave it in place and reuse it as-is.
This preserves the existing behavior for unhashed dentries while
avoiding d_hash list corruption for reused hashed negatives.
In the Linux kernel, the following vulnerability has been resolved:
mm: fix deferred split queue races during migration
migrate_folio_move() records the deferred split queue state from src and
replays it on dst. Replaying it after remove_migration_ptes(src, dst, 0)
makes dst visible before it is requeued, so a concurrent rmap-removal path
can mark dst partially mapped and trip the WARN in deferred_split_folio().
Move the requeue before remove_migration_ptes() so dst is back on the
deferred split queue before it becomes visible again.
Because migration still holds dst locked at that point, teach
deferred_split_scan() to requeue a folio when folio_trylock() fails.
Otherwise a fully mapped underused folio can be dequeued by the shrinker
and silently lost from split_queue.
[ziy@nvidia.com: move the comment]
In the Linux kernel, the following vulnerability has been resolved:
misc: ibmasm: fix OOB MMIO read in ibmasm_handle_mouse_interrupt()
ibmasm_handle_mouse_interrupt() performs an out-of-bounds MMIO read
when the queue reader or writer index from hardware exceeds
REMOTE_QUEUE_SIZE (60).
A compromised service processor can trigger this by writing an
out-of-range value to the reader or writer MMIO register before
asserting an interrupt. Since writer is re-read from hardware on
every loop iteration, it can also be set to an out-of-range value
after the loop has already started.
The root cause is that get_queue_reader() and get_queue_writer() return
raw readl() values that are passed directly into get_queue_entry(),
which computes:
queue_begin + reader * sizeof(struct remote_input)
with no bounds check. This unchecked MMIO address is then passed to
memcpy_fromio(), reading 8 bytes from unintended device registers.
For sufficiently large values the address falls outside the PCI BAR
mapping entirely, triggering a machine check exception.
Fix by checking both indices against REMOTE_QUEUE_SIZE at the top of
the loop body, before any call to get_queue_entry(). On an out-of-range
value, reset the reader register to 0 via set_queue_reader() before
breaking, so that normal queue operation can resume if the corrupted
hardware state is transient.
In the Linux kernel, the following vulnerability has been resolved:
dm mirror: fix integer overflow in create_dirty_log()
The argument count calculation in create_dirty_log() performs
`*args_used = 2 + param_count` before validating against argc. When a
user provides a param_count close to UINT_MAX via the device mapper
table string, this unsigned addition wraps around to a small value,
causing the subsequent `argc < *args_used` check to be bypassed.
The overflowed param_count is then passed as argc to dm_dirty_log_create(),
where it can cause out-of-bounds reads on the argv array.
Fix by comparing param_count against argc - 2 before performing the
addition, following the same pattern used by parse_features() in the
same file. Since argc >= 2 is already guaranteed, the subtraction is
safe.
In the Linux kernel, the following vulnerability has been resolved:
libceph: Prevent potential null-ptr-deref in ceph_handle_auth_reply()
If a message of type CEPH_MSG_AUTH_REPLY contains a zero value for both
protocol and result, this is currently not treated as an error. In case
of ac->negotiating == true and ac->protocol > 0, this leads to setting
ac->protocol = 0 and ac->ops = NULL. Thereafter, the check for
ac->protocol != protocol returns false, and init_protocol() is not
called. Subsequently, ac->ops->handle_reply() is called, which leads to
a null pointer dereference, because ac->ops is still NULL.
This patch changes the check for ac->protocol != protocol to
!ac->protocol, as this also includes the case when the protocol was set
to zero in the message. This causes the message to be treated as
containing a bad auth protocol.
In the Linux kernel, the following vulnerability has been resolved:
KVM: SVM: Add missing save/restore handling of LBR MSRs
MSR_IA32_DEBUGCTLMSR and LBR MSRs are currently not enumerated by
KVM_GET_MSR_INDEX_LIST, and LBR MSRs cannot be set with KVM_SET_MSRS. So
save/restore is completely broken.
Fix it by adding the MSRs to msrs_to_save_base, and allowing writes to
LBR MSRs from userspace only (as they are read-only MSRs) if LBR
virtualization is enabled. Additionally, to correctly restore L1's LBRs
while L2 is running, make sure the LBRs are copied from the captured
VMCB01 save area in svm_copy_vmrun_state().
Note, for VMX, this also fixes a flaw where MSR_IA32_DEBUGCTLMSR isn't
reported as an MSR to save/restore.
Note #2, over-reporting MSR_IA32_LASTxxx on Intel is ok, as KVM already
handles unsupported reads and writes thanks to commit b5e2fec0ebc3 ("KVM:
Ignore DEBUGCTL MSRs with no effect") (kvm_do_msr_access() will morph the
unsupported userspace write into a nop).
[sean: guard with lbrv checks, massage changelog]
In the Linux kernel, the following vulnerability has been resolved:
ALSA: caiaq: Handle probe errors properly
The probe procedure of setup_card() in caiaq driver doesn't treat the
error cases gracefully, e.g. the error from snd_card_register() calls
snd_card_free() but continues. This would lead to a UAF for the
further calls like snd_usb_caiaq_control_init(), as Berk suggested in
another patch in the link below.
However, the problem is not only that; in general, this function drops
the all error handlings (as it's a void function) although its caller
can propagate an error to snd_probe(), which eventually calls
snd_card_free() as a proper error path. That said, we should treat
each error case in setup_card(), and just return the error code
promptly, which is then handled later as a fatal error in snd_probe().
This patch achieves it by changing the setup_card() to return an error
code. Also, the superfluous snd_card_free() call is removed, too.
Note that card->private_free can be set still safely at returning an
error. All called functions in card_free() have checks of the
unassigned resources or NULL checks.
In the Linux kernel, the following vulnerability has been resolved:
drm/nouveau: fix u32 overflow in pushbuf reloc bounds check
nouveau_gem_pushbuf_reloc_apply() validates each relocation with
if (r->reloc_bo_offset + 4 > nvbo->bo.base.size)
but reloc_bo_offset is __u32 (uapi/drm/nouveau_drm.h) and the integer
literal 4 promotes to unsigned int, so the addition is performed in 32
bits and wraps before the comparison against the size_t bo size.
Cast to u64 so the addition happens in 64-bit arithmetic.
[ Add Fixes: tag. - Danilo ]
In the Linux kernel, the following vulnerability has been resolved:
rxrpc: Fix conn-level packet handling to unshare RESPONSE packets
The security operations that verify the RESPONSE packets decrypt bits of it
in place - however, the sk_buff may be shared with a packet sniffer, which
would lead to the sniffer seeing an apparently corrupt packet (actually
decrypted).
Fix this by handing a copy of the packet off to the specific security
handler if the packet was cloned.
In the Linux kernel, the following vulnerability has been resolved:
ext2: reject inodes with zero i_nlink and valid mode in ext2_iget()
ext2_iget() already rejects inodes with i_nlink == 0 when i_mode is
zero or i_dtime is set, treating them as deleted. However, the case of
i_nlink == 0 with a non-zero mode and zero dtime slips through. Since
ext2 has no orphan list, such a combination can only result from
filesystem corruption - a legitimate inode deletion always sets either
i_dtime or clears i_mode before freeing the inode.
A crafted image can exploit this gap to present such an inode to the
VFS, which then triggers WARN_ON inside drop_nlink() (fs/inode.c) via
ext2_unlink(), ext2_rename() and ext2_rmdir():
WARNING: CPU: 3 PID: 609 at fs/inode.c:336 drop_nlink+0xad/0xd0 fs/inode.c:336
CPU: 3 UID: 0 PID: 609 Comm: syz-executor Not tainted 6.12.77+ #1
Call Trace:
<TASK>
inode_dec_link_count include/linux/fs.h:2518 [inline]
ext2_unlink+0x26c/0x300 fs/ext2/namei.c:295
vfs_unlink+0x2fc/0x9b0 fs/namei.c:4477
do_unlinkat+0x53e/0x730 fs/namei.c:4541
__x64_sys_unlink+0xc6/0x110 fs/namei.c:4587
do_syscall_64+0xf5/0x220 arch/x86/entry/common.c:78
entry_SYSCALL_64_after_hwframe+0x77/0x7f
</TASK>
WARNING: CPU: 0 PID: 646 at fs/inode.c:336 drop_nlink+0xad/0xd0 fs/inode.c:336
CPU: 0 UID: 0 PID: 646 Comm: syz.0.17 Not tainted 6.12.77+ #1
Call Trace:
<TASK>
inode_dec_link_count include/linux/fs.h:2518 [inline]
ext2_rename+0x35e/0x850 fs/ext2/namei.c:374
vfs_rename+0xf2f/0x2060 fs/namei.c:5021
do_renameat2+0xbe2/0xd50 fs/namei.c:5178
__x64_sys_rename+0x7e/0xa0 fs/namei.c:5223
do_syscall_64+0xf5/0x220 arch/x86/entry/common.c:78
entry_SYSCALL_64_after_hwframe+0x77/0x7f
</TASK>
WARNING: CPU: 0 PID: 634 at fs/inode.c:336 drop_nlink+0xad/0xd0 fs/inode.c:336
CPU: 0 UID: 0 PID: 634 Comm: syz-executor Not tainted 6.12.77+ #1
Call Trace:
<TASK>
inode_dec_link_count include/linux/fs.h:2518 [inline]
ext2_rmdir+0xca/0x110 fs/ext2/namei.c:311
vfs_rmdir+0x204/0x690 fs/namei.c:4348
do_rmdir+0x372/0x3e0 fs/namei.c:4407
__x64_sys_unlinkat+0xf0/0x130 fs/namei.c:4577
do_syscall_64+0xf5/0x220 arch/x86/entry/common.c:78
entry_SYSCALL_64_after_hwframe+0x77/0x7f
</TASK>
Extend the existing i_nlink == 0 check to also catch this case,
reporting the corruption via ext2_error() and returning -EFSCORRUPTED.
This rejects the inode at load time and prevents it from reaching any
of the namei.c paths.
Found by Linux Verification Center (linuxtesting.org) with Syzkaller.