In the Linux kernel, the following vulnerability has been resolved:
xfrm: defensively unhash xfrm_state lists in __xfrm_state_delete
KASAN reproduces a slab-use-after-free in __xfrm_state_delete()'s
hlist_del_rcu calls under syzkaller load on linux-6.12.y stable
(reproduced on 6.12.47, also reachable via the same code path on
torvalds/master and on the ipsec tree). Nine unique signatures cluster
in the xfrm_state lifecycle, the load-bearing one being:
BUG: KASAN: slab-use-after-free in __hlist_del include/linux/list.h:990 [inline]
BUG: KASAN: slab-use-after-free in hlist_del_rcu include/linux/rculist.h:516 [inline]
BUG: KASAN: slab-use-after-free in __xfrm_state_delete net/xfrm/xfrm_state.c
Write of size 8 at addr ffff8881198bcb70 by task kworker/u8:9/435
Workqueue: netns cleanup_net
Call Trace:
__hlist_del / hlist_del_rcu
__xfrm_state_delete
xfrm_state_delete
xfrm_state_flush
xfrm_state_fini
ops_exit_list
cleanup_net
The other observed signatures hit the same slab object from
__xfrm_state_lookup, xfrm_alloc_spi, __xfrm_state_insert and an OOB
write variant of __xfrm_state_delete, all on the byseq/byspi
hash chains.
__xfrm_state_delete() guards its byseq and byspi unhashes with
value-based predicates:
if (x->km.seq)
hlist_del_rcu(&x->byseq);
if (x->id.spi)
hlist_del_rcu(&x->byspi);
while everywhere else in the file (e.g. state_cache, state_cache_input)
the safer hlist_unhashed() check is used. xfrm_alloc_spi() sets
x->id.spi = newspi inside xfrm_state_lock and then immediately inserts
into byspi, but a path that observes x->id.spi != 0 outside of
xfrm_state_lock can still skip-or-hit the byspi unhash inconsistently
with whether x is actually on the list. The same holds for x->km.seq
versus byseq, and the bydst/bysrc unhashes have no predicate at all,
so a second __xfrm_state_delete() on the same object writes through
LIST_POISON pprev.
The defensive change here:
- Use hlist_del_init_rcu() instead of hlist_del_rcu() on bydst,
bysrc, byseq and byspi so a second deletion is a no-op rather
than a write through LIST_POISON pprev. The byseq/byspi nodes
are already initialised in xfrm_state_alloc().
- Test hlist_unhashed() rather than the value predicate for
byseq/byspi, so the unhash decision tracks list state rather than
mutable scalar fields.
Empirical verification: applied this patch on top of v6.12.47, rebuilt,
and re-ran the same syzkaller harness for 1h16m on a previously-crashy
configuration that produced ~100 hits each of slab-use-after-free
Read in xfrm_alloc_spi / Read in __xfrm_state_lookup / Write in
__xfrm_state_delete. After the patch, 7.1M execs across 32 VMs at
~1550 exec/sec produced zero xfrm_state UAF/OOB hits. /proc/slabinfo
confirms the xfrm_state slab is actively allocated and freed during
the run (~143 KiB resident), so the fuzzer is still exercising those
code paths -- they just no longer crash.
Reproduction:
- Linux 6.12.47 x86_64 + KASAN_GENERIC + KASAN_INLINE + KCOV
- syzkaller @ 746545b8b1e4c3a128db8652b340d3df90ce61db
- 32 QEMU/KVM VMs x 2 vCPU on AWS c5.metal bare metal
- 9 unique signatures collected in ~9h, all within xfrm_state
lifecycle
In the Linux kernel, the following vulnerability has been resolved:
libceph: Fix slab-out-of-bounds access in auth message processing
If a (potentially corrupted) message of type CEPH_MSG_AUTH_REPLY
contains a positive value in its result field, it is treated as an
error code by ceph_handle_auth_reply() and returned to
handle_auth_reply(). Thereafter, an attempt is made to send the
preallocated message of type CEPH_MSG_AUTH, where the returned value is
interpreted as the size of the front segment to send. If the result
value in the message is greater than the size of the memory buffer
allocated for the front segment, an out-of-bounds access occurs, and
the content of the memory region beyond this buffer is sent out.
This patch fixes the issue by treating only negative values in the
result field as errors. Positive values are therefore treated as success
in the same way as a zero value. Additionally, a BUG_ON is added to
__send_prepared_auth_request() comparing the len parameter to
front_alloc_len to prevent sending the message if it exceeds the bounds
of the allocation and to make it easier to catch any logic flaws leading
to this.
In the Linux kernel, the following vulnerability has been resolved:
wifi: b43: enforce bounds check on firmware key index in b43_rx()
The firmware-controlled key index in b43_rx() can exceed the dev->key[]
array size (58 entries). The existing B43_WARN_ON is non-enforcing in
production builds, allowing an out-of-bounds read.
Make the B43_WARN_ON check enforcing by dropping the frame when the
firmware returns an invalid key index.
In the Linux kernel, the following vulnerability has been resolved:
KVM: x86: Fix shadow paging use-after-free due to unexpected GFN
The shadow MMU computes GFNs for direct shadow pages using sp->gfn plus
the SPTE index. This assumption breaks for shadow paging if the guest
page tables are modified between VM entries (similar to commit
aad885e77496, "KVM: x86/mmu: Drop/zap existing present SPTE even
when creating an MMIO SPTE", 2026-03-27). The flow is as follows:
- a PDE is installed for a 2MB mapping, and a page in that area is
accessed. KVM creates a kvm_mmu_page consisting of 512 4KB pages;
the kvm_mmu_page is marked by FNAME(fetch) as direct-mapped because
the guest's mapping is a huge page (and thus contiguous).
- the PDE mapping is changed from outside the guest.
- the guest accesses another page in the same 2MB area. KVM installs
a new leaf SPTE and rmap entry; the SPTE uses the "correct" GFN
(i.e. based on the new mapping, as changed in the previous step) but
that GFN is outside of the [sp->gfn, sp->gfn + 511] range; therefore
the rmap entry cannot be found and removed when the kvm_mmu_page
is zapped.
- the memslot that covers the first 2MB mapping is deleted, and the
kvm_mmu_page for the now-invalid GPA is zapped. However, rmap_remove()
only looks at the [sp->gfn, sp->gfn + 511] range established in step 1,
and fails to find the rmap entry that was recorded by step 3.
- any operation that causes an rmap walk for the same page accessed
by step 3 then walks a stale rmap and dereferences a freed kvm_mmu_page.
This includes dirty logging or MMU notifier invalidations (e.g., from
MADV_DONTNEED).
The underlying issue is that KVM's walking of shadow PTEs assumes that
if a SPTE is present when KVM wants to install a non-leaf SPTE, then the
existing kvm_mmu_page must be for the correct gfn. Because the only way
for the gfn to be wrong is if KVM messed up and failed to zap a SPTE...
which shouldn't happen, but *actually* only happens in response to a
guest write.
That bug dates back literally forever, as even the first version of KVM
assumes that the GFN matches and walks into the "wrong" shadow page.
However, that was only an imprecision until 2032a93d66fa ("KVM: MMU:
Don't allocate gfns page for direct mmu pages") came along.
Fix it by checking for a target gfn mismatch and zapping the existing
SPTE. That way the old SP and rmap entries are gone, KVM installs
the rmap in the right location, and everyone is happy.
In the Linux kernel, the following vulnerability has been resolved:
ipmi:si: Return state to normal if message allocation fails
There were places where nothing would get started if a message
allocation failed, so the driver needs to return to normal state.
In the Linux kernel, the following vulnerability has been resolved:
KVM: SVM: Inject #UD for INVLPGA if EFER.SVME=0
INVLPGA should cause a #UD when EFER.SVME is not set. Add a check to
properly inject #UD when EFER.SVME=0.
[sean: tag for stable@]
In the Linux kernel, the following vulnerability has been resolved:
fbdev: defio: Disconnect deferred I/O from the lifetime of struct fb_info
Hold state of deferred I/O in struct fb_deferred_io_state. Allocate an
instance as part of initializing deferred I/O and remove it only after
the final mapping has been closed. If the fb_info and the contained
deferred I/O meanwhile goes away, clear struct fb_deferred_io_state.info
to invalidate the mapping. Any access will then result in a SIGBUS
signal.
Fixes a long-standing problem, where a device hot-unplug happens while
user space still has an active mapping of the graphics memory. The hot-
unplug frees the instance of struct fb_info. Accessing the memory will
operate on undefined state.
In the Linux kernel, the following vulnerability has been resolved:
ibmasm: fix heap over-read in ibmasm_send_i2o_message()
The ibmasm_send_i2o_message() function uses get_dot_command_size() to
compute the byte count for memcpy_toio(), but this value is derived from
user-controlled fields in the dot_command_header (command_size: u8,
data_size: u16) and is never validated against the actual allocation size.
A root user can write a small buffer with inflated header fields, causing
memcpy_toio() to read up to ~65 KB past the end of the allocation into
adjacent kernel heap, which is then forwarded to the service processor
over MMIO.
Silently clamping the copy size is not sufficient: if the header fields
claim a larger size than the buffer, the SP receives a dot command whose
own header is inconsistent with the I2O message length, which can cause
the SP to desynchronize. Reject such commands outright by returning
failure.
Validate command_size before calling get_mfa_inbound() to avoid leaking
an I2O message frame: reading INBOUND_QUEUE_PORT dequeues a hardware
frame from the controller's free pool, and returning without a
corresponding set_mfa_inbound() call would permanently exhaust it.
Additionally, clamp command_size to I2O_COMMAND_SIZE before the
memcpy_toio() so the MMIO write stays within the I2O message frame,
consistent with the clamping already performed by outgoing_message_size()
for the header field.
In the Linux kernel, the following vulnerability has been resolved:
ALSA: ctxfi: Add fallback to default RSR for S/PDIF
spdif_passthru_playback_get_resources() uses atc->pll_rate as the RSR
for the MSR calculation loop. However, pll_rate is only updated in
atc_pll_init() and not in hw_pll_init(), so it remains 0 after the
card init.
When spdif_passthru_playback_setup() skips atc_pll_init() for
32000 Hz, (rsr * desc.msr) always becomes 0, causing the loop to spin
indefinitely.
Add fallback to use atc->rsr when atc->pll_rate is 0. This reflects
the hardware state, since hw_card_init() already configures the PLL
to the default RSR.
In the Linux kernel, the following vulnerability has been resolved:
ceph: only d_add() negative dentries when they are unhashed
Ceph can call d_add(dentry, NULL) on a negative dentry that is already
present in the primary dcache hash.
In the current VFS that is not safe. d_add() goes through __d_add()
to __d_rehash(), which unconditionally reinserts dentry->d_hash into
the hlist_bl bucket. If the dentry is already hashed, reinserting the
same node can corrupt the bucket, including creating a self-loop.
Once that happens, __d_lookup() can spin forever in the hlist_bl walk,
typically looping only on the d_name.hash mismatch check and
eventually triggering RCU stall reports like this one:
rcu: INFO: rcu_sched self-detected stall on CPU
rcu: 87-....: (2100 ticks this GP) idle=3a4c/1/0x4000000000000000 softirq=25003319/25003319 fqs=829
rcu: (t=2101 jiffies g=79058445 q=698988 ncpus=192)
CPU: 87 UID: 2952868916 PID: 3933303 Comm: php-cgi8.3 Not tainted 6.18.17-i1-amd #950 NONE
Hardware name: Dell Inc. PowerEdge R7615/0G9DHV, BIOS 1.6.6 09/22/2023
RIP: 0010:__d_lookup+0x46/0xb0
Code: c1 e8 07 48 8d 04 c2 48 8b 00 49 89 fc 49 89 f5 48 89 c3 48 83 e3 fe 48 83 f8 01 77 0f eb 2d 0f 1f 44 00 00 48 8b 1b 48 85 db <74> 20 39 6b 18 75 f3 48 8d 7b 78 e8 ba 85 d0 00 4c 39 63 10 74 1f
RSP: 0018:ff745a70c8253898 EFLAGS: 00000282
RAX: ff26e470054cb208 RBX: ff26e470054cb208 RCX: 000000006e958966
RDX: ff26e48267340000 RSI: ff745a70c82539b0 RDI: ff26e458f74655c0
RBP: 000000006e958966 R08: 0000000000000180 R09: 9cd08d909b919a89
R10: ff26e458f74655c0 R11: 0000000000000000 R12: ff26e458f74655c0
R13: ff745a70c82539b0 R14: d0d0d0d0d0d0d0d0 R15: 2f2f2f2f2f2f2f2f
FS: 00007f5770896980(0000) GS:ff26e482c5d88000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f5764de50c0 CR3: 000000a72abb5001 CR4: 0000000000771ef0
PKRU: 55555554
Call Trace:
<TASK>
lookup_fast+0x9f/0x100
walk_component+0x1f/0x150
link_path_walk+0x20e/0x3d0
path_lookupat+0x68/0x180
filename_lookup+0xdc/0x1e0
vfs_statx+0x6c/0x140
vfs_fstatat+0x67/0xa0
__do_sys_newfstatat+0x24/0x60
do_syscall_64+0x6a/0x230
entry_SYSCALL_64_after_hwframe+0x76/0x7e
This is reachable with reused cached negative dentries. A Ceph lookup
or atomic_open can be handed a negative dentry that is already hashed,
and fs/ceph/dir.c then hits one of two paths that incorrectly assume
"negative" also means "unhashed":
- ceph_finish_lookup():
MDS reply is -ENOENT with no trace
-> d_add(dentry, NULL)
- ceph_lookup():
local ENOENT fast path for a complete directory with shared caps
-> d_add(dentry, NULL)
Both paths can therefore re-add an already-hashed negative dentry.
Ceph already uses the correct pattern elsewhere: ceph_fill_trace() only
calls d_add(dn, NULL) for a negative null-dentry reply when d_unhashed(dn)
is true.
Fix both fs/ceph/dir.c sites the same way: only call d_add() for a
negative dentry when it is actually unhashed. If the negative dentry
is already hashed, leave it in place and reuse it as-is.
This preserves the existing behavior for unhashed dentries while
avoiding d_hash list corruption for reused hashed negatives.