In the Linux kernel, the following vulnerability has been resolved:
KVM: SEV: Reject attempts to sync VMSA of an already-launched/encrypted vCPU
Reject synchronizing vCPU state to its associated VMSA if the vCPU has
already been launched, i.e. if the VMSA has already been encrypted. On a
host with SNP enabled, accessing guest-private memory generates an RMP #PF
and panics the host.
BUG: unable to handle page fault for address: ff1276cbfdf36000
#PF: supervisor write access in kernel mode
#PF: error_code(0x80000003) - RMP violation
PGD 5a31801067 P4D 5a31802067 PUD 40ccfb5063 PMD 40e5954063 PTE 80000040fdf36163
SEV-SNP: PFN 0x40fdf36, RMP entry: [0x6010fffffffff001 - 0x000000000000001f]
Oops: Oops: 0003 [#1] SMP NOPTI
CPU: 33 UID: 0 PID: 996180 Comm: qemu-system-x86 Tainted: G OE
Tainted: [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
Hardware name: Dell Inc. PowerEdge R7625/0H1TJT, BIOS 1.5.8 07/21/2023
RIP: 0010:sev_es_sync_vmsa+0x54/0x4c0 [kvm_amd]
Call Trace:
<TASK>
snp_launch_update_vmsa+0x19d/0x290 [kvm_amd]
snp_launch_finish+0xb6/0x380 [kvm_amd]
sev_mem_enc_ioctl+0x14e/0x720 [kvm_amd]
kvm_arch_vm_ioctl+0x837/0xcf0 [kvm]
kvm_vm_ioctl+0x3fd/0xcc0 [kvm]
__x64_sys_ioctl+0xa3/0x100
x64_sys_call+0xfe0/0x2350
do_syscall_64+0x81/0x10f0
entry_SYSCALL_64_after_hwframe+0x76/0x7e
RIP: 0033:0x7ffff673287d
</TASK>
Note, the KVM flaw has been present since commit ad73109ae7ec ("KVM: SVM:
Provide support to launch and run an SEV-ES guest"), but has only been
actively dangerous for the host since SNP support was added. With SEV-ES,
KVM would "just" clobber guest state, which is totally fine from a host
kernel perspective since userspace can clobber guest state any time before
sev_launch_update_vmsa().
In the Linux kernel, the following vulnerability has been resolved:
mm: call ->free_folio() directly in folio_unmap_invalidate()
We can only call filemap_free_folio() if we have a reference to (or hold a
lock on) the mapping. Otherwise, we've already removed the folio from the
mapping so it no longer pins the mapping and the mapping can be removed,
causing a use-after-free when accessing mapping->a_ops.
Follow the same pattern as __remove_mapping() and load the free_folio
function pointer before dropping the lock on the mapping. That lets us
make filemap_free_folio() static as this was the only caller outside
filemap.c.
In the Linux kernel, the following vulnerability has been resolved:
media: em28xx: fix use-after-free in em28xx_v4l2_open()
em28xx_v4l2_open() reads dev->v4l2 without holding dev->lock,
creating a race with em28xx_v4l2_init()'s error path and
em28xx_v4l2_fini(), both of which free the em28xx_v4l2 struct
and set dev->v4l2 to NULL under dev->lock.
This race leads to two issues:
- use-after-free in v4l2_fh_init() when accessing vdev->ctrl_handler,
since the video_device is embedded in the freed em28xx_v4l2 struct.
- NULL pointer dereference in em28xx_resolution_set() when accessing
v4l2->norm, since dev->v4l2 has been set to NULL.
Fix this by moving the mutex_lock() before the dev->v4l2 read and
adding a NULL check for dev->v4l2 under the lock.
In the Linux kernel, the following vulnerability has been resolved:
media: mediatek: vcodec: fix use-after-free in encoder release path
The fops_vcodec_release() function frees the context structure (ctx)
without first cancelling any pending or running work in ctx->encode_work.
This creates a race window where the workqueue handler (mtk_venc_worker)
may still be accessing the context memory after it has been freed.
Race condition:
CPU 0 (release path) CPU 1 (workqueue)
--------------------- ------------------
fops_vcodec_release()
v4l2_m2m_ctx_release()
v4l2_m2m_cancel_job()
// waits for m2m job "done"
mtk_venc_worker()
v4l2_m2m_job_finish()
// m2m job "done"
// BUT worker still running!
// post-job_finish access:
other ctx dereferences
// UAF if ctx already freed
// returns (job "done")
kfree(ctx) // ctx freed
Root cause: The v4l2_m2m_ctx_release() only waits for the m2m job
lifecycle (via TRANS_RUNNING flag), not the workqueue lifecycle.
After v4l2_m2m_job_finish() is called, the m2m framework considers
the job complete and v4l2_m2m_ctx_release() returns, but the worker
function continues executing and may still access ctx.
The work is queued during encode operations via:
queue_work(ctx->dev->encode_workqueue, &ctx->encode_work)
The worker function accesses ctx->m2m_ctx, ctx->dev, and other ctx
fields even after calling v4l2_m2m_job_finish().
This vulnerability was confirmed with KASAN by running an instrumented
test module that widens the post-job_finish race window. KASAN detected:
BUG: KASAN: slab-use-after-free in mtk_venc_worker+0x159/0x180
Read of size 4 at addr ffff88800326e000 by task kworker/u8:0/12
Workqueue: mtk_vcodec_enc_wq mtk_venc_worker
Allocated by task 47:
__kasan_kmalloc+0x7f/0x90
fops_vcodec_open+0x85/0x1a0
Freed by task 47:
__kasan_slab_free+0x43/0x70
kfree+0xee/0x3a0
fops_vcodec_release+0xb7/0x190
Fix this by calling cancel_work_sync(&ctx->encode_work) before kfree(ctx).
This ensures the workqueue handler is both cancelled (if pending) and
synchronized (waits for any running handler to complete) before the
context is freed.
Placement rationale: The fix is placed after v4l2_ctrl_handler_free()
and before list_del_init(&ctx->list). At this point, all m2m operations
are done (v4l2_m2m_ctx_release() has returned), and we need to ensure
the workqueue is synchronized before removing ctx from the list and
freeing it.
Note: The open error path does NOT need cancel_work_sync() because
INIT_WORK() only initializes the work structure - it does not schedule
it. Work is only scheduled later during device_run() operations.
In the Linux kernel, the following vulnerability has been resolved:
media: vidtv: fix nfeeds state corruption on start_streaming failure
syzbot reported a memory leak in vidtv_psi_service_desc_init [1].
When vidtv_start_streaming() fails inside vidtv_start_feed(), the
nfeeds counter is left incremented even though no feed was actually
started. This corrupts the driver state: subsequent start_feed calls
see nfeeds > 1 and skip starting the mux, while stop_feed calls
eventually try to stop a non-existent stream.
This state corruption can also lead to memory leaks, since the mux
and channel resources may be partially allocated during a failed
start_streaming but never cleaned up, as the stop path finds
dvb->streaming == false and returns early.
Fix by decrementing nfeeds back when start_streaming fails, keeping
the counter in sync with the actual number of active feeds.
[1]
BUG: memory leak
unreferenced object 0xffff888145b50820 (size 32):
comm "syz.0.17", pid 6068, jiffies 4294944486
backtrace (crc 90a0c7d4):
vidtv_psi_service_desc_init+0x74/0x1b0 drivers/media/test-drivers/vidtv/vidtv_psi.c:288
vidtv_channel_s302m_init+0xb1/0x2a0 drivers/media/test-drivers/vidtv/vidtv_channel.c:83
vidtv_channels_init+0x1b/0x40 drivers/media/test-drivers/vidtv/vidtv_channel.c:524
vidtv_mux_init+0x516/0xbe0 drivers/media/test-drivers/vidtv/vidtv_mux.c:518
vidtv_start_streaming drivers/media/test-drivers/vidtv/vidtv_bridge.c:194 [inline]
vidtv_start_feed+0x33e/0x4d0 drivers/media/test-drivers/vidtv/vidtv_bridge.c:239
In the Linux kernel, the following vulnerability has been resolved:
mm: blk-cgroup: fix use-after-free in cgwb_release_workfn()
cgwb_release_workfn() calls css_put(wb->blkcg_css) and then later accesses
wb->blkcg_css again via blkcg_unpin_online(). If css_put() drops the last
reference, the blkcg can be freed asynchronously (css_free_rwork_fn ->
blkcg_css_free -> kfree) before blkcg_unpin_online() dereferences the
pointer to access blkcg->online_pin, resulting in a use-after-free:
BUG: KASAN: slab-use-after-free in blkcg_unpin_online (./include/linux/instrumented.h:112 ./include/linux/atomic/atomic-instrumented.h:400 ./include/linux/refcount.h:389 ./include/linux/refcount.h:432 ./include/linux/refcount.h:450 block/blk-cgroup.c:1367)
Write of size 4 at addr ff11000117aa6160 by task kworker/71:1/531
Workqueue: cgwb_release cgwb_release_workfn
Call Trace:
<TASK>
blkcg_unpin_online (./include/linux/instrumented.h:112 ./include/linux/atomic/atomic-instrumented.h:400 ./include/linux/refcount.h:389 ./include/linux/refcount.h:432 ./include/linux/refcount.h:450 block/blk-cgroup.c:1367)
cgwb_release_workfn (mm/backing-dev.c:629)
process_scheduled_works (kernel/workqueue.c:3278 kernel/workqueue.c:3385)
Freed by task 1016:
kfree (./include/linux/kasan.h:235 mm/slub.c:2689 mm/slub.c:6246 mm/slub.c:6561)
css_free_rwork_fn (kernel/cgroup/cgroup.c:5542)
process_scheduled_works (kernel/workqueue.c:3302 kernel/workqueue.c:3385)
** Stack based on commit 66672af7a095 ("Add linux-next specific files
for 20260410")
I am seeing this crash sporadically in Meta fleet across multiple kernel
versions. A full reproducer is available at:
https://github.com/leitao/debug/blob/main/reproducers/repro_blkcg_uaf.sh
(The race window is narrow. To make it easily reproducible, inject a
msleep(100) between css_put() and blkcg_unpin_online() in
cgwb_release_workfn(). With that delay and a KASAN-enabled kernel, the
reproducer triggers the splat reliably in less than a second.)
Fix this by moving blkcg_unpin_online() before css_put(), so the
cgwb's CSS reference keeps the blkcg alive while blkcg_unpin_online()
accesses it.
In the Linux kernel, the following vulnerability has been resolved:
ASoC: qcom: q6apm: move component registration to unmanaged version
q6apm component registers dais dynamically from ASoC toplology, which
are allocated using device managed version apis. Allocating both
component and dynamic dais using managed version could lead to incorrect
free ordering, dai will be freed while component still holding references
to it.
Fix this issue by moving component to unmanged version so
that the dai pointers are only freeded after the component is removed.
==================================================================
BUG: KASAN: slab-use-after-free in snd_soc_del_component_unlocked+0x3d4/0x400 [snd_soc_core]
Read of size 8 at addr ffff00084493a6e8 by task kworker/u48:0/3426
Tainted: [W]=WARN
Hardware name: LENOVO 21N2ZC5PUS/21N2ZC5PUS, BIOS N42ET57W (1.31 ) 08/08/2024
Workqueue: pdr_notifier_wq pdr_notifier_work [pdr_interface]
Call trace:
show_stack+0x28/0x7c (C)
dump_stack_lvl+0x60/0x80
print_report+0x160/0x4b4
kasan_report+0xac/0xfc
__asan_report_load8_noabort+0x20/0x34
snd_soc_del_component_unlocked+0x3d4/0x400 [snd_soc_core]
snd_soc_unregister_component_by_driver+0x50/0x88 [snd_soc_core]
devm_component_release+0x30/0x5c [snd_soc_core]
devres_release_all+0x13c/0x210
device_unbind_cleanup+0x20/0x190
device_release_driver_internal+0x350/0x468
device_release_driver+0x18/0x30
bus_remove_device+0x1a0/0x35c
device_del+0x314/0x7f0
device_unregister+0x20/0xbc
apr_remove_device+0x5c/0x7c [apr]
device_for_each_child+0xd8/0x160
apr_pd_status+0x7c/0xa8 [apr]
pdr_notifier_work+0x114/0x240 [pdr_interface]
process_one_work+0x500/0xb70
worker_thread+0x630/0xfb0
kthread+0x370/0x6c0
ret_from_fork+0x10/0x20
Allocated by task 77:
kasan_save_stack+0x40/0x68
kasan_save_track+0x20/0x40
kasan_save_alloc_info+0x44/0x58
__kasan_kmalloc+0xbc/0xdc
__kmalloc_node_track_caller_noprof+0x1f4/0x620
devm_kmalloc+0x7c/0x1c8
snd_soc_register_dai+0x50/0x4f0 [snd_soc_core]
soc_tplg_pcm_elems_load+0x55c/0x1eb8 [snd_soc_core]
snd_soc_tplg_component_load+0x4f8/0xb60 [snd_soc_core]
audioreach_tplg_init+0x124/0x1fc [snd_q6apm]
q6apm_audio_probe+0x10/0x1c [snd_q6apm]
snd_soc_component_probe+0x5c/0x118 [snd_soc_core]
soc_probe_component+0x44c/0xaf0 [snd_soc_core]
snd_soc_bind_card+0xad0/0x2370 [snd_soc_core]
snd_soc_register_card+0x3b0/0x4c0 [snd_soc_core]
devm_snd_soc_register_card+0x50/0xc8 [snd_soc_core]
x1e80100_platform_probe+0x208/0x368 [snd_soc_x1e80100]
platform_probe+0xc0/0x188
really_probe+0x188/0x804
__driver_probe_device+0x158/0x358
driver_probe_device+0x60/0x190
__device_attach_driver+0x16c/0x2a8
bus_for_each_drv+0x100/0x194
__device_attach+0x174/0x380
device_initial_probe+0x14/0x20
bus_probe_device+0x124/0x154
deferred_probe_work_func+0x140/0x220
process_one_work+0x500/0xb70
worker_thread+0x630/0xfb0
kthread+0x370/0x6c0
ret_from_fork+0x10/0x20
Freed by task 3426:
kasan_save_stack+0x40/0x68
kasan_save_track+0x20/0x40
__kasan_save_free_info+0x4c/0x80
__kasan_slab_free+0x78/0xa0
kfree+0x100/0x4a4
devres_release_all+0x144/0x210
device_unbind_cleanup+0x20/0x190
device_release_driver_internal+0x350/0x468
device_release_driver+0x18/0x30
bus_remove_device+0x1a0/0x35c
device_del+0x314/0x7f0
device_unregister+0x20/0xbc
apr_remove_device+0x5c/0x7c [apr]
device_for_each_child+0xd8/0x160
apr_pd_status+0x7c/0xa8 [apr]
pdr_notifier_work+0x114/0x240 [pdr_interface]
process_one_work+0x500/0xb70
worker_thread+0x630/0xfb0
kthread+0x370/0x6c0
ret_from_fork+0x10/0x20
In the Linux kernel, the following vulnerability has been resolved:
KVM: x86: Use scratch field in MMIO fragment to hold small write values
When exiting to userspace to service an emulated MMIO write, copy the
to-be-written value to a scratch field in the MMIO fragment if the size
of the data payload is 8 bytes or less, i.e. can fit in a single chunk,
instead of pointing the fragment directly at the source value.
This fixes a class of use-after-free bugs that occur when the emulator
initiates a write using an on-stack, local variable as the source, the
write splits a page boundary, *and* both pages are MMIO pages. Because
KVM's ABI only allows for physically contiguous MMIO requests, accesses
that split MMIO pages are separated into two fragments, and are sent to
userspace one at a time. When KVM attempts to complete userspace MMIO in
response to KVM_RUN after the first fragment, KVM will detect the second
fragment and generate a second userspace exit, and reference the on-stack
variable.
The issue is most visible if the second KVM_RUN is performed by a separate
task, in which case the stack of the initiating task can show up as truly
freed data.
==================================================================
BUG: KASAN: use-after-free in complete_emulated_mmio+0x305/0x420
Read of size 1 at addr ffff888009c378d1 by task syz-executor417/984
CPU: 1 PID: 984 Comm: syz-executor417 Not tainted 5.10.0-182.0.0.95.h2627.eulerosv2r13.x86_64 #3
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.15.0-0-g2dd4b9b3f840-prebuilt.qemu.org 04/01/2014 Call Trace:
dump_stack+0xbe/0xfd
print_address_description.constprop.0+0x19/0x170
__kasan_report.cold+0x6c/0x84
kasan_report+0x3a/0x50
check_memory_region+0xfd/0x1f0
memcpy+0x20/0x60
complete_emulated_mmio+0x305/0x420
kvm_arch_vcpu_ioctl_run+0x63f/0x6d0
kvm_vcpu_ioctl+0x413/0xb20
__se_sys_ioctl+0x111/0x160
do_syscall_64+0x30/0x40
entry_SYSCALL_64_after_hwframe+0x67/0xd1
RIP: 0033:0x42477d
Code: <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b0 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007faa8e6890e8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 00000000004d7338 RCX: 000000000042477d
RDX: 0000000000000000 RSI: 000000000000ae80 RDI: 0000000000000005
RBP: 00000000004d7330 R08: 00007fff28d546df R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00000000004d733c
R13: 0000000000000000 R14: 000000000040a200 R15: 00007fff28d54720
The buggy address belongs to the page:
page:0000000029f6a428 refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x9c37
flags: 0xfffffc0000000(node=0|zone=1|lastcpupid=0x1fffff)
raw: 000fffffc0000000 0000000000000000 ffffea0000270dc8 0000000000000000
raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000 page dumped because: kasan: bad access detected
Memory state around the buggy address:
ffff888009c37780: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
ffff888009c37800: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
>ffff888009c37880: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
^
ffff888009c37900: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
ffff888009c37980: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
==================================================================
The bug can also be reproduced with a targeted KVM-Unit-Test by hacking
KVM to fill a large on-stack variable in complete_emulated_mmio(), i.e. by
overwrite the data value with garbage.
Limit the use of the scratch fields to 8-byte or smaller accesses, and to
just writes, as larger accesses and reads are not affected thanks to
implementation details in the emulator, but add a sanity check to ensure
those details don't change in the future. Specifically, KVM never uses
on-stack variables for accesses larger that 8 bytes, e.g. uses an operand
in the emulator context, and *al
---truncated---
In the Linux kernel, the following vulnerability has been resolved:
clockevents: Add missing resets of the next_event_forced flag
The prevention mechanism against timer interrupt starvation missed to reset
the next_event_forced flag in a couple of places:
- When the clock event state changes. That can cause the flag to be
stale over a shutdown/startup sequence
- When a non-forced event is armed, which then prevents rearming before
that event. If that event is far out in the future this will cause
missed timer interrupts.
- In the suspend wakeup handler.
That led to stalls which have been reported by several people.
Add the missing resets, which fixes the problems for the reporters.
In the Linux kernel, the following vulnerability has been resolved:
mm/userfaultfd: fix hugetlb fault mutex hash calculation
In mfill_atomic_hugetlb(), linear_page_index() is used to calculate the
page index for hugetlb_fault_mutex_hash(). However, linear_page_index()
returns the index in PAGE_SIZE units, while hugetlb_fault_mutex_hash()
expects the index in huge page units. This mismatch means that different
addresses within the same huge page can produce different hash values,
leading to the use of different mutexes for the same huge page. This can
cause races between faulting threads, which can corrupt the reservation
map and trigger the BUG_ON in resv_map_release().
Fix this by introducing hugetlb_linear_page_index(), which returns the
page index in huge page granularity, and using it in place of
linear_page_index().