In the Linux kernel, the following vulnerability has been resolved:
powerpc/set_memory: Avoid spinlock recursion in change_page_attr()
Commit 1f9ad21c3b38 ("powerpc/mm: Implement set_memory() routines")
included a spin_lock() to change_page_attr() in order to
safely perform the three step operations. But then
commit 9f7853d7609d ("powerpc/mm: Fix set_memory_*() against
concurrent accesses") modify it to use pte_update() and do
the operation safely against concurrent access.
In the meantime, Maxime reported some spinlock recursion.
[ 15.351649] BUG: spinlock recursion on CPU#0, kworker/0:2/217
[ 15.357540] lock: init_mm+0x3c/0x420, .magic: dead4ead, .owner: kworker/0:2/217, .owner_cpu: 0
[ 15.366563] CPU: 0 PID: 217 Comm: kworker/0:2 Not tainted 5.15.0+ #523
[ 15.373350] Workqueue: events do_free_init
[ 15.377615] Call Trace:
[ 15.380232] [e4105ac0] [800946a4] do_raw_spin_lock+0xf8/0x120 (unreliable)
[ 15.387340] [e4105ae0] [8001f4ec] change_page_attr+0x40/0x1d4
[ 15.393413] [e4105b10] [801424e0] __apply_to_page_range+0x164/0x310
[ 15.400009] [e4105b60] [80169620] free_pcp_prepare+0x1e4/0x4a0
[ 15.406045] [e4105ba0] [8016c5a0] free_unref_page+0x40/0x2b8
[ 15.411979] [e4105be0] [8018724c] kasan_depopulate_vmalloc_pte+0x6c/0x94
[ 15.418989] [e4105c00] [801424e0] __apply_to_page_range+0x164/0x310
[ 15.425451] [e4105c50] [80187834] kasan_release_vmalloc+0xbc/0x134
[ 15.431898] [e4105c70] [8015f7a8] __purge_vmap_area_lazy+0x4e4/0xdd8
[ 15.438560] [e4105d30] [80160d10] _vm_unmap_aliases.part.0+0x17c/0x24c
[ 15.445283] [e4105d60] [801642d0] __vunmap+0x2f0/0x5c8
[ 15.450684] [e4105db0] [800e32d0] do_free_init+0x68/0x94
[ 15.456181] [e4105dd0] [8005d094] process_one_work+0x4bc/0x7b8
[ 15.462283] [e4105e90] [8005d614] worker_thread+0x284/0x6e8
[ 15.468227] [e4105f00] [8006aaec] kthread+0x1f0/0x210
[ 15.473489] [e4105f40] [80017148] ret_from_kernel_thread+0x14/0x1c
Remove the read / modify / write sequence to make the operation atomic
and remove the spin_lock() in change_page_attr().
To do the operation atomically, we can't use pte modification helpers
anymore. Because all platforms have different combination of bits, it
is not easy to use those bits directly. But all have the
_PAGE_KERNEL_{RO/ROX/RW/RWX} set of flags. All we need it to compare
two sets to know which bits are set or cleared.
For instance, by comparing _PAGE_KERNEL_ROX and _PAGE_KERNEL_RO you
know which bit gets cleared and which bit get set when changing exec
permission.
In the Linux kernel, the following vulnerability has been resolved:
net: sched: Disallow replacing of child qdisc from one parent to another
Lion Ackermann was able to create a UAF which can be abused for privilege
escalation with the following script
Step 1. create root qdisc
tc qdisc add dev lo root handle 1:0 drr
step2. a class for packet aggregation do demonstrate uaf
tc class add dev lo classid 1:1 drr
step3. a class for nesting
tc class add dev lo classid 1:2 drr
step4. a class to graft qdisc to
tc class add dev lo classid 1:3 drr
step5.
tc qdisc add dev lo parent 1:1 handle 2:0 plug limit 1024
step6.
tc qdisc add dev lo parent 1:2 handle 3:0 drr
step7.
tc class add dev lo classid 3:1 drr
step 8.
tc qdisc add dev lo parent 3:1 handle 4:0 pfifo
step 9. Display the class/qdisc layout
tc class ls dev lo
class drr 1:1 root leaf 2: quantum 64Kb
class drr 1:2 root leaf 3: quantum 64Kb
class drr 3:1 root leaf 4: quantum 64Kb
tc qdisc ls
qdisc drr 1: dev lo root refcnt 2
qdisc plug 2: dev lo parent 1:1
qdisc pfifo 4: dev lo parent 3:1 limit 1000p
qdisc drr 3: dev lo parent 1:2
step10. trigger the bug <=== prevented by this patch
tc qdisc replace dev lo parent 1:3 handle 4:0
step 11. Redisplay again the qdiscs/classes
tc class ls dev lo
class drr 1:1 root leaf 2: quantum 64Kb
class drr 1:2 root leaf 3: quantum 64Kb
class drr 1:3 root leaf 4: quantum 64Kb
class drr 3:1 root leaf 4: quantum 64Kb
tc qdisc ls
qdisc drr 1: dev lo root refcnt 2
qdisc plug 2: dev lo parent 1:1
qdisc pfifo 4: dev lo parent 3:1 refcnt 2 limit 1000p
qdisc drr 3: dev lo parent 1:2
Observe that a) parent for 4:0 does not change despite the replace request.
There can only be one parent. b) refcount has gone up by two for 4:0 and
c) both class 1:3 and 3:1 are pointing to it.
Step 12. send one packet to plug
echo "" | socat -u STDIN UDP4-DATAGRAM:127.0.0.1:8888,priority=$((0x10001))
step13. send one packet to the grafted fifo
echo "" | socat -u STDIN UDP4-DATAGRAM:127.0.0.1:8888,priority=$((0x10003))
step14. lets trigger the uaf
tc class delete dev lo classid 1:3
tc class delete dev lo classid 1:1
The semantics of "replace" is for a del/add _on the same node_ and not
a delete from one node(3:1) and add to another node (1:3) as in step10.
While we could "fix" with a more complex approach there could be
consequences to expectations so the patch takes the preventive approach of
"disallow such config".
Joint work with Lion Ackermann <nnamrec@gmail.com>
In the Linux kernel, the following vulnerability has been resolved:
gfs2: Truncate address space when flipping GFS2_DIF_JDATA flag
Truncate an inode's address space when flipping the GFS2_DIF_JDATA flag:
depending on that flag, the pages in the address space will either use
buffer heads or iomap_folio_state structs, and we cannot mix the two.
In the Linux kernel, the following vulnerability has been resolved:
scsi: storvsc: Ratelimit warning logs to prevent VM denial of service
If there's a persistent error in the hypervisor, the SCSI warning for
failed I/O can flood the kernel log and max out CPU utilization,
preventing troubleshooting from the VM side. Ratelimit the warning so
it doesn't DoS the VM.
In the Linux kernel, the following vulnerability has been resolved:
drm/amd/display: Initialize denominator defaults to 1
[WHAT & HOW]
Variables, used as denominators and maybe not assigned to other values,
should be initialized to non-zero to avoid DIVIDE_BY_ZERO, as reported
by Coverity.
(cherry picked from commit e2c4c6c10542ccfe4a0830bb6c9fd5b177b7bbb7)
In the Linux kernel, the following vulnerability has been resolved:
afs: Fix merge preference rule failure condition
syzbot reported a lock held when returning to userspace[1]. This is
because if argc is less than 0 and the function returns directly, the held
inode lock is not released.
Fix this by store the error in ret and jump to done to clean up instead of
returning directly.
[dh: Modified Lizhi Xu's original patch to make it honour the error code
from afs_split_string()]
[1]
WARNING: lock held when returning to user space!
6.13.0-rc3-syzkaller-00209-g499551201b5f #0 Not tainted
------------------------------------------------
syz-executor133/5823 is leaving the kernel with locks still held!
1 lock held by syz-executor133/5823:
#0: ffff888071cffc00 (&sb->s_type->i_mutex_key#9){++++}-{4:4}, at: inode_lock include/linux/fs.h:818 [inline]
#0: ffff888071cffc00 (&sb->s_type->i_mutex_key#9){++++}-{4:4}, at: afs_proc_addr_prefs_write+0x2bb/0x14e0 fs/afs/addr_prefs.c:388
In the Linux kernel, the following vulnerability has been resolved:
iomap: avoid avoid truncating 64-bit offset to 32 bits
on 32-bit kernels, iomap_write_delalloc_scan() was inadvertently using a
32-bit position due to folio_next_index() returning an unsigned long.
This could lead to an infinite loop when writing to an xfs filesystem.
In the Linux kernel, the following vulnerability has been resolved:
virtio-blk: don't keep queue frozen during system suspend
Commit 4ce6e2db00de ("virtio-blk: Ensure no requests in virtqueues before
deleting vqs.") replaces queue quiesce with queue freeze in virtio-blk's
PM callbacks. And the motivation is to drain inflight IOs before suspending.
block layer's queue freeze looks very handy, but it is also easy to cause
deadlock, such as, any attempt to call into bio_queue_enter() may run into
deadlock if the queue is frozen in current context. There are all kinds
of ->suspend() called in suspend context, so keeping queue frozen in the
whole suspend context isn't one good idea. And Marek reported lockdep
warning[1] caused by virtio-blk's freeze queue in virtblk_freeze().
[1] https://lore.kernel.org/linux-block/ca16370e-d646-4eee-b9cc-87277c89c43c@samsung.com/
Given the motivation is to drain in-flight IOs, it can be done by calling
freeze & unfreeze, meantime restore to previous behavior by keeping queue
quiesced during suspend.
In the Linux kernel, the following vulnerability has been resolved:
drm/amd/display: Add check for granularity in dml ceil/floor helpers
[Why]
Wrapper functions for dcn_bw_ceil2() and dcn_bw_floor2()
should check for granularity is non zero to avoid assert and
divide-by-zero error in dcn_bw_ functions.
[How]
Add check for granularity 0.
(cherry picked from commit f6e09701c3eb2ccb8cb0518e0b67f1c69742a4ec)
In the Linux kernel, the following vulnerability has been resolved:
btrfs: flush delalloc workers queue before stopping cleaner kthread during unmount
During the unmount path, at close_ctree(), we first stop the cleaner
kthread, using kthread_stop() which frees the associated task_struct, and
then stop and destroy all the work queues. However after we stopped the
cleaner we may still have a worker from the delalloc_workers queue running
inode.c:submit_compressed_extents(), which calls btrfs_add_delayed_iput(),
which in turn tries to wake up the cleaner kthread - which was already
destroyed before, resulting in a use-after-free on the task_struct.
Syzbot reported this with the following stack traces:
BUG: KASAN: slab-use-after-free in __lock_acquire+0x78/0x2100 kernel/locking/lockdep.c:5089
Read of size 8 at addr ffff8880259d2818 by task kworker/u8:3/52
CPU: 1 UID: 0 PID: 52 Comm: kworker/u8:3 Not tainted 6.13.0-rc1-syzkaller-00002-gcdd30ebb1b9f #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/13/2024
Workqueue: btrfs-delalloc btrfs_work_helper
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:94 [inline]
dump_stack_lvl+0x241/0x360 lib/dump_stack.c:120
print_address_description mm/kasan/report.c:378 [inline]
print_report+0x169/0x550 mm/kasan/report.c:489
kasan_report+0x143/0x180 mm/kasan/report.c:602
__lock_acquire+0x78/0x2100 kernel/locking/lockdep.c:5089
lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5849
__raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
_raw_spin_lock_irqsave+0xd5/0x120 kernel/locking/spinlock.c:162
class_raw_spinlock_irqsave_constructor include/linux/spinlock.h:551 [inline]
try_to_wake_up+0xc2/0x1470 kernel/sched/core.c:4205
submit_compressed_extents+0xdf/0x16e0 fs/btrfs/inode.c:1615
run_ordered_work fs/btrfs/async-thread.c:288 [inline]
btrfs_work_helper+0x96f/0xc40 fs/btrfs/async-thread.c:324
process_one_work kernel/workqueue.c:3229 [inline]
process_scheduled_works+0xa66/0x1840 kernel/workqueue.c:3310
worker_thread+0x870/0xd30 kernel/workqueue.c:3391
kthread+0x2f0/0x390 kernel/kthread.c:389
ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147
ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244
</TASK>
Allocated by task 2:
kasan_save_stack mm/kasan/common.c:47 [inline]
kasan_save_track+0x3f/0x80 mm/kasan/common.c:68
unpoison_slab_object mm/kasan/common.c:319 [inline]
__kasan_slab_alloc+0x66/0x80 mm/kasan/common.c:345
kasan_slab_alloc include/linux/kasan.h:250 [inline]
slab_post_alloc_hook mm/slub.c:4104 [inline]
slab_alloc_node mm/slub.c:4153 [inline]
kmem_cache_alloc_node_noprof+0x1d9/0x380 mm/slub.c:4205
alloc_task_struct_node kernel/fork.c:180 [inline]
dup_task_struct+0x57/0x8c0 kernel/fork.c:1113
copy_process+0x5d1/0x3d50 kernel/fork.c:2225
kernel_clone+0x223/0x870 kernel/fork.c:2807
kernel_thread+0x1bc/0x240 kernel/fork.c:2869
create_kthread kernel/kthread.c:412 [inline]
kthreadd+0x60d/0x810 kernel/kthread.c:767
ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147
ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244
Freed by task 24:
kasan_save_stack mm/kasan/common.c:47 [inline]
kasan_save_track+0x3f/0x80 mm/kasan/common.c:68
kasan_save_free_info+0x40/0x50 mm/kasan/generic.c:582
poison_slab_object mm/kasan/common.c:247 [inline]
__kasan_slab_free+0x59/0x70 mm/kasan/common.c:264
kasan_slab_free include/linux/kasan.h:233 [inline]
slab_free_hook mm/slub.c:2338 [inline]
slab_free mm/slub.c:4598 [inline]
kmem_cache_free+0x195/0x410 mm/slub.c:4700
put_task_struct include/linux/sched/task.h:144 [inline]
delayed_put_task_struct+0x125/0x300 kernel/exit.c:227
rcu_do_batch kernel/rcu/tree.c:2567 [inline]
rcu_core+0xaaa/0x17a0 kernel/rcu/tree.c:2823
handle_softirqs+0x2d4/0x9b0 kernel/softirq.c:554
run_ksoftirqd+0xca/0x130 kernel/softirq.c:943
---truncated---