In the Linux kernel, the following vulnerability has been resolved:
cxl/port: Fix delete_endpoint() vs parent unregistration race
The CXL subsystem, at cxl_mem ->probe() time, establishes a lineage of
ports (struct cxl_port objects) between an endpoint and the root of a
CXL topology. Each port including the endpoint port is attached to the
cxl_port driver.
Given that setup, it follows that when either any port in that lineage
goes through a cxl_port ->remove() event, or the memdev goes through a
cxl_mem ->remove() event. The hierarchy below the removed port, or the
entire hierarchy if the memdev is removed needs to come down.
The delete_endpoint() callback is careful to check whether it is being
called to tear down the hierarchy, or if it is only being called to
teardown the memdev because an ancestor port is going through
->remove().
That care needs to take the device_lock() of the endpoint's parent.
Which requires 2 bugs to be fixed:
1/ A reference on the parent is needed to prevent use-after-free
scenarios like this signature:
BUG: spinlock bad magic on CPU#0, kworker/u56:0/11
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS edk2-20230524-3.fc38 05/24/2023
Workqueue: cxl_port detach_memdev [cxl_core]
RIP: 0010:spin_bug+0x65/0xa0
Call Trace:
do_raw_spin_lock+0x69/0xa0
__mutex_lock+0x695/0xb80
delete_endpoint+0xad/0x150 [cxl_core]
devres_release_all+0xb8/0x110
device_unbind_cleanup+0xe/0x70
device_release_driver_internal+0x1d2/0x210
detach_memdev+0x15/0x20 [cxl_core]
process_one_work+0x1e3/0x4c0
worker_thread+0x1dd/0x3d0
2/ In the case of RCH topologies, the parent device that needs to be
locked is not always @port->dev as returned by cxl_mem_find_port(), use
endpoint->dev.parent instead.
In the Linux kernel, the following vulnerability has been resolved:
riscv: VMAP_STACK overflow detection thread-safe
commit 31da94c25aea ("riscv: add VMAP_STACK overflow detection") added
support for CONFIG_VMAP_STACK. If overflow is detected, CPU switches to
`shadow_stack` temporarily before switching finally to per-cpu
`overflow_stack`.
If two CPUs/harts are racing and end up in over flowing kernel stack, one
or both will end up corrupting each other state because `shadow_stack` is
not per-cpu. This patch optimizes per-cpu overflow stack switch by
directly picking per-cpu `overflow_stack` and gets rid of `shadow_stack`.
Following are the changes in this patch
- Defines an asm macro to obtain per-cpu symbols in destination
register.
- In entry.S, when overflow is detected, per-cpu overflow stack is
located using per-cpu asm macro. Computing per-cpu symbol requires
a temporary register. x31 is saved away into CSR_SCRATCH
(CSR_SCRATCH is anyways zero since we're in kernel).
Please see Links for additional relevant disccussion and alternative
solution.
Tested by `echo EXHAUST_STACK > /sys/kernel/debug/provoke-crash/DIRECT`
Kernel crash log below
Insufficient stack space to handle exception!/debug/provoke-crash/DIRECT
Task stack: [0xff20000010a98000..0xff20000010a9c000]
Overflow stack: [0xff600001f7d98370..0xff600001f7d99370]
CPU: 1 PID: 205 Comm: bash Not tainted 6.1.0-rc2-00001-g328a1f96f7b9 #34
Hardware name: riscv-virtio,qemu (DT)
epc : __memset+0x60/0xfc
ra : recursive_loop+0x48/0xc6 [lkdtm]
epc : ffffffff808de0e4 ra : ffffffff0163a752 sp : ff20000010a97e80
gp : ffffffff815c0330 tp : ff600000820ea280 t0 : ff20000010a97e88
t1 : 000000000000002e t2 : 3233206874706564 s0 : ff20000010a982b0
s1 : 0000000000000012 a0 : ff20000010a97e88 a1 : 0000000000000000
a2 : 0000000000000400 a3 : ff20000010a98288 a4 : 0000000000000000
a5 : 0000000000000000 a6 : fffffffffffe43f0 a7 : 00007fffffffffff
s2 : ff20000010a97e88 s3 : ffffffff01644680 s4 : ff20000010a9be90
s5 : ff600000842ba6c0 s6 : 00aaaaaac29e42b0 s7 : 00fffffff0aa3684
s8 : 00aaaaaac2978040 s9 : 0000000000000065 s10: 00ffffff8a7cad10
s11: 00ffffff8a76a4e0 t3 : ffffffff815dbaf4 t4 : ffffffff815dbaf4
t5 : ffffffff815dbab8 t6 : ff20000010a9bb48
status: 0000000200000120 badaddr: ff20000010a97e88 cause: 000000000000000f
Kernel panic - not syncing: Kernel stack overflow
CPU: 1 PID: 205 Comm: bash Not tainted 6.1.0-rc2-00001-g328a1f96f7b9 #34
Hardware name: riscv-virtio,qemu (DT)
Call Trace:
[<ffffffff80006754>] dump_backtrace+0x30/0x38
[<ffffffff808de798>] show_stack+0x40/0x4c
[<ffffffff808ea2a8>] dump_stack_lvl+0x44/0x5c
[<ffffffff808ea2d8>] dump_stack+0x18/0x20
[<ffffffff808dec06>] panic+0x126/0x2fe
[<ffffffff800065ea>] walk_stackframe+0x0/0xf0
[<ffffffff0163a752>] recursive_loop+0x48/0xc6 [lkdtm]
SMP: stopping secondary CPUs
---[ end Kernel panic - not syncing: Kernel stack overflow ]---
In the Linux kernel, the following vulnerability has been resolved:
smb: client: fix use-after-free in smb2_query_info_compound()
The following UAF was triggered when running fstests generic/072 with
KASAN enabled against Windows Server 2022 and mount options
'multichannel,max_channels=2,vers=3.1.1,mfsymlinks,noperm'
BUG: KASAN: slab-use-after-free in smb2_query_info_compound+0x423/0x6d0 [cifs]
Read of size 8 at addr ffff888014941048 by task xfs_io/27534
CPU: 0 PID: 27534 Comm: xfs_io Not tainted 6.6.0-rc7 #1
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS
rel-1.16.2-3-gd478f380-rebuilt.opensuse.org 04/01/2014
Call Trace:
dump_stack_lvl+0x4a/0x80
print_report+0xcf/0x650
? srso_alias_return_thunk+0x5/0x7f
? srso_alias_return_thunk+0x5/0x7f
? __phys_addr+0x46/0x90
kasan_report+0xda/0x110
? smb2_query_info_compound+0x423/0x6d0 [cifs]
? smb2_query_info_compound+0x423/0x6d0 [cifs]
smb2_query_info_compound+0x423/0x6d0 [cifs]
? __pfx_smb2_query_info_compound+0x10/0x10 [cifs]
? srso_alias_return_thunk+0x5/0x7f
? __stack_depot_save+0x39/0x480
? kasan_save_stack+0x33/0x60
? kasan_set_track+0x25/0x30
? ____kasan_slab_free+0x126/0x170
smb2_queryfs+0xc2/0x2c0 [cifs]
? __pfx_smb2_queryfs+0x10/0x10 [cifs]
? __pfx___lock_acquire+0x10/0x10
smb311_queryfs+0x210/0x220 [cifs]
? __pfx_smb311_queryfs+0x10/0x10 [cifs]
? srso_alias_return_thunk+0x5/0x7f
? __lock_acquire+0x480/0x26c0
? lock_release+0x1ed/0x640
? srso_alias_return_thunk+0x5/0x7f
? do_raw_spin_unlock+0x9b/0x100
cifs_statfs+0x18c/0x4b0 [cifs]
statfs_by_dentry+0x9b/0xf0
fd_statfs+0x4e/0xb0
__do_sys_fstatfs+0x7f/0xe0
? __pfx___do_sys_fstatfs+0x10/0x10
? srso_alias_return_thunk+0x5/0x7f
? lockdep_hardirqs_on_prepare+0x136/0x200
? srso_alias_return_thunk+0x5/0x7f
do_syscall_64+0x3f/0x90
entry_SYSCALL_64_after_hwframe+0x6e/0xd8
Allocated by task 27534:
kasan_save_stack+0x33/0x60
kasan_set_track+0x25/0x30
__kasan_kmalloc+0x8f/0xa0
open_cached_dir+0x71b/0x1240 [cifs]
smb2_query_info_compound+0x5c3/0x6d0 [cifs]
smb2_queryfs+0xc2/0x2c0 [cifs]
smb311_queryfs+0x210/0x220 [cifs]
cifs_statfs+0x18c/0x4b0 [cifs]
statfs_by_dentry+0x9b/0xf0
fd_statfs+0x4e/0xb0
__do_sys_fstatfs+0x7f/0xe0
do_syscall_64+0x3f/0x90
entry_SYSCALL_64_after_hwframe+0x6e/0xd8
Freed by task 27534:
kasan_save_stack+0x33/0x60
kasan_set_track+0x25/0x30
kasan_save_free_info+0x2b/0x50
____kasan_slab_free+0x126/0x170
slab_free_freelist_hook+0xd0/0x1e0
__kmem_cache_free+0x9d/0x1b0
open_cached_dir+0xff5/0x1240 [cifs]
smb2_query_info_compound+0x5c3/0x6d0 [cifs]
smb2_queryfs+0xc2/0x2c0 [cifs]
This is a race between open_cached_dir() and cached_dir_lease_break()
where the cache entry for the open directory handle receives a lease
break while creating it. And before returning from open_cached_dir(),
we put the last reference of the new @cfid because of
!@cfid->has_lease.
Besides the UAF, while running xfstests a lot of missed lease breaks
have been noticed in tests that run several concurrent statfs(2) calls
on those cached fids
CIFS: VFS: \\w22-root1.gandalf.test No task to wake, unknown frame...
CIFS: VFS: \\w22-root1.gandalf.test Cmd: 18 Err: 0x0 Flags: 0x1...
CIFS: VFS: \\w22-root1.gandalf.test smb buf 00000000715bfe83 len 108
CIFS: VFS: Dump pending requests:
CIFS: VFS: \\w22-root1.gandalf.test No task to wake, unknown frame...
CIFS: VFS: \\w22-root1.gandalf.test Cmd: 18 Err: 0x0 Flags: 0x1...
CIFS: VFS: \\w22-root1.gandalf.test smb buf 000000005aa7316e len 108
...
To fix both, in open_cached_dir() ensure that @cfid->has_lease is set
right before sending out compounded request so that any potential
lease break will be get processed by demultiplex thread while we're
still caching @cfid. And, if open failed for some reason, re-check
@cfid->has_lease to decide whether or not put lease reference.
In the Linux kernel, the following vulnerability has been resolved:
block: fix q->blkg_list corruption during disk rebind
Multiple gendisk instances can allocated/added for single request queue
in case of disk rebind. blkg may still stay in q->blkg_list when calling
blkcg_init_disk() for rebind, then q->blkg_list becomes corrupted.
Fix the list corruption issue by:
- add blkg_init_queue() to initialize q->blkg_list & q->blkcg_mutex only
- move calling blkg_init_queue() into blk_alloc_queue()
The list corruption should be started since commit f1c006f1c685 ("blk-cgroup:
synchronize pd_free_fn() from blkg_free_workfn() and blkcg_deactivate_policy()")
which delays removing blkg from q->blkg_list into blkg_free_workfn().
In the Linux kernel, the following vulnerability has been resolved:
net/mlx5: Register devlink first under devlink lock
In case device is having a non fatal FW error during probe, the
driver will report the error to user via devlink. This will trigger
a WARN_ON, since mlx5 is calling devlink_register() last.
In order to avoid the WARN_ON[1], change mlx5 to invoke devl_register()
first under devlink lock.
[1]
WARNING: CPU: 5 PID: 227 at net/devlink/health.c:483 devlink_recover_notify.constprop.0+0xb8/0xc0
CPU: 5 PID: 227 Comm: kworker/u16:3 Not tainted 6.4.0-rc5_for_upstream_min_debug_2023_06_12_12_38 #1
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
Workqueue: mlx5_health0000:08:00.0 mlx5_fw_reporter_err_work [mlx5_core]
RIP: 0010:devlink_recover_notify.constprop.0+0xb8/0xc0
Call Trace:
<TASK>
? __warn+0x79/0x120
? devlink_recover_notify.constprop.0+0xb8/0xc0
? report_bug+0x17c/0x190
? handle_bug+0x3c/0x60
? exc_invalid_op+0x14/0x70
? asm_exc_invalid_op+0x16/0x20
? devlink_recover_notify.constprop.0+0xb8/0xc0
devlink_health_report+0x4a/0x1c0
mlx5_fw_reporter_err_work+0xa4/0xd0 [mlx5_core]
process_one_work+0x1bb/0x3c0
? process_one_work+0x3c0/0x3c0
worker_thread+0x4d/0x3c0
? process_one_work+0x3c0/0x3c0
kthread+0xc6/0xf0
? kthread_complete_and_exit+0x20/0x20
ret_from_fork+0x1f/0x30
</TASK>
In the Linux kernel, the following vulnerability has been resolved:
btrfs: make sure that WRITTEN is set on all metadata blocks
We previously would call btrfs_check_leaf() if we had the check
integrity code enabled, which meant that we could only run the extended
leaf checks if we had WRITTEN set on the header flags.
This leaves a gap in our checking, because we could end up with
corruption on disk where WRITTEN isn't set on the leaf, and then the
extended leaf checks don't get run which we rely on to validate all of
the item pointers to make sure we don't access memory outside of the
extent buffer.
However, since 732fab95abe2 ("btrfs: check-integrity: remove
CONFIG_BTRFS_FS_CHECK_INTEGRITY option") we no longer call
btrfs_check_leaf() from btrfs_mark_buffer_dirty(), which means we only
ever call it on blocks that are being written out, and thus have WRITTEN
set, or that are being read in, which should have WRITTEN set.
Add checks to make sure we have WRITTEN set appropriately, and then make
sure __btrfs_check_leaf() always does the item checking. This will
protect us from file systems that have been corrupted and no longer have
WRITTEN set on some of the blocks.
This was hit on a crafted image tweaking the WRITTEN bit and reported by
KASAN as out-of-bound access in the eb accessors. The example is a dir
item at the end of an eb.
[2.042] BTRFS warning (device loop1): bad eb member start: ptr 0x3fff start 30572544 member offset 16410 size 2
[2.040] general protection fault, probably for non-canonical address 0xe0009d1000000003: 0000 [#1] PREEMPT SMP KASAN NOPTI
[2.537] KASAN: maybe wild-memory-access in range [0x0005088000000018-0x000508800000001f]
[2.729] CPU: 0 PID: 2587 Comm: mount Not tainted 6.8.2 #1
[2.729] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
[2.621] RIP: 0010:btrfs_get_16+0x34b/0x6d0
[2.621] RSP: 0018:ffff88810871fab8 EFLAGS: 00000206
[2.621] RAX: 0000a11000000003 RBX: ffff888104ff8720 RCX: ffff88811b2288c0
[2.621] RDX: dffffc0000000000 RSI: ffffffff81dd8aca RDI: ffff88810871f748
[2.621] RBP: 000000000000401a R08: 0000000000000001 R09: ffffed10210e3ee9
[2.621] R10: ffff88810871f74f R11: 205d323430333737 R12: 000000000000001a
[2.621] R13: 000508800000001a R14: 1ffff110210e3f5d R15: ffffffff850011e8
[2.621] FS: 00007f56ea275840(0000) GS:ffff88811b200000(0000) knlGS:0000000000000000
[2.621] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[2.621] CR2: 00007febd13b75c0 CR3: 000000010bb50000 CR4: 00000000000006f0
[2.621] Call Trace:
[2.621] <TASK>
[2.621] ? show_regs+0x74/0x80
[2.621] ? die_addr+0x46/0xc0
[2.621] ? exc_general_protection+0x161/0x2a0
[2.621] ? asm_exc_general_protection+0x26/0x30
[2.621] ? btrfs_get_16+0x33a/0x6d0
[2.621] ? btrfs_get_16+0x34b/0x6d0
[2.621] ? btrfs_get_16+0x33a/0x6d0
[2.621] ? __pfx_btrfs_get_16+0x10/0x10
[2.621] ? __pfx_mutex_unlock+0x10/0x10
[2.621] btrfs_match_dir_item_name+0x101/0x1a0
[2.621] btrfs_lookup_dir_item+0x1f3/0x280
[2.621] ? __pfx_btrfs_lookup_dir_item+0x10/0x10
[2.621] btrfs_get_tree+0xd25/0x1910
[ copy more details from report ]
In the Linux kernel, the following vulnerability has been resolved:
drm/panfrost: Fix the error path in panfrost_mmu_map_fault_addr()
Subject: [PATCH] drm/panfrost: Fix the error path in
panfrost_mmu_map_fault_addr()
If some the pages or sgt allocation failed, we shouldn't release the
pages ref we got earlier, otherwise we will end up with unbalanced
get/put_pages() calls. We should instead leave everything in place
and let the BO release function deal with extra cleanup when the object
is destroyed, or let the fault handler try again next time it's called.
In the Linux kernel, the following vulnerability has been resolved:
net: phy: phy_device: Prevent nullptr exceptions on ISR
If phydev->irq is set unconditionally, check
for valid interrupt handler or fall back to polling mode to prevent
nullptr exceptions in interrupt service routine.
In the Linux kernel, the following vulnerability has been resolved:
wifi: rtw89: fix null pointer access when abort scan
During cancel scan we might use vif that weren't scanning.
Fix this by using the actual scanning vif.