In the Linux kernel, the following vulnerability has been resolved:
wifi: brcmfmac: Fix use-after-free bug in brcmf_cfg80211_detach
This is the candidate patch of CVE-2023-47233 :
https://nvd.nist.gov/vuln/detail/CVE-2023-47233
In brcm80211 driver,it starts with the following invoking chain
to start init a timeout worker:
->brcmf_usb_probe
->brcmf_usb_probe_cb
->brcmf_attach
->brcmf_bus_started
->brcmf_cfg80211_attach
->wl_init_priv
->brcmf_init_escan
->INIT_WORK(&cfg->escan_timeout_work,
brcmf_cfg80211_escan_timeout_worker);
If we disconnect the USB by hotplug, it will call
brcmf_usb_disconnect to make cleanup. The invoking chain is :
brcmf_usb_disconnect
->brcmf_usb_disconnect_cb
->brcmf_detach
->brcmf_cfg80211_detach
->kfree(cfg);
While the timeout woker may still be running. This will cause
a use-after-free bug on cfg in brcmf_cfg80211_escan_timeout_worker.
Fix it by deleting the timer and canceling the worker in
brcmf_cfg80211_detach.
[arend.vanspriel@broadcom.com: keep timer delete as is and cancel work just before free]
In the Linux kernel, the following vulnerability has been resolved:
ext4: fix corruption during on-line resize
We observed a corruption during on-line resize of a file system that is
larger than 16 TiB with 4k block size. With having more then 2^32 blocks
resize_inode is turned off by default by mke2fs. The issue can be
reproduced on a smaller file system for convenience by explicitly
turning off resize_inode. An on-line resize across an 8 GiB boundary (the
size of a meta block group in this setup) then leads to a corruption:
dev=/dev/<some_dev> # should be >= 16 GiB
mkdir -p /corruption
/sbin/mke2fs -t ext4 -b 4096 -O ^resize_inode $dev $((2 * 2**21 - 2**15))
mount -t ext4 $dev /corruption
dd if=/dev/zero bs=4096 of=/corruption/test count=$((2*2**21 - 4*2**15))
sha1sum /corruption/test
# 79d2658b39dcfd77274e435b0934028adafaab11 /corruption/test
/sbin/resize2fs $dev $((2*2**21))
# drop page cache to force reload the block from disk
echo 1 > /proc/sys/vm/drop_caches
sha1sum /corruption/test
# 3c2abc63cbf1a94c9e6977e0fbd72cd832c4d5c3 /corruption/test
2^21 = 2^15*2^6 equals 8 GiB whereof 2^15 is the number of blocks per
block group and 2^6 are the number of block groups that make a meta
block group.
The last checksum might be different depending on how the file is laid
out across the physical blocks. The actual corruption occurs at physical
block 63*2^15 = 2064384 which would be the location of the backup of the
meta block group's block descriptor. During the on-line resize the file
system will be converted to meta_bg starting at s_first_meta_bg which is
2 in the example - meaning all block groups after 16 GiB. However, in
ext4_flex_group_add we might add block groups that are not part of the
first meta block group yet. In the reproducer we achieved this by
substracting the size of a whole block group from the point where the
meta block group would start. This must be considered when updating the
backup block group descriptors to follow the non-meta_bg layout. The fix
is to add a test whether the group to add is already part of the meta
block group or not.
In the Linux kernel, the following vulnerability has been resolved:
md/dm-raid: don't call md_reap_sync_thread() directly
Currently md_reap_sync_thread() is called from raid_message() directly
without holding 'reconfig_mutex', this is definitely unsafe because
md_reap_sync_thread() can change many fields that is protected by
'reconfig_mutex'.
However, hold 'reconfig_mutex' here is still problematic because this
will cause deadlock, for example, commit 130443d60b1b ("md: refactor
idle/frozen_sync_thread() to fix deadlock").
Fix this problem by using stop_sync_thread() to unregister sync_thread,
like md/raid did.
In the Linux kernel, the following vulnerability has been resolved:
PCI/PM: Drain runtime-idle callbacks before driver removal
A race condition between the .runtime_idle() callback and the .remove()
callback in the rtsx_pcr PCI driver leads to a kernel crash due to an
unhandled page fault [1].
The problem is that rtsx_pci_runtime_idle() is not expected to be running
after pm_runtime_get_sync() has been called, but the latter doesn't really
guarantee that. It only guarantees that the suspend and resume callbacks
will not be running when it returns.
However, if a .runtime_idle() callback is already running when
pm_runtime_get_sync() is called, the latter will notice that the runtime PM
status of the device is RPM_ACTIVE and it will return right away without
waiting for the former to complete. In fact, it cannot wait for
.runtime_idle() to complete because it may be called from that callback (it
arguably does not make much sense to do that, but it is not strictly
prohibited).
Thus in general, whoever is providing a .runtime_idle() callback needs
to protect it from running in parallel with whatever code runs after
pm_runtime_get_sync(). [Note that .runtime_idle() will not start after
pm_runtime_get_sync() has returned, but it may continue running then if it
has started earlier.]
One way to address that race condition is to call pm_runtime_barrier()
after pm_runtime_get_sync() (not before it, because a nonzero value of the
runtime PM usage counter is necessary to prevent runtime PM callbacks from
being invoked) to wait for the .runtime_idle() callback to complete should
it be running at that point. A suitable place for doing that is in
pci_device_remove() which calls pm_runtime_get_sync() before removing the
driver, so it may as well call pm_runtime_barrier() subsequently, which
will prevent the race in question from occurring, not just in the rtsx_pcr
driver, but in any PCI drivers providing .runtime_idle() callbacks.
In the Linux kernel, the following vulnerability has been resolved:
x86/efistub: Call mixed mode boot services on the firmware's stack
Normally, the EFI stub calls into the EFI boot services using the stack
that was live when the stub was entered. According to the UEFI spec,
this stack needs to be at least 128k in size - this might seem large but
all asynchronous processing and event handling in EFI runs from the same
stack and so quite a lot of space may be used in practice.
In mixed mode, the situation is a bit different: the bootloader calls
the 32-bit EFI stub entry point, which calls the decompressor's 32-bit
entry point, where the boot stack is set up, using a fixed allocation
of 16k. This stack is still in use when the EFI stub is started in
64-bit mode, and so all calls back into the EFI firmware will be using
the decompressor's limited boot stack.
Due to the placement of the boot stack right after the boot heap, any
stack overruns have gone unnoticed. However, commit
5c4feadb0011983b ("x86/decompressor: Move global symbol references to C code")
moved the definition of the boot heap into C code, and now the boot
stack is placed right at the base of BSS, where any overruns will
corrupt the end of the .data section.
While it would be possible to work around this by increasing the size of
the boot stack, doing so would affect all x86 systems, and mixed mode
systems are a tiny (and shrinking) fraction of the x86 installed base.
So instead, record the firmware stack pointer value when entering from
the 32-bit firmware, and switch to this stack every time a EFI boot
service call is made.
In the Linux kernel, the following vulnerability has been resolved:
dm snapshot: fix lockup in dm_exception_table_exit
There was reported lockup when we exit a snapshot with many exceptions.
Fix this by adding "cond_resched" to the loop that frees the exceptions.
In the Linux kernel, the following vulnerability has been resolved:
crypto: s390/aes - Fix buffer overread in CTR mode
When processing the last block, the s390 ctr code will always read
a whole block, even if there isn't a whole block of data left. Fix
this by using the actual length left and copy it into a buffer first
for processing.
In the Linux kernel, the following vulnerability has been resolved:
net: atlantic: eliminate double free in error handling logic
Driver has a logic leak in ring data allocation/free,
where aq_ring_free could be called multiple times on same ring,
if system is under stress and got memory allocation error.
Ring pointer was used as an indicator of failure, but this is
not correct since only ring data is allocated/deallocated.
Ring itself is an array member.
Changing ring allocation functions to return error code directly.
This simplifies error handling and eliminates aq_ring_free
on higher layer.
In the Linux kernel, the following vulnerability has been resolved:
ALSA: usb-audio: Stop parsing channels bits when all channels are found.
If a usb audio device sets more bits than the amount of channels
it could write outside of the map array.
In the Linux kernel, the following vulnerability has been resolved:
btrfs: fix deadlock with fiemap and extent locking
While working on the patchset to remove extent locking I got a lockdep
splat with fiemap and pagefaulting with my new extent lock replacement
lock.
This deadlock exists with our normal code, we just don't have lockdep
annotations with the extent locking so we've never noticed it.
Since we're copying the fiemap extent to user space on every iteration
we have the chance of pagefaulting. Because we hold the extent lock for
the entire range we could mkwrite into a range in the file that we have
mmap'ed. This would deadlock with the following stack trace
[<0>] lock_extent+0x28d/0x2f0
[<0>] btrfs_page_mkwrite+0x273/0x8a0
[<0>] do_page_mkwrite+0x50/0xb0
[<0>] do_fault+0xc1/0x7b0
[<0>] __handle_mm_fault+0x2fa/0x460
[<0>] handle_mm_fault+0xa4/0x330
[<0>] do_user_addr_fault+0x1f4/0x800
[<0>] exc_page_fault+0x7c/0x1e0
[<0>] asm_exc_page_fault+0x26/0x30
[<0>] rep_movs_alternative+0x33/0x70
[<0>] _copy_to_user+0x49/0x70
[<0>] fiemap_fill_next_extent+0xc8/0x120
[<0>] emit_fiemap_extent+0x4d/0xa0
[<0>] extent_fiemap+0x7f8/0xad0
[<0>] btrfs_fiemap+0x49/0x80
[<0>] __x64_sys_ioctl+0x3e1/0xb50
[<0>] do_syscall_64+0x94/0x1a0
[<0>] entry_SYSCALL_64_after_hwframe+0x6e/0x76
I wrote an fstest to reproduce this deadlock without my replacement lock
and verified that the deadlock exists with our existing locking.
To fix this simply don't take the extent lock for the entire duration of
the fiemap. This is safe in general because we keep track of where we
are when we're searching the tree, so if an ordered extent updates in
the middle of our fiemap call we'll still emit the correct extents
because we know what offset we were on before.
The only place we maintain the lock is searching delalloc. Since the
delalloc stuff can change during writeback we want to lock the extent
range so we have a consistent view of delalloc at the time we're
checking to see if we need to set the delalloc flag.
With this patch applied we no longer deadlock with my testcase.