In the Linux kernel, the following vulnerability has been resolved:
KVM: PPC: Book3S HV: Prevent UAF in kvm_spapr_tce_attach_iommu_group()
Al reported a possible use-after-free (UAF) in kvm_spapr_tce_attach_iommu_group().
It looks up `stt` from tablefd, but then continues to use it after doing
fdput() on the returned fd. After the fdput() the tablefd is free to be
closed by another thread. The close calls kvm_spapr_tce_release() and
then release_spapr_tce_table() (via call_rcu()) which frees `stt`.
Although there are calls to rcu_read_lock() in
kvm_spapr_tce_attach_iommu_group() they are not sufficient to prevent
the UAF, because `stt` is used outside the locked regions.
With an artifcial delay after the fdput() and a userspace program which
triggers the race, KASAN detects the UAF:
BUG: KASAN: slab-use-after-free in kvm_spapr_tce_attach_iommu_group+0x298/0x720 [kvm]
Read of size 4 at addr c000200027552c30 by task kvm-vfio/2505
CPU: 54 PID: 2505 Comm: kvm-vfio Not tainted 6.10.0-rc3-next-20240612-dirty #1
Hardware name: 8335-GTH POWER9 0x4e1202 opal:skiboot-v6.5.3-35-g1851b2a06 PowerNV
Call Trace:
dump_stack_lvl+0xb4/0x108 (unreliable)
print_report+0x2b4/0x6ec
kasan_report+0x118/0x2b0
__asan_load4+0xb8/0xd0
kvm_spapr_tce_attach_iommu_group+0x298/0x720 [kvm]
kvm_vfio_set_attr+0x524/0xac0 [kvm]
kvm_device_ioctl+0x144/0x240 [kvm]
sys_ioctl+0x62c/0x1810
system_call_exception+0x190/0x440
system_call_vectored_common+0x15c/0x2ec
...
Freed by task 0:
...
kfree+0xec/0x3e0
release_spapr_tce_table+0xd4/0x11c [kvm]
rcu_core+0x568/0x16a0
handle_softirqs+0x23c/0x920
do_softirq_own_stack+0x6c/0x90
do_softirq_own_stack+0x58/0x90
__irq_exit_rcu+0x218/0x2d0
irq_exit+0x30/0x80
arch_local_irq_restore+0x128/0x230
arch_local_irq_enable+0x1c/0x30
cpuidle_enter_state+0x134/0x5cc
cpuidle_enter+0x6c/0xb0
call_cpuidle+0x7c/0x100
do_idle+0x394/0x410
cpu_startup_entry+0x60/0x70
start_secondary+0x3fc/0x410
start_secondary_prolog+0x10/0x14
Fix it by delaying the fdput() until `stt` is no longer in use, which
is effectively the entire function. To keep the patch minimal add a call
to fdput() at each of the existing return paths. Future work can convert
the function to goto or __cleanup style cleanup.
With the fix in place the test case no longer triggers the UAF.
In the Linux kernel, the following vulnerability has been resolved:
xfs: add bounds checking to xlog_recover_process_data
There is a lack of verification of the space occupied by fixed members
of xlog_op_header in the xlog_recover_process_data.
We can create a crafted image to trigger an out of bounds read by
following these steps:
1) Mount an image of xfs, and do some file operations to leave records
2) Before umounting, copy the image for subsequent steps to simulate
abnormal exit. Because umount will ensure that tail_blk and
head_blk are the same, which will result in the inability to enter
xlog_recover_process_data
3) Write a tool to parse and modify the copied image in step 2
4) Make the end of the xlog_op_header entries only 1 byte away from
xlog_rec_header->h_size
5) xlog_rec_header->h_num_logops++
6) Modify xlog_rec_header->h_crc
Fix:
Add a check to make sure there is sufficient space to access fixed members
of xlog_op_header.
In the Linux kernel, the following vulnerability has been resolved:
filelock: Remove locks reliably when fcntl/close race is detected
When fcntl_setlk() races with close(), it removes the created lock with
do_lock_file_wait().
However, LSMs can allow the first do_lock_file_wait() that created the lock
while denying the second do_lock_file_wait() that tries to remove the lock.
Separately, posix_lock_file() could also fail to
remove a lock due to GFP_KERNEL allocation failure (when splitting a range
in the middle).
After the bug has been triggered, use-after-free reads will occur in
lock_get_status() when userspace reads /proc/locks. This can likely be used
to read arbitrary kernel memory, but can't corrupt kernel memory.
Fix it by calling locks_remove_posix() instead, which is designed to
reliably get rid of POSIX locks associated with the given file and
files_struct and is also used by filp_flush().
In the Linux kernel, the following vulnerability has been resolved:
drm/amdgpu: bypass tiling flag check in virtual display case (v2)
vkms leverages common amdgpu framebuffer creation, and
also as it does not support FB modifier, there is no need
to check tiling flags when initing framebuffer when virtual
display is enabled.
This can fix below calltrace:
amdgpu 0000:00:08.0: GFX9+ requires FB check based on format modifier
WARNING: CPU: 0 PID: 1023 at drivers/gpu/drm/amd/amdgpu/amdgpu_display.c:1150 amdgpu_display_framebuffer_init+0x8e7/0xb40 [amdgpu]
v2: check adev->enable_virtual_display instead as vkms can be
enabled in bare metal as well.
In the Linux kernel, the following vulnerability has been resolved:
net-sysfs: add check for netdevice being present to speed_show
When bringing down the netdevice or system shutdown, a panic can be
triggered while accessing the sysfs path because the device is already
removed.
[ 755.549084] mlx5_core 0000:12:00.1: Shutdown was called
[ 756.404455] mlx5_core 0000:12:00.0: Shutdown was called
...
[ 757.937260] BUG: unable to handle kernel NULL pointer dereference at (null)
[ 758.031397] IP: [<ffffffff8ee11acb>] dma_pool_alloc+0x1ab/0x280
crash> bt
...
PID: 12649 TASK: ffff8924108f2100 CPU: 1 COMMAND: "amsd"
...
#9 [ffff89240e1a38b0] page_fault at ffffffff8f38c778
[exception RIP: dma_pool_alloc+0x1ab]
RIP: ffffffff8ee11acb RSP: ffff89240e1a3968 RFLAGS: 00010046
RAX: 0000000000000246 RBX: ffff89243d874100 RCX: 0000000000001000
RDX: 0000000000000000 RSI: 0000000000000246 RDI: ffff89243d874090
RBP: ffff89240e1a39c0 R8: 000000000001f080 R9: ffff8905ffc03c00
R10: ffffffffc04680d4 R11: ffffffff8edde9fd R12: 00000000000080d0
R13: ffff89243d874090 R14: ffff89243d874080 R15: 0000000000000000
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
#10 [ffff89240e1a39c8] mlx5_alloc_cmd_msg at ffffffffc04680f3 [mlx5_core]
#11 [ffff89240e1a3a18] cmd_exec at ffffffffc046ad62 [mlx5_core]
#12 [ffff89240e1a3ab8] mlx5_cmd_exec at ffffffffc046b4fb [mlx5_core]
#13 [ffff89240e1a3ae8] mlx5_core_access_reg at ffffffffc0475434 [mlx5_core]
#14 [ffff89240e1a3b40] mlx5e_get_fec_caps at ffffffffc04a7348 [mlx5_core]
#15 [ffff89240e1a3bb0] get_fec_supported_advertised at ffffffffc04992bf [mlx5_core]
#16 [ffff89240e1a3c08] mlx5e_get_link_ksettings at ffffffffc049ab36 [mlx5_core]
#17 [ffff89240e1a3ce8] __ethtool_get_link_ksettings at ffffffff8f25db46
#18 [ffff89240e1a3d48] speed_show at ffffffff8f277208
#19 [ffff89240e1a3dd8] dev_attr_show at ffffffff8f0b70e3
#20 [ffff89240e1a3df8] sysfs_kf_seq_show at ffffffff8eedbedf
#21 [ffff89240e1a3e18] kernfs_seq_show at ffffffff8eeda596
#22 [ffff89240e1a3e28] seq_read at ffffffff8ee76d10
#23 [ffff89240e1a3e98] kernfs_fop_read at ffffffff8eedaef5
#24 [ffff89240e1a3ed8] vfs_read at ffffffff8ee4e3ff
#25 [ffff89240e1a3f08] sys_read at ffffffff8ee4f27f
#26 [ffff89240e1a3f50] system_call_fastpath at ffffffff8f395f92
crash> net_device.state ffff89443b0c0000
state = 0x5 (__LINK_STATE_START| __LINK_STATE_NOCARRIER)
To prevent this scenario, we also make sure that the netdevice is present.
In the Linux kernel, the following vulnerability has been resolved:
drm/vc4: hdmi: Unregister codec device on unbind
On bind we will register the HDMI codec device but we don't unregister
it on unbind, leading to a device leakage. Unregister our device at
unbind.
In the Linux kernel, the following vulnerability has been resolved:
swiotlb: fix info leak with DMA_FROM_DEVICE
The problem I'm addressing was discovered by the LTP test covering
cve-2018-1000204.
A short description of what happens follows:
1) The test case issues a command code 00 (TEST UNIT READY) via the SG_IO
interface with: dxfer_len == 524288, dxdfer_dir == SG_DXFER_FROM_DEV
and a corresponding dxferp. The peculiar thing about this is that TUR
is not reading from the device.
2) In sg_start_req() the invocation of blk_rq_map_user() effectively
bounces the user-space buffer. As if the device was to transfer into
it. Since commit a45b599ad808 ("scsi: sg: allocate with __GFP_ZERO in
sg_build_indirect()") we make sure this first bounce buffer is
allocated with GFP_ZERO.
3) For the rest of the story we keep ignoring that we have a TUR, so the
device won't touch the buffer we prepare as if the we had a
DMA_FROM_DEVICE type of situation. My setup uses a virtio-scsi device
and the buffer allocated by SG is mapped by the function
virtqueue_add_split() which uses DMA_FROM_DEVICE for the "in" sgs (here
scatter-gather and not scsi generics). This mapping involves bouncing
via the swiotlb (we need swiotlb to do virtio in protected guest like
s390 Secure Execution, or AMD SEV).
4) When the SCSI TUR is done, we first copy back the content of the second
(that is swiotlb) bounce buffer (which most likely contains some
previous IO data), to the first bounce buffer, which contains all
zeros. Then we copy back the content of the first bounce buffer to
the user-space buffer.
5) The test case detects that the buffer, which it zero-initialized,
ain't all zeros and fails.
One can argue that this is an swiotlb problem, because without swiotlb
we leak all zeros, and the swiotlb should be transparent in a sense that
it does not affect the outcome (if all other participants are well
behaved).
Copying the content of the original buffer into the swiotlb buffer is
the only way I can think of to make swiotlb transparent in such
scenarios. So let's do just that if in doubt, but allow the driver
to tell us that the whole mapped buffer is going to be overwritten,
in which case we can preserve the old behavior and avoid the performance
impact of the extra bounce.
In the Linux kernel, the following vulnerability has been resolved:
net/mlx5: Fix a race on command flush flow
Fix a refcount use after free warning due to a race on command entry.
Such race occurs when one of the commands releases its last refcount and
frees its index and entry while another process running command flush
flow takes refcount to this command entry. The process which handles
commands flush may see this command as needed to be flushed if the other
process released its refcount but didn't release the index yet. Fix it
by adding the needed spin lock.
It fixes the following warning trace:
refcount_t: addition on 0; use-after-free.
WARNING: CPU: 11 PID: 540311 at lib/refcount.c:25 refcount_warn_saturate+0x80/0xe0
...
RIP: 0010:refcount_warn_saturate+0x80/0xe0
...
Call Trace:
<TASK>
mlx5_cmd_trigger_completions+0x293/0x340 [mlx5_core]
mlx5_cmd_flush+0x3a/0xf0 [mlx5_core]
enter_error_state+0x44/0x80 [mlx5_core]
mlx5_fw_fatal_reporter_err_work+0x37/0xe0 [mlx5_core]
process_one_work+0x1be/0x390
worker_thread+0x4d/0x3d0
? rescuer_thread+0x350/0x350
kthread+0x141/0x160
? set_kthread_struct+0x40/0x40
ret_from_fork+0x1f/0x30
</TASK>