In the Linux kernel, the following vulnerability has been resolved:
bpf: Prevent bpf program recursion for raw tracepoint probes
We got report from sysbot [1] about warnings that were caused by
bpf program attached to contention_begin raw tracepoint triggering
the same tracepoint by using bpf_trace_printk helper that takes
trace_printk_lock lock.
Call Trace:
<TASK>
? trace_event_raw_event_bpf_trace_printk+0x5f/0x90
bpf_trace_printk+0x2b/0xe0
bpf_prog_a9aec6167c091eef_prog+0x1f/0x24
bpf_trace_run2+0x26/0x90
native_queued_spin_lock_slowpath+0x1c6/0x2b0
_raw_spin_lock_irqsave+0x44/0x50
bpf_trace_printk+0x3f/0xe0
bpf_prog_a9aec6167c091eef_prog+0x1f/0x24
bpf_trace_run2+0x26/0x90
native_queued_spin_lock_slowpath+0x1c6/0x2b0
_raw_spin_lock_irqsave+0x44/0x50
bpf_trace_printk+0x3f/0xe0
bpf_prog_a9aec6167c091eef_prog+0x1f/0x24
bpf_trace_run2+0x26/0x90
native_queued_spin_lock_slowpath+0x1c6/0x2b0
_raw_spin_lock_irqsave+0x44/0x50
bpf_trace_printk+0x3f/0xe0
bpf_prog_a9aec6167c091eef_prog+0x1f/0x24
bpf_trace_run2+0x26/0x90
native_queued_spin_lock_slowpath+0x1c6/0x2b0
_raw_spin_lock_irqsave+0x44/0x50
__unfreeze_partials+0x5b/0x160
...
The can be reproduced by attaching bpf program as raw tracepoint on
contention_begin tracepoint. The bpf prog calls bpf_trace_printk
helper. Then by running perf bench the spin lock code is forced to
take slow path and call contention_begin tracepoint.
Fixing this by skipping execution of the bpf program if it's
already running, Using bpf prog 'active' field, which is being
currently used by trampoline programs for the same reason.
Moving bpf_prog_inc_misses_counter to syscall.c because
trampoline.c is compiled in just for CONFIG_BPF_JIT option.
[1] https://lore.kernel.org/bpf/YxhFe3EwqchC%2FfYf@krava/T/#t
In the Linux kernel, the following vulnerability has been resolved:
net/9p: use a dedicated spinlock for trans_fd
Shamelessly copying the explanation from Tetsuo Handa's suggested
patch[1] (slightly reworded):
syzbot is reporting inconsistent lock state in p9_req_put()[2],
for p9_tag_remove() from p9_req_put() from IRQ context is using
spin_lock_irqsave() on "struct p9_client"->lock but trans_fd
(not from IRQ context) is using spin_lock().
Since the locks actually protect different things in client.c and in
trans_fd.c, just replace trans_fd.c's lock by a new one specific to the
transport (client.c's protect the idr for fid/tag allocations,
while trans_fd.c's protects its own req list and request status field
that acts as the transport's state machine)
In the Linux kernel, the following vulnerability has been resolved:
netlink: Bounds-check struct nlmsgerr creation
In preparation for FORTIFY_SOURCE doing bounds-check on memcpy(),
switch from __nlmsg_put to nlmsg_put(), and explain the bounds check
for dealing with the memcpy() across a composite flexible array struct.
Avoids this future run-time warning:
memcpy: detected field-spanning write (size 32) of single field "&errmsg->msg" at net/netlink/af_netlink.c:2447 (size 16)
In the Linux kernel, the following vulnerability has been resolved:
9p/trans_fd: always use O_NONBLOCK read/write
syzbot is reporting hung task at p9_fd_close() [1], for p9_mux_poll_stop()
from p9_conn_destroy() from p9_fd_close() is failing to interrupt already
started kernel_read() from p9_fd_read() from p9_read_work() and/or
kernel_write() from p9_fd_write() from p9_write_work() requests.
Since p9_socket_open() sets O_NONBLOCK flag, p9_mux_poll_stop() does not
need to interrupt kernel_read()/kernel_write(). However, since p9_fd_open()
does not set O_NONBLOCK flag, but pipe blocks unless signal is pending,
p9_mux_poll_stop() needs to interrupt kernel_read()/kernel_write() when
the file descriptor refers to a pipe. In other words, pipe file descriptor
needs to be handled as if socket file descriptor.
We somehow need to interrupt kernel_read()/kernel_write() on pipes.
A minimal change, which this patch is doing, is to set O_NONBLOCK flag
from p9_fd_open(), for O_NONBLOCK flag does not affect reading/writing
of regular files. But this approach changes O_NONBLOCK flag on userspace-
supplied file descriptors (which might break userspace programs), and
O_NONBLOCK flag could be changed by userspace. It would be possible to set
O_NONBLOCK flag every time p9_fd_read()/p9_fd_write() is invoked, but still
remains small race window for clearing O_NONBLOCK flag.
If we don't want to manipulate O_NONBLOCK flag, we might be able to
surround kernel_read()/kernel_write() with set_thread_flag(TIF_SIGPENDING)
and recalc_sigpending(). Since p9_read_work()/p9_write_work() works are
processed by kernel threads which process global system_wq workqueue,
signals could not be delivered from remote threads when p9_mux_poll_stop()
from p9_conn_destroy() from p9_fd_close() is called. Therefore, calling
set_thread_flag(TIF_SIGPENDING)/recalc_sigpending() every time would be
needed if we count on signals for making kernel_read()/kernel_write()
non-blocking.
[Dominique: add comment at Christian's suggestion]
In the Linux kernel, the following vulnerability has been resolved:
gfs2: Check sb_bsize_shift after reading superblock
Fuzzers like to scribble over sb_bsize_shift but in reality it's very
unlikely that this field would be corrupted on its own. Nevertheless it
should be checked to avoid the possibility of messy mount errors due to
bad calculations. It's always a fixed value based on the block size so
we can just check that it's the expected value.
Tested with:
mkfs.gfs2 -O -p lock_nolock /dev/vdb
for i in 0 -1 64 65 32 33; do
gfs2_edit -p sb field sb_bsize_shift $i /dev/vdb
mount /dev/vdb /mnt/test && umount /mnt/test
done
Before this patch we get a withdraw after
[ 76.413681] gfs2: fsid=loop0.0: fatal: invalid metadata block
[ 76.413681] bh = 19 (type: exp=5, found=4)
[ 76.413681] function = gfs2_meta_buffer, file = fs/gfs2/meta_io.c, line = 492
and with UBSAN configured we also get complaints like
[ 76.373395] UBSAN: shift-out-of-bounds in fs/gfs2/ops_fstype.c:295:19
[ 76.373815] shift exponent 4294967287 is too large for 64-bit type 'long unsigned int'
After the patch, these complaints don't appear, mount fails immediately
and we get an explanation in dmesg.
In the Linux kernel, the following vulnerability has been resolved:
ceph: avoid putting the realm twice when decoding snaps fails
When decoding the snaps fails it maybe leaving the 'first_realm'
and 'realm' pointing to the same snaprealm memory. And then it'll
put it twice and could cause random use-after-free, BUG_ON, etc
issues.
In the Linux kernel, the following vulnerability has been resolved:
ntfs: check overflow when iterating ATTR_RECORDs
Kernel iterates over ATTR_RECORDs in mft record in ntfs_attr_find().
Because the ATTR_RECORDs are next to each other, kernel can get the next
ATTR_RECORD from end address of current ATTR_RECORD, through current
ATTR_RECORD length field.
The problem is that during iteration, when kernel calculates the end
address of current ATTR_RECORD, kernel may trigger an integer overflow bug
in executing `a = (ATTR_RECORD*)((u8*)a + le32_to_cpu(a->length))`. This
may wrap, leading to a forever iteration on 32bit systems.
This patch solves it by adding some checks on calculating end address
of current ATTR_RECORD during iteration.
In the Linux kernel, the following vulnerability has been resolved:
net: openvswitch: fix nested key length validation in the set() action
It's not safe to access nla_len(ovs_key) if the data is smaller than
the netlink header. Check that the attribute is OK first.
In the Linux kernel, the following vulnerability has been resolved:
isofs: Prevent the use of too small fid
syzbot reported a slab-out-of-bounds Read in isofs_fh_to_parent. [1]
The handle_bytes value passed in by the reproducing program is equal to 12.
In handle_to_path(), only 12 bytes of memory are allocated for the structure
file_handle->f_handle member, which causes an out-of-bounds access when
accessing the member parent_block of the structure isofs_fid in isofs,
because accessing parent_block requires at least 16 bytes of f_handle.
Here, fh_len is used to indirectly confirm that the value of handle_bytes
is greater than 3 before accessing parent_block.
[1]
BUG: KASAN: slab-out-of-bounds in isofs_fh_to_parent+0x1b8/0x210 fs/isofs/export.c:183
Read of size 4 at addr ffff0000cc030d94 by task syz-executor215/6466
CPU: 1 UID: 0 PID: 6466 Comm: syz-executor215 Not tainted 6.14.0-rc7-syzkaller-ga2392f333575 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 02/12/2025
Call trace:
show_stack+0x2c/0x3c arch/arm64/kernel/stacktrace.c:466 (C)
__dump_stack lib/dump_stack.c:94 [inline]
dump_stack_lvl+0xe4/0x150 lib/dump_stack.c:120
print_address_description mm/kasan/report.c:408 [inline]
print_report+0x198/0x550 mm/kasan/report.c:521
kasan_report+0xd8/0x138 mm/kasan/report.c:634
__asan_report_load4_noabort+0x20/0x2c mm/kasan/report_generic.c:380
isofs_fh_to_parent+0x1b8/0x210 fs/isofs/export.c:183
exportfs_decode_fh_raw+0x2dc/0x608 fs/exportfs/expfs.c:523
do_handle_to_path+0xa0/0x198 fs/fhandle.c:257
handle_to_path fs/fhandle.c:385 [inline]
do_handle_open+0x8cc/0xb8c fs/fhandle.c:403
__do_sys_open_by_handle_at fs/fhandle.c:443 [inline]
__se_sys_open_by_handle_at fs/fhandle.c:434 [inline]
__arm64_sys_open_by_handle_at+0x80/0x94 fs/fhandle.c:434
__invoke_syscall arch/arm64/kernel/syscall.c:35 [inline]
invoke_syscall+0x98/0x2b8 arch/arm64/kernel/syscall.c:49
el0_svc_common+0x130/0x23c arch/arm64/kernel/syscall.c:132
do_el0_svc+0x48/0x58 arch/arm64/kernel/syscall.c:151
el0_svc+0x54/0x168 arch/arm64/kernel/entry-common.c:744
el0t_64_sync_handler+0x84/0x108 arch/arm64/kernel/entry-common.c:762
el0t_64_sync+0x198/0x19c arch/arm64/kernel/entry.S:600
Allocated by task 6466:
kasan_save_stack mm/kasan/common.c:47 [inline]
kasan_save_track+0x40/0x78 mm/kasan/common.c:68
kasan_save_alloc_info+0x40/0x50 mm/kasan/generic.c:562
poison_kmalloc_redzone mm/kasan/common.c:377 [inline]
__kasan_kmalloc+0xac/0xc4 mm/kasan/common.c:394
kasan_kmalloc include/linux/kasan.h:260 [inline]
__do_kmalloc_node mm/slub.c:4294 [inline]
__kmalloc_noprof+0x32c/0x54c mm/slub.c:4306
kmalloc_noprof include/linux/slab.h:905 [inline]
handle_to_path fs/fhandle.c:357 [inline]
do_handle_open+0x5a4/0xb8c fs/fhandle.c:403
__do_sys_open_by_handle_at fs/fhandle.c:443 [inline]
__se_sys_open_by_handle_at fs/fhandle.c:434 [inline]
__arm64_sys_open_by_handle_at+0x80/0x94 fs/fhandle.c:434
__invoke_syscall arch/arm64/kernel/syscall.c:35 [inline]
invoke_syscall+0x98/0x2b8 arch/arm64/kernel/syscall.c:49
el0_svc_common+0x130/0x23c arch/arm64/kernel/syscall.c:132
do_el0_svc+0x48/0x58 arch/arm64/kernel/syscall.c:151
el0_svc+0x54/0x168 arch/arm64/kernel/entry-common.c:744
el0t_64_sync_handler+0x84/0x108 arch/arm64/kernel/entry-common.c:762
el0t_64_sync+0x198/0x19c arch/arm64/kernel/entry.S:600
In the Linux kernel, the following vulnerability has been resolved:
drm/nouveau: prime: fix ttm_bo_delayed_delete oops
Fix an oops in ttm_bo_delayed_delete which results from dererencing a
dangling pointer:
Oops: general protection fault, probably for non-canonical address 0x6b6b6b6b6b6b6b7b: 0000 [#1] PREEMPT SMP
CPU: 4 UID: 0 PID: 1082 Comm: kworker/u65:2 Not tainted 6.14.0-rc4-00267-g505460b44513-dirty #216
Hardware name: LENOVO 82N6/LNVNB161216, BIOS GKCN65WW 01/16/2024
Workqueue: ttm ttm_bo_delayed_delete [ttm]
RIP: 0010:dma_resv_iter_first_unlocked+0x55/0x290
Code: 31 f6 48 c7 c7 00 2b fa aa e8 97 bd 52 ff e8 a2 c1 53 00 5a 85 c0 74 48 e9 88 01 00 00 4c 89 63 20 4d 85 e4 0f 84 30 01 00 00 <41> 8b 44 24 10 c6 43 2c 01 48 89 df 89 43 28 e8 97 fd ff ff 4c 8b
RSP: 0018:ffffbf9383473d60 EFLAGS: 00010202
RAX: 0000000000000001 RBX: ffffbf9383473d88 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
RBP: ffffbf9383473d78 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: 6b6b6b6b6b6b6b6b
R13: ffffa003bbf78580 R14: ffffa003a6728040 R15: 00000000000383cc
FS: 0000000000000000(0000) GS:ffffa00991c00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000758348024dd0 CR3: 000000012c259000 CR4: 0000000000f50ef0
PKRU: 55555554
Call Trace:
<TASK>
? __die_body.cold+0x19/0x26
? die_addr+0x3d/0x70
? exc_general_protection+0x159/0x460
? asm_exc_general_protection+0x27/0x30
? dma_resv_iter_first_unlocked+0x55/0x290
dma_resv_wait_timeout+0x56/0x100
ttm_bo_delayed_delete+0x69/0xb0 [ttm]
process_one_work+0x217/0x5c0
worker_thread+0x1c8/0x3d0
? apply_wqattrs_cleanup.part.0+0xc0/0xc0
kthread+0x10b/0x240
? kthreads_online_cpu+0x140/0x140
ret_from_fork+0x40/0x70
? kthreads_online_cpu+0x140/0x140
ret_from_fork_asm+0x11/0x20
</TASK>
The cause of this is:
- drm_prime_gem_destroy calls dma_buf_put(dma_buf) which releases the
reference to the shared dma_buf. The reference count is 0, so the
dma_buf is destroyed, which in turn decrements the corresponding
amdgpu_bo reference count to 0, and the amdgpu_bo is destroyed -
calling drm_gem_object_release then dma_resv_fini (which destroys the
reservation object), then finally freeing the amdgpu_bo.
- nouveau_bo obj->bo.base.resv is now a dangling pointer to the memory
formerly allocated to the amdgpu_bo.
- nouveau_gem_object_del calls ttm_bo_put(&nvbo->bo) which calls
ttm_bo_release, which schedules ttm_bo_delayed_delete.
- ttm_bo_delayed_delete runs and dereferences the dangling resv pointer,
resulting in a general protection fault.
Fix this by moving the drm_prime_gem_destroy call from
nouveau_gem_object_del to nouveau_bo_del_ttm. This ensures that it will
be run after ttm_bo_delayed_delete.