In the Linux kernel, the following vulnerability has been resolved:
eventpoll: Fix semi-unbounded recursion
Ensure that epoll instances can never form a graph deeper than
EP_MAX_NESTS+1 links.
Currently, ep_loop_check_proc() ensures that the graph is loop-free and
does some recursion depth checks, but those recursion depth checks don't
limit the depth of the resulting tree for two reasons:
- They don't look upwards in the tree.
- If there are multiple downwards paths of different lengths, only one of
the paths is actually considered for the depth check since commit
28d82dc1c4ed ("epoll: limit paths").
Essentially, the current recursion depth check in ep_loop_check_proc() just
serves to prevent it from recursing too deeply while checking for loops.
A more thorough check is done in reverse_path_check() after the new graph
edge has already been created; this checks, among other things, that no
paths going upwards from any non-epoll file with a length of more than 5
edges exist. However, this check does not apply to epoll files.
As a result, it is possible to recurse to a depth of at least roughly 500,
tested on v6.15. (I am unsure whether deeper recursion is possible; this may
also have changed with commit 8c44dac8add7 ("eventpoll: Fix priority
inversion problem").)
To fix it:
1. In ep_loop_check_proc(), note the subtree depth of each visited node,
and use subtree depths for the total depth calculation even when a subtree
has already been visited.
2. Add ep_get_upwards_depth_proc() for similarly determining the maximum
depth of an upwards walk.
3. In ep_loop_check(), use these values to limit the total path length
between epoll nodes to EP_MAX_NESTS edges.
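As an illustration of the three steps above, here is a minimal, self-contained
C sketch of the depth accounting (the ep_node type and helper names are
hypothetical; the real code in fs/eventpoll.c walks struct eventpoll and also
handles loop detection and locking, which are elided here):

  #include <stddef.h>

  #define EP_MAX_NESTS 4    /* same value as in fs/eventpoll.c */

  /* Hypothetical stand-in for an epoll instance. */
  struct ep_node {
      struct ep_node **children;  /* epoll files watched by this one */
      size_t n_children;
      struct ep_node **parents;   /* epoll files watching this one */
      size_t n_parents;
      int visited;
      int subtree_depth;          /* memoized downward depth (step 1) */
  };

  /* Step 1: downward depth, memoized so that a subtree that was
   * already visited still contributes its full depth to the caller. */
  static int downward_depth(struct ep_node *ep, int budget)
  {
      int max = 0;

      if (ep->visited)
          return ep->subtree_depth;
      if (budget <= 0)
          return -1;              /* too deep */
      ep->visited = 1;
      for (size_t i = 0; i < ep->n_children; i++) {
          int d = downward_depth(ep->children[i], budget - 1);

          if (d < 0)
              return -1;
          if (d + 1 > max)
              max = d + 1;
      }
      ep->subtree_depth = max;
      return max;
  }

  /* Step 2: analogous upward walk. */
  static int upward_depth(struct ep_node *ep, int budget)
  {
      int max = 0;

      if (budget <= 0)
          return -1;
      for (size_t i = 0; i < ep->n_parents; i++) {
          int d = upward_depth(ep->parents[i], budget - 1);

          if (d < 0)
              return -1;
          if (d + 1 > max)
              max = d + 1;
      }
      return max;
  }

  /* Step 3: may 'child' be inserted below 'parent'? The new edge plus
   * the path above 'parent' plus the path below 'child' must not
   * exceed EP_MAX_NESTS edges. */
  static int may_insert(struct ep_node *parent, struct ep_node *child)
  {
      int down = downward_depth(child, EP_MAX_NESTS + 1);
      int up = upward_depth(parent, EP_MAX_NESTS + 1);

      if (down < 0 || up < 0 || up + 1 + down > EP_MAX_NESTS)
          return -1;              /* reject, as ep_loop_check() would */
      return 0;
  }

The memoization is the essential change: a revisited subtree now reports its
full depth instead of contributing nothing, so multiple downward paths of
different lengths are all accounted for.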
In the Linux kernel, the following vulnerability has been resolved:
xen: fix UAF in dmabuf_exp_from_pages()
[dma_buf_fd() fixes; no preferences regarding the tree it goes through -
up to xen folks]
As soon as we'd inserted a file reference into the descriptor table, another
thread could close it. That's fine for the case when all we are doing is
returning that descriptor to userland (it's a race, but it's a userland
race and there's nothing the kernel can do about it). However, if we
follow fd_install() with any kind of access to objects that would be
destroyed on close (be it the struct file itself or anything destroyed
by its ->release()), we have a UAF.
dma_buf_fd() is a combination of reserving a descriptor and fd_install().
gntdev dmabuf_exp_from_pages() calls it and then proceeds to access the
objects destroyed on close - starting with gntdev_dmabuf itself.
Fix that by reserving the descriptor before anything else and doing
fd_install() only once everything has been set up.
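The pattern of the fix can be sketched as follows (a simplified illustration,
not the actual gntdev diff; get_unused_fd_flags(), put_unused_fd() and
fd_install() are the real kernel primitives that dma_buf_fd() combines, while
finish_export() is a hypothetical stand-in for the gntdev-specific setup):

  #include <linux/dma-buf.h>
  #include <linux/errno.h>
  #include <linux/fcntl.h>
  #include <linux/file.h>

  static int finish_export(struct dma_buf *dmabuf);  /* hypothetical */

  static int export_dmabuf_fd_sketch(struct dma_buf *dmabuf)
  {
      int fd;

      /* Reserve a descriptor slot first; nothing is published yet. */
      fd = get_unused_fd_flags(O_CLOEXEC);
      if (fd < 0)
          return fd;

      /* Everything that touches objects destroyed on close happens
       * here, while no other thread can reach the file through the
       * descriptor table. */
      if (finish_export(dmabuf) < 0) {
          put_unused_fd(fd);
          return -EINVAL;
      }

      /* Publish last. From this point on another thread may close the
       * descriptor, so the file must not be touched again. */
      fd_install(fd, dmabuf->file);
      return fd;
  }

The key design point is that fd_install() is the publication step: anything
that must not race with close() has to happen before it.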
In the Linux kernel, the following vulnerability has been resolved:
bpf: Reject narrower access to pointer ctx fields
The following BPF program, simplified from a syzkaller repro, causes a
kernel warning:
  r0 = *(u8 *)(r1 + 169);
  exit;
With pointer field sk being at offset 168 in __sk_buff. This access is
detected as a narrower read in bpf_skb_is_valid_access because it
doesn't match offsetof(struct __sk_buff, sk). It is therefore allowed
and later proceeds to bpf_convert_ctx_access. Note that for the
"is_narrower_load" case in convert_ctx_accesses(), insn->off is
aligned, so cnt may not be 0 because it matches
offsetof(struct __sk_buff, sk) in bpf_convert_ctx_access. However,
the target_size stays 0 and the verifier errors with a kernel warning:
verifier bug: error during ctx access conversion(1)
This patch fixes that to return a proper "invalid bpf_context access
off=X size=Y" error on the load instruction.
The same issue affects multiple other fields in context structures that
allow narrow access. Some other non-affected fields (for sk_msg,
sk_lookup, and sockopt) were also changed to use bpf_ctx_range_ptr for
consistency.
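To illustrate, here is a condensed sketch of the shape of
bpf_skb_is_valid_access() (not the actual diff; bpf_ctx_range_ptr() is the
real macro from include/linux/filter.h, which expands to a GCC case range
covering every byte of the 8-byte pointer field):

  #include <linux/bpf.h>
  #include <linux/filter.h>

  static bool skb_sk_access_ok_sketch(int off, int size,
                                      enum bpf_access_type type)
  {
      switch (off) {
      /* Before the fix, "case offsetof(struct __sk_buff, sk):" only
       * matched off == 168 exactly; a 1-byte read at off == 169 fell
       * through and was treated as an allowed narrow load. */
      case bpf_ctx_range_ptr(struct __sk_buff, sk):
          /* With the range, off == 169 lands here too, and anything
           * narrower than the full 8-byte load is rejected. */
          if (type == BPF_WRITE || size != sizeof(__u64))
              return false;
          break;
      default:
          /* other fields ... */
          break;
      }
      return true;
  }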
Note this syzkaller crash was reported in the "Closes" link below, which
used to be about a different bug, fixed in
commit fce7bd8e385a ("bpf/verifier: Handle BPF_LOAD_ACQ instructions
in insn_def_regno()"). Because syzbot somehow confused the two bugs,
the new crash and repro didn't get reported to the mailing list.
In the Linux kernel, the following vulnerability has been resolved:
padata: Fix pd UAF once and for all
There is a race condition/UAF in padata_reorder that goes back
to the initial commit. A reference count is taken at the start
of the process in padata_do_parallel, and released at the end in
padata_serial_worker.
This reference count is required (and only required) for padata_replace
to function correctly. If padata_replace is never called then
there is no issue.
In the function padata_reorder, which serves as the core of padata, as
soon as a padata is added to squeue->serial.list and the associated
spinlock is released, that padata may be processed and the reference
count on pd can go away.
Fix this by getting the next padata before squeue->serial.lock is
released.
In order to make this possible, simplify padata_reorder by only
calling it once the next padata arrives.
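Schematically, the fixed ordering looks like this (heavily condensed;
pick_squeue() and find_next_sketch() are hypothetical stand-ins, while
squeue->serial.lock/.list follow the real padata structures):

  #include <linux/list.h>
  #include <linux/padata.h>
  #include <linux/spinlock.h>

  /* Hypothetical helpers standing in for the real logic. */
  static struct padata_serial_queue *pick_squeue(struct parallel_data *pd,
                                                 struct padata_priv *padata);
  static struct padata_priv *find_next_sketch(struct parallel_data *pd);

  static void serialize_sketch(struct parallel_data *pd,
                               struct padata_priv *padata)
  {
      while (padata) {
          struct padata_serial_queue *squeue = pick_squeue(pd, padata);
          struct padata_priv *next;

          spin_lock(&squeue->serial.lock);
          list_add_tail(&padata->list, &squeue->serial.list);
          /* Fetch the successor BEFORE unlocking: once the lock is
           * dropped, 'padata' may already be serialized and the last
           * reference on pd released. Holding 'next' (still
           * unserialized) is what keeps pd alive for another round. */
          next = find_next_sketch(pd);
          spin_unlock(&squeue->serial.lock);

          padata = next;
      }
  }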
In the Linux kernel, the following vulnerability has been resolved:
powerpc/eeh: Make EEH driver device hotplug safe
Multiple race conditions existed between the PCIe hotplug driver and the
EEH driver, leading to a variety of kernel oopses of the same general
nature:
<pcie device unplug>
<eeh driver trigger>
<hotplug removal trigger>
<pcie tree reconfiguration>
<eeh recovery next step>
<oops in EEH driver bus iteration loop>
A second class of oops is also seen when the underlying bus disappears
during device recovery.
Refactor the EEH module to be PCI rescan-and-remove safe. Also clean
up a few minor formatting/readability issues.
In the Linux kernel, the following vulnerability has been resolved:
HID: core: Harden s32ton() against conversion to 0 bits
Testing by the syzbot fuzzer showed that the HID core gets a
shift-out-of-bounds exception when it tries to convert a 32-bit
quantity to a 0-bit quantity. Ideally this should never occur, but
there are buggy devices and some might have a report field with size
set to zero; we shouldn't reject the report or the device just because
of that.
Instead, harden the s32ton() routine so that it returns a reasonable
result instead of crashing when it is called with the number of bits
set to 0 -- the same as what snto32() does.
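The hardening can be sketched like this (it mirrors the idea described above,
not necessarily the exact upstream diff; the body below the guard follows the
shape of the existing s32ton() logic in drivers/hid/hid-core.c):

  #include <linux/types.h>

  static __u32 s32ton_sketch(__s32 value, unsigned int n)
  {
      __s32 a;

      if (!n)                 /* a 0-bit field can only hold 0 */
          return 0;
      a = value >> (n - 1);   /* safe now that n >= 1 */
      if (a && a != -1)       /* value does not fit in n bits: clamp */
          return value < 0 ? 1 << (n - 1) : (1 << (n - 1)) - 1;
      return value & ((1 << n) - 1);
  }

Without the guard, n == 0 makes "value >> (n - 1)" a shift by -1, which is
exactly the shift-out-of-bounds the fuzzer triggered.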
In the Linux kernel, the following vulnerability has been resolved:
rxrpc: Fix bug due to prealloc collision
When userspace is using AF_RXRPC to provide a server, it has to preallocate
incoming calls and assign to them call IDs that will be used to thread
related recvmsg() and sendmsg() together. The preallocated call IDs will
automatically be attached to calls as they come in until the pool is empty.
To the kernel, the call IDs are just arbitrary numbers, but userspace can
use the call ID to hold a pointer to prepared structs. In any case, the
user isn't permitted to create two calls with the same call ID (call IDs
become available again when the call ends) and EBADSLT should result from
sendmsg() if an attempt is made to preallocate a call with an in-use call
ID.
However, the cleanup in the error handling will trigger both assertions in
rxrpc_cleanup_call() because the call isn't marked complete and isn't
marked as having been released.
Fix this by setting the call state in rxrpc_service_prealloc_one() and then
marking it as being released before calling the cleanup function.
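The ordering of the fix can be sketched as follows (illustrative only: the
*_sketch helpers are hypothetical stand-ins, while rxrpc_cleanup_call() and
rx->call_lock are real symbols from net/rxrpc):

  #include "ar-internal.h"   /* private net/rxrpc header */

  static void mark_call_complete_sketch(struct rxrpc_call *call, int error);
  static void mark_call_released_sketch(struct rxrpc_call *call);

  static int prealloc_id_in_use_sketch(struct rxrpc_sock *rx,
                                       struct rxrpc_call *call)
  {
      write_unlock(&rx->call_lock);
      /* 1. Move the call to a terminal state first, so the "call is
       *    complete" assertion in rxrpc_cleanup_call() holds. */
      mark_call_complete_sketch(call, -EBADSLT);
      /* 2. Mark it as released: it was never exposed to userspace, so
       *    no regular release path will ever run for it. */
      mark_call_released_sketch(call);
      rxrpc_cleanup_call(call);
      return -EBADSLT;
  }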
In the Linux kernel, the following vulnerability has been resolved:
smb: client: fix use-after-free in cifs_oplock_break
A race condition can occur in cifs_oplock_break() leading to a
use-after-free of the cinode structure when unmounting:
cifs_oplock_break()
  _cifsFileInfo_put(cfile)
    cifsFileInfo_put_final()
      cifs_sb_deactive()
        [last ref, start releasing sb]
        kill_sb()
          kill_anon_super()
            generic_shutdown_super()
              evict_inodes()
                dispose_list()
                  evict()
                    destroy_inode()
                      call_rcu(&inode->i_rcu, i_callback)
  spin_lock(&cinode->open_file_lock) <- OK
                      [later] i_callback()
                        cifs_free_inode()
                          kmem_cache_free(cinode)
  spin_unlock(&cinode->open_file_lock) <- UAF
  cifs_done_oplock_break(cinode) <- UAF
The issue occurs when umount has already released its reference to the
superblock. When _cifsFileInfo_put() calls cifs_sb_deactive(), this
releases the last reference, triggering the immediate cleanup of all
inodes under RCU. However, cifs_oplock_break() continues to access the
cinode after this point, resulting in use-after-free.
Fix this by holding an extra reference to the superblock during the
entire oplock break operation. This ensures that the superblock and
its inodes remain valid until the oplock break completes.
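The shape of the fix, as a minimal sketch (it assumes the existing cifs
helpers cifs_sb_active()/cifs_sb_deactive(); the actual oplock handling body
is elided):

  #include "cifsglob.h"   /* private fs/smb/client header */

  static void oplock_break_sketch(struct work_struct *work)
  {
      struct cifsFileInfo *cfile = container_of(work, struct cifsFileInfo,
                                                oplock_break);
      struct super_block *sb = cfile->dentry->d_sb;

      /* Pin the superblock so that dropping the last cfile reference
       * below cannot cascade into kill_sb()/evict_inodes() while the
       * cinode is still in use. */
      cifs_sb_active(sb);

      /* ... process the oplock break, _cifsFileInfo_put(cfile),
       *     take and drop cinode->open_file_lock, etc. ... */

      cifs_sb_deactive(sb);  /* inode teardown may proceed only now */
  }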
In the Linux kernel, the following vulnerability has been resolved:
iio: common: st_sensors: Fix use of uninitialized device structs
Throughout the various probe functions, &indio_dev->dev is used before it
is initialized. This caused a kernel panic in st_sensors_power_enable()
when the call to devm_regulator_bulk_get_enable() fails and then calls
dev_err_probe() with the uninitialized device.
This seems to only cause a panic with dev_err_probe(); dev_err(),
dev_warn() and dev_info() don't seem to cause a panic, but they are fixed
as well.
The issue is reported and traced here: [1]
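A sketch of the safer pattern (simplified, not the exact patch: errors are
reported against the fully initialized parent device rather than the
not-yet-registered &indio_dev->dev; devm_regulator_bulk_get_enable() and
dev_err_probe() are the real APIs):

  #include <linux/iio/iio.h>
  #include <linux/regulator/consumer.h>

  static int power_enable_sketch(struct iio_dev *indio_dev)
  {
      static const char * const regulators[] = { "vdd", "vddio" };
      struct device *parent = indio_dev->dev.parent;
      int err;

      err = devm_regulator_bulk_get_enable(parent, ARRAY_SIZE(regulators),
                                           regulators);
      if (err)
          return dev_err_probe(parent, err,
                               "unable to enable supplies\n");
      return 0;
  }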
In the Linux kernel, the following vulnerability has been resolved:
rxrpc: Fix recv-recv race of completed call
If a call receives an event (such as incoming data), the call gets placed
on the socket's queue and a thread in recvmsg can be awakened to go and
process it. Once the thread has picked the call up off the queue,
further events will cause it to be requeued, and once the socket lock is
dropped (recvmsg uses call->user_mutex to allow the socket to be used in
parallel), a second thread can come in and its recvmsg can pop the call off
the socket queue again.
In such a case, the first thread will be receiving stuff from the call and
the second thread will be blocked on call->user_mutex. The first thread
can, at this point, process both the event that it picked the call for and the
event that the second thread picked the call for and may see the call
terminate - in which case the call will be "released", decoupling the call
from the user call ID assigned to it (RXRPC_USER_CALL_ID in the control
message).
The first thread will return okay, but then the second thread will wake up
holding the user_mutex and, if it sees that the call has been released by
the first thread, it will BUG thusly:
kernel BUG at net/rxrpc/recvmsg.c:474!
Fix this by just dequeuing the call and ignoring it if it is seen to be
already released. We can't tell userspace about it anyway as the user call
ID has become stale.
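The check the fix adds can be sketched as follows (condensed from the recvmsg
path: RXRPC_CALL_RELEASED is the real flag, the put tracepoint tag is an
assumption, and the retry loop around it is elided):

  #include "ar-internal.h"   /* private net/rxrpc header */

  /* Returns true if the call was already released by another thread
   * and has been dropped; the caller should then retry the socket
   * queue rather than report anything to userspace. */
  static bool reject_released_call_sketch(struct rxrpc_call *call)
  {
      if (!test_bit(RXRPC_CALL_RELEASED, &call->flags))
          return false;
      /* The user call ID is stale, so there is nothing to tell
       * userspace; drop our reference instead of hitting the BUG. */
      rxrpc_put_call(call, rxrpc_call_put_recvmsg);
      return true;
  }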