In the Linux kernel, the following vulnerability has been resolved:
ceph: fix race condition validating r_parent before applying state
Add validation to ensure the cached parent directory inode matches the
directory info in MDS replies. This prevents client-side race conditions
where concurrent operations (e.g. rename) cause r_parent to become stale
between request initiation and reply processing, which could lead to
applying state changes to incorrect directory inodes.
[ idryomov: folded a kerneldoc fixup and a follow-up fix from Alex to
move CEPH_CAP_PIN reference when r_parent is updated:
When the parent directory lock is not held, req->r_parent can become
stale and is updated to point to the correct inode. However, the
associated CEPH_CAP_PIN reference was not being adjusted. The
CEPH_CAP_PIN is a reference on an inode that is tracked for
accounting purposes. Moving this pin is important to keep the
accounting balanced. When the pin was not moved from the old parent
to the new one, it created two problems: The reference on the old,
stale parent was never released, causing a reference leak.
A reference for the new parent was never acquired, creating the risk
of a reference underflow later in ceph_mdsc_release_request(). This
patch corrects the logic by releasing the pin from the old parent and
acquiring it for the new parent when r_parent is switched. This
ensures reference accounting stays balanced. ]
In the Linux kernel, the following vulnerability has been resolved:
can: j1939: implement NETDEV_UNREGISTER notification handler
syzbot is reporting
unregister_netdevice: waiting for vcan0 to become free. Usage count = 2
problem, for j1939 protocol did not have NETDEV_UNREGISTER notification
handler for undoing changes made by j1939_sk_bind().
Commit 25fe97cb7620 ("can: j1939: move j1939_priv_put() into sk_destruct
callback") expects that a call to j1939_priv_put() can be unconditionally
delayed until j1939_sk_sock_destruct() is called. But we need to call
j1939_priv_put() against an extra ref held by j1939_sk_bind() call
(as a part of undoing changes made by j1939_sk_bind()) as soon as
NETDEV_UNREGISTER notification fires (i.e. before j1939_sk_sock_destruct()
is called via j1939_sk_release()). Otherwise, the extra ref on "struct
j1939_priv" held by j1939_sk_bind() call prevents "struct net_device" from
dropping the usage count to 1; making it impossible for
unregister_netdevice() to continue.
[mkl: remove space in front of label]
In the Linux kernel, the following vulnerability has been resolved:
net: phylink: add lock for serializing concurrent pl->phydev writes with resolver
Currently phylink_resolve() protects itself against concurrent
phylink_bringup_phy() or phylink_disconnect_phy() calls which modify
pl->phydev by relying on pl->state_mutex.
The problem is that in phylink_resolve(), pl->state_mutex is in a lock
inversion state with pl->phydev->lock. So pl->phydev->lock needs to be
acquired prior to pl->state_mutex. But that requires dereferencing
pl->phydev in the first place, and without pl->state_mutex, that is
racy.
Hence the reason for the extra lock. Currently it is redundant, but it
will serve a functional purpose once mutex_lock(&phy->lock) will be
moved outside of the mutex_lock(&pl->state_mutex) section.
Another alternative considered would have been to let phylink_resolve()
acquire the rtnl_mutex, which is also held when phylink_bringup_phy()
and phylink_disconnect_phy() are called. But since phylink_disconnect_phy()
runs under rtnl_lock(), it would deadlock with phylink_resolve() when
calling flush_work(&pl->resolve). Additionally, it would have been
undesirable because it would have unnecessarily blocked many other call
paths as well in the entire kernel, so the smaller-scoped lock was
preferred.
In the Linux kernel, the following vulnerability has been resolved:
mm/vmalloc, mm/kasan: respect gfp mask in kasan_populate_vmalloc()
kasan_populate_vmalloc() and its helpers ignore the caller's gfp_mask and
always allocate memory using the hardcoded GFP_KERNEL flag. This makes
them inconsistent with vmalloc(), which was recently extended to support
GFP_NOFS and GFP_NOIO allocations.
Page table allocations performed during shadow population also ignore the
external gfp_mask. To preserve the intended semantics of GFP_NOFS and
GFP_NOIO, wrap the apply_to_page_range() calls into the appropriate
memalloc scope.
xfs calls vmalloc with GFP_NOFS, so this bug could lead to deadlock.
There was a report here
https://lkml.kernel.org/r/686ea951.050a0220.385921.0016.GAE@google.com
This patch:
- Extends kasan_populate_vmalloc() and helpers to take gfp_mask;
- Passes gfp_mask down to alloc_pages_bulk() and __get_free_page();
- Enforces GFP_NOFS/NOIO semantics with memalloc_*_save()/restore()
around apply_to_page_range();
- Updates vmalloc.c and percpu allocator call sites accordingly.
In the Linux kernel, the following vulnerability has been resolved:
i40e: remove read access to debugfs files
The 'command' and 'netdev_ops' debugfs files are a legacy debugging
interface supported by the i40e driver since its early days by commit
02e9c290814c ("i40e: debugfs interface").
Both of these debugfs files provide a read handler which is mostly useless,
and which is implemented with questionable logic. They both use a static
256 byte buffer which is initialized to the empty string. In the case of
the 'command' file this buffer is literally never used and simply wastes
space. In the case of the 'netdev_ops' file, the last command written is
saved here.
On read, the files contents are presented as the name of the device
followed by a colon and then the contents of their respective static
buffer. For 'command' this will always be "<device>: ". For 'netdev_ops',
this will be "<device>: <last command written>". But note the buffer is
shared between all devices operated by this module. At best, it is mostly
meaningless information, and at worse it could be accessed simultaneously
as there doesn't appear to be any locking mechanism.
We have also recently received multiple reports for both read functions
about their use of snprintf and potential overflow that could result in
reading arbitrary kernel memory. For the 'command' file, this is definitely
impossible, since the static buffer is always zero and never written to.
For the 'netdev_ops' file, it does appear to be possible, if the user
carefully crafts the command input, it will be copied into the buffer,
which could be large enough to cause snprintf to truncate, which then
causes the copy_to_user to read beyond the length of the buffer allocated
by kzalloc.
A minimal fix would be to replace snprintf() with scnprintf() which would
cap the return to the number of bytes written, preventing an overflow. A
more involved fix would be to drop the mostly useless static buffers,
saving 512 bytes and modifying the read functions to stop needing those as
input.
Instead, lets just completely drop the read access to these files. These
are debug interfaces exposed as part of debugfs, and I don't believe that
dropping read access will break any script, as the provided output is
pretty useless. You can find the netdev name through other more standard
interfaces, and the 'netdev_ops' interface can easily result in garbage if
you issue simultaneous writes to multiple devices at once.
In order to properly remove the i40e_dbg_netdev_ops_buf, we need to
refactor its write function to avoid using the static buffer. Instead, use
the same logic as the i40e_dbg_command_write, with an allocated buffer.
Update the code to use this instead of the static buffer, and ensure we
free the buffer on exit. This fixes simultaneous writes to 'netdev_ops' on
multiple devices, and allows us to remove the now unused static buffer
along with removing the read access.
In the Linux kernel, the following vulnerability has been resolved:
Bluetooth: hci_core: Disable works on hci_unregister_dev
This make use of disable_work_* on hci_unregister_dev since the hci_dev is
about to be freed new submissions are not disarable.
In the Linux kernel, the following vulnerability has been resolved:
bpf: Tell memcg to use allow_spinning=false path in bpf_timer_init()
Currently, calling bpf_map_kmalloc_node() from __bpf_async_init() can
cause various locking issues; see the following stack trace (edited for
style) as one example:
...
[10.011566] do_raw_spin_lock.cold
[10.011570] try_to_wake_up (5) double-acquiring the same
[10.011575] kick_pool rq_lock, causing a hardlockup
[10.011579] __queue_work
[10.011582] queue_work_on
[10.011585] kernfs_notify
[10.011589] cgroup_file_notify
[10.011593] try_charge_memcg (4) memcg accounting raises an
[10.011597] obj_cgroup_charge_pages MEMCG_MAX event
[10.011599] obj_cgroup_charge_account
[10.011600] __memcg_slab_post_alloc_hook
[10.011603] __kmalloc_node_noprof
...
[10.011611] bpf_map_kmalloc_node
[10.011612] __bpf_async_init
[10.011615] bpf_timer_init (3) BPF calls bpf_timer_init()
[10.011617] bpf_prog_xxxxxxxxxxxxxxxx_fcg_runnable
[10.011619] bpf__sched_ext_ops_runnable
[10.011620] enqueue_task_scx (2) BPF runs with rq_lock held
[10.011622] enqueue_task
[10.011626] ttwu_do_activate
[10.011629] sched_ttwu_pending (1) grabs rq_lock
...
The above was reproduced on bpf-next (b338cf849ec8) by modifying
./tools/sched_ext/scx_flatcg.bpf.c to call bpf_timer_init() during
ops.runnable(), and hacking the memcg accounting code a bit to make
a bpf_timer_init() call more likely to raise an MEMCG_MAX event.
We have also run into other similar variants (both internally and on
bpf-next), including double-acquiring cgroup_file_kn_lock, the same
worker_pool::lock, etc.
As suggested by Shakeel, fix this by using __GFP_HIGH instead of
GFP_ATOMIC in __bpf_async_init(), so that e.g. if try_charge_memcg()
raises an MEMCG_MAX event, we call __memcg_memory_event() with
@allow_spinning=false and avoid calling cgroup_file_notify() there.
Depends on mm patch
"memcg: skip cgroup_file_notify if spinning is not allowed":
https://lore.kernel.org/bpf/20250905201606.66198-1-shakeel.butt@linux.dev/
v0 approach s/bpf_map_kmalloc_node/bpf_mem_alloc/
https://lore.kernel.org/bpf/20250905061919.439648-1-yepeilin@google.com/
v1 approach:
https://lore.kernel.org/bpf/20250905234547.862249-1-yepeilin@google.com/