Security Vulnerabilities
- CVEs Published In February 2025
In the Linux kernel, the following vulnerability has been resolved:
blk-iolatency: Fix inflight count imbalances and IO hangs on offline
iolatency needs to track the number of inflight IOs per cgroup. As this
tracking can be expensive, it is disabled when no cgroup has iolatency
configured for the device. To ensure that the inflight counters stay
balanced, iolatency_set_limit() freezes the request_queue while manipulating
the enabled counter, which ensures that no IO is in flight and thus all
counters are zero.
Unfortunately, iolatency_set_limit() isn't the only place where the enabled
counter is manipulated. iolatency_pd_offline() can also dec the counter and
trigger disabling. As this disabling happens without freezing the q, this
can easily happen while some IOs are in flight and thus leak the counts.
This can be easily demonstrated by turning on iolatency on an one empty
cgroup while IOs are in flight in other cgroups and then removing the
cgroup. Note that iolatency shouldn't have been enabled elsewhere in the
system to ensure that removing the cgroup disables iolatency for the whole
device.
The following keeps flipping on and off iolatency on sda:
echo +io > /sys/fs/cgroup/cgroup.subtree_control
while true; do
mkdir -p /sys/fs/cgroup/test
echo '8:0 target=100000' > /sys/fs/cgroup/test/io.latency
sleep 1
rmdir /sys/fs/cgroup/test
sleep 1
done
and there's concurrent fio generating direct rand reads:
fio --name test --filename=/dev/sda --direct=1 --rw=randread \
--runtime=600 --time_based --iodepth=256 --numjobs=4 --bs=4k
while monitoring with the following drgn script:
while True:
for css in css_for_each_descendant_pre(prog['blkcg_root'].css.address_of_()):
for pos in hlist_for_each(container_of(css, 'struct blkcg', 'css').blkg_list):
blkg = container_of(pos, 'struct blkcg_gq', 'blkcg_node')
pd = blkg.pd[prog['blkcg_policy_iolatency'].plid]
if pd.value_() == 0:
continue
iolat = container_of(pd, 'struct iolatency_grp', 'pd')
inflight = iolat.rq_wait.inflight.counter.value_()
if inflight:
print(f'inflight={inflight} {disk_name(blkg.q.disk).decode("utf-8")} '
f'{cgroup_path(css.cgroup).decode("utf-8")}')
time.sleep(1)
The monitoring output looks like the following:
inflight=1 sda /user.slice
inflight=1 sda /user.slice
...
inflight=14 sda /user.slice
inflight=13 sda /user.slice
inflight=17 sda /user.slice
inflight=15 sda /user.slice
inflight=18 sda /user.slice
inflight=17 sda /user.slice
inflight=20 sda /user.slice
inflight=19 sda /user.slice <- fio stopped, inflight stuck at 19
inflight=19 sda /user.slice
inflight=19 sda /user.slice
If a cgroup with stuck inflight ends up getting throttled, the throttled IOs
will never get issued as there's no completion event to wake it up leading
to an indefinite hang.
This patch fixes the bug by unifying enable handling into a work item which
is automatically kicked off from iolatency_set_min_lat_nsec() which is
called from both iolatency_set_limit() and iolatency_pd_offline() paths.
Punting to a work item is necessary as iolatency_pd_offline() is called
under spinlocks while freezing a request_queue requires a sleepable context.
This also simplifies the code reducing LOC sans the comments and avoids the
unnecessary freezes which were happening whenever a cgroup's latency target
is newly set or cleared.
In the Linux kernel, the following vulnerability has been resolved:
tipc: check attribute length for bearer name
syzbot reported uninit-value:
=====================================================
BUG: KMSAN: uninit-value in string_nocheck lib/vsprintf.c:644 [inline]
BUG: KMSAN: uninit-value in string+0x4f9/0x6f0 lib/vsprintf.c:725
string_nocheck lib/vsprintf.c:644 [inline]
string+0x4f9/0x6f0 lib/vsprintf.c:725
vsnprintf+0x2222/0x3650 lib/vsprintf.c:2806
vprintk_store+0x537/0x2150 kernel/printk/printk.c:2158
vprintk_emit+0x28b/0xab0 kernel/printk/printk.c:2256
vprintk_default+0x86/0xa0 kernel/printk/printk.c:2283
vprintk+0x15f/0x180 kernel/printk/printk_safe.c:50
_printk+0x18d/0x1cf kernel/printk/printk.c:2293
tipc_enable_bearer net/tipc/bearer.c:371 [inline]
__tipc_nl_bearer_enable+0x2022/0x22a0 net/tipc/bearer.c:1033
tipc_nl_bearer_enable+0x6c/0xb0 net/tipc/bearer.c:1042
genl_family_rcv_msg_doit net/netlink/genetlink.c:731 [inline]
- Do sanity check the attribute length for TIPC_NLA_BEARER_NAME.
- Do not use 'illegal name' in printing message.
In the Linux kernel, the following vulnerability has been resolved:
rtc: mt6397: check return value after calling platform_get_resource()
It will cause null-ptr-deref if platform_get_resource() returns NULL,
we need check the return value.
In the Linux kernel, the following vulnerability has been resolved:
scsi: sd: Fix potential NULL pointer dereference
If sd_probe() sees an early error before sdkp->device is initialized,
sd_zbc_release_disk() is called. This causes a NULL pointer dereference
when sd_is_zoned() is called inside that function. Avoid this by removing
the call to sd_zbc_release_disk() in sd_probe() error path.
This change is safe and does not result in zone information memory leakage
because the zone information for a zoned disk is allocated only when
sd_revalidate_disk() is called, at which point sdkp->disk_dev is fully set,
resulting in sd_disk_release() being called when needed to cleanup a disk
zone information using sd_zbc_release_disk().
In the Linux kernel, the following vulnerability has been resolved:
blk-mq: don't touch ->tagset in blk_mq_get_sq_hctx
blk_mq_run_hw_queues() could be run when there isn't queued request and
after queue is cleaned up, at that time tagset is freed, because tagset
lifetime is covered by driver, and often freed after blk_cleanup_queue()
returns.
So don't touch ->tagset for figuring out current default hctx by the mapping
built in request queue, so use-after-free on tagset can be avoided. Meantime
this way should be fast than retrieving mapping from tagset.
In the Linux kernel, the following vulnerability has been resolved:
sfc: fix considering that all channels have TX queues
Normally, all channels have RX and TX queues, but this is not true if
modparam efx_separate_tx_channels=1 is used. In that cases, some
channels only have RX queues and others only TX queues (or more
preciselly, they have them allocated, but not initialized).
Fix efx_channel_has_tx_queues to return the correct value for this case
too.
Messages shown at probe time before the fix:
sfc 0000:03:00.0 ens6f0np0: MC command 0x82 inlen 544 failed rc=-22 (raw=0) arg=0
------------[ cut here ]------------
netdevice: ens6f0np0: failed to initialise TXQ -1
WARNING: CPU: 1 PID: 626 at drivers/net/ethernet/sfc/ef10.c:2393 efx_ef10_tx_init+0x201/0x300 [sfc]
[...] stripped
RIP: 0010:efx_ef10_tx_init+0x201/0x300 [sfc]
[...] stripped
Call Trace:
efx_init_tx_queue+0xaa/0xf0 [sfc]
efx_start_channels+0x49/0x120 [sfc]
efx_start_all+0x1f8/0x430 [sfc]
efx_net_open+0x5a/0xe0 [sfc]
__dev_open+0xd0/0x190
__dev_change_flags+0x1b3/0x220
dev_change_flags+0x21/0x60
[...] stripped
Messages shown at remove time before the fix:
sfc 0000:03:00.0 ens6f0np0: failed to flush 10 queues
sfc 0000:03:00.0 ens6f0np0: failed to flush queues
In the Linux kernel, the following vulnerability has been resolved:
driver core: Fix wait_for_device_probe() & deferred_probe_timeout interaction
Mounting NFS rootfs was timing out when deferred_probe_timeout was
non-zero [1]. This was because ip_auto_config() initcall times out
waiting for the network interfaces to show up when
deferred_probe_timeout was non-zero. While ip_auto_config() calls
wait_for_device_probe() to make sure any currently running deferred
probe work or asynchronous probe finishes, that wasn't sufficient to
account for devices being deferred until deferred_probe_timeout.
Commit 35a672363ab3 ("driver core: Ensure wait_for_device_probe() waits
until the deferred_probe_timeout fires") tried to fix that by making
sure wait_for_device_probe() waits for deferred_probe_timeout to expire
before returning.
However, if wait_for_device_probe() is called from the kernel_init()
context:
- Before deferred_probe_initcall() [2], it causes the boot process to
hang due to a deadlock.
- After deferred_probe_initcall() [3], it blocks kernel_init() from
continuing till deferred_probe_timeout expires and beats the point of
deferred_probe_timeout that's trying to wait for userspace to load
modules.
Neither of this is good. So revert the changes to
wait_for_device_probe().
[1] - https://lore.kernel.org/lkml/TYAPR01MB45443DF63B9EF29054F7C41FD8C60@TYAPR01MB4544.jpnprd01.prod.outlook.com/
[2] - https://lore.kernel.org/lkml/YowHNo4sBjr9ijZr@dev-arch.thelio-3990X/
[3] - https://lore.kernel.org/lkml/Yo3WvGnNk3LvLb7R@linutronix.de/
In the Linux kernel, the following vulnerability has been resolved:
f2fs: fix to avoid f2fs_bug_on() in dec_valid_node_count()
As Yanming reported in bugzilla:
https://bugzilla.kernel.org/show_bug.cgi?id=215897
I have encountered a bug in F2FS file system in kernel v5.17.
The kernel should enable CONFIG_KASAN=y and CONFIG_KASAN_INLINE=y. You can
reproduce the bug by running the following commands:
The kernel message is shown below:
kernel BUG at fs/f2fs/f2fs.h:2511!
Call Trace:
f2fs_remove_inode_page+0x2a2/0x830
f2fs_evict_inode+0x9b7/0x1510
evict+0x282/0x4e0
do_unlinkat+0x33a/0x540
__x64_sys_unlinkat+0x8e/0xd0
do_syscall_64+0x3b/0x90
entry_SYSCALL_64_after_hwframe+0x44/0xae
The root cause is: .total_valid_block_count or .total_valid_node_count
could fuzzed to zero, then once dec_valid_node_count() was called, it
will cause BUG_ON(), this patch fixes to print warning info and set
SBI_NEED_FSCK into CP instead of panic.
In the Linux kernel, the following vulnerability has been resolved:
jffs2: fix memory leak in jffs2_do_fill_super
If jffs2_iget() or d_make_root() in jffs2_do_fill_super() returns
an error, we can observe the following kmemleak report:
--------------------------------------------
unreferenced object 0xffff888105a65340 (size 64):
comm "mount", pid 710, jiffies 4302851558 (age 58.239s)
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
backtrace:
[<ffffffff859c45e5>] kmem_cache_alloc_trace+0x475/0x8a0
[<ffffffff86160146>] jffs2_sum_init+0x96/0x1a0
[<ffffffff86140e25>] jffs2_do_mount_fs+0x745/0x2120
[<ffffffff86149fec>] jffs2_do_fill_super+0x35c/0x810
[<ffffffff8614aae9>] jffs2_fill_super+0x2b9/0x3b0
[...]
unreferenced object 0xffff8881bd7f0000 (size 65536):
comm "mount", pid 710, jiffies 4302851558 (age 58.239s)
hex dump (first 32 bytes):
bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb ................
bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb ................
backtrace:
[<ffffffff858579ba>] kmalloc_order+0xda/0x110
[<ffffffff85857a11>] kmalloc_order_trace+0x21/0x130
[<ffffffff859c2ed1>] __kmalloc+0x711/0x8a0
[<ffffffff86160189>] jffs2_sum_init+0xd9/0x1a0
[<ffffffff86140e25>] jffs2_do_mount_fs+0x745/0x2120
[<ffffffff86149fec>] jffs2_do_fill_super+0x35c/0x810
[<ffffffff8614aae9>] jffs2_fill_super+0x2b9/0x3b0
[...]
--------------------------------------------
This is because the resources allocated in jffs2_sum_init() are not
released. Call jffs2_sum_exit() to release these resources to solve
the problem.
In the Linux kernel, the following vulnerability has been resolved:
soc: rockchip: Fix refcount leak in rockchip_grf_init
of_find_matching_node_and_match returns a node pointer with refcount
incremented, we should use of_node_put() on it when done.
Add missing of_node_put() to avoid refcount leak.