In the Linux kernel, the following vulnerability has been resolved:
net: macb: fix unregister_netdev call order in macb_remove()
When removing a macb device, the driver calls phy_exit() before
unregister_netdev(). This leads to a WARN from kernfs:
------------[ cut here ]------------
kernfs: can not remove 'attached_dev', no directory
WARNING: CPU: 1 PID: 27146 at fs/kernfs/dir.c:1683
Call trace:
kernfs_remove_by_name_ns+0xd8/0xf0
sysfs_remove_link+0x24/0x58
phy_detach+0x5c/0x168
phy_disconnect+0x4c/0x70
phylink_disconnect_phy+0x6c/0xc0 [phylink]
macb_close+0x6c/0x170 [macb]
...
macb_remove+0x60/0x168 [macb]
platform_remove+0x5c/0x80
...
The warning happens because the PHY is being exited while the netdev
is still registered. The correct order is to unregister the netdev
before shutting down the PHY and cleaning up the MDIO bus.
Fix this by moving unregister_netdev() ahead of phy_exit() in
macb_remove().
In the Linux kernel, the following vulnerability has been resolved:
xfrm: Duplicate SPI Handling
The issue originates when Strongswan initiates an XFRM_MSG_ALLOCSPI
Netlink message, which triggers the kernel function xfrm_alloc_spi().
This function is expected to ensure uniqueness of the Security Parameter
Index (SPI) for inbound Security Associations (SAs). However, it can
return success even when the requested SPI is already in use, leading
to duplicate SPIs assigned to multiple inbound SAs, differentiated
only by their destination addresses.
This behavior causes inconsistencies during SPI lookups for inbound packets.
Since the lookup may return an arbitrary SA among those with the same SPI,
packet processing can fail, resulting in packet drops.
According to RFC 4301 section 4.4.2 , for inbound processing a unicast SA
is uniquely identified by the SPI and optionally protocol.
Reproducing the Issue Reliably:
To consistently reproduce the problem, restrict the available SPI range in
charon.conf : spi_min = 0x10000000 spi_max = 0x10000002
This limits the system to only 2 usable SPI values.
Next, create more than 2 Child SA. each using unique pair of src/dst address.
As soon as the 3rd Child SA is initiated, it will be assigned a duplicate
SPI, since the SPI pool is already exhausted.
With a narrow SPI range, the issue is consistently reproducible.
With a broader/default range, it becomes rare and unpredictable.
Current implementation:
xfrm_spi_hash() lookup function computes hash using daddr, proto, and family.
So if two SAs have the same SPI but different destination addresses, then
they will:
a. Hash into different buckets
b. Be stored in different linked lists (byspi + h)
c. Not be seen in the same hlist_for_each_entry_rcu() iteration.
As a result, the lookup will result in NULL and kernel allows that Duplicate SPI
Proposed Change:
xfrm_state_lookup_spi_proto() does a truly global search - across all states,
regardless of hash bucket and matches SPI and proto.
In the Linux kernel, the following vulnerability has been resolved:
crypto: x86/aegis - Add missing error checks
The skcipher_walk functions can allocate memory and can fail, so
checking for errors is necessary.
In the Linux kernel, the following vulnerability has been resolved:
parisc: Drop WARN_ON_ONCE() from flush_cache_vmap
I have observed warning to occassionally trigger.
In the Linux kernel, the following vulnerability has been resolved:
btrfs: subpage: keep TOWRITE tag until folio is cleaned
btrfs_subpage_set_writeback() calls folio_start_writeback() the first time
a folio is written back, and it also clears the PAGECACHE_TAG_TOWRITE tag
even if there are still dirty blocks in the folio. This can break ordering
guarantees, such as those required by btrfs_wait_ordered_extents().
That ordering breakage leads to a real failure. For example, running
generic/464 on a zoned setup will hit the following ASSERT. This happens
because the broken ordering fails to flush existing dirty pages before the
file size is truncated.
assertion failed: !list_empty(&ordered->list) :: 0, in fs/btrfs/zoned.c:1899
------------[ cut here ]------------
kernel BUG at fs/btrfs/zoned.c:1899!
Oops: invalid opcode: 0000 [#1] SMP NOPTI
CPU: 2 UID: 0 PID: 1906169 Comm: kworker/u130:2 Kdump: loaded Not tainted 6.16.0-rc6-BTRFS-ZNS+ #554 PREEMPT(voluntary)
Hardware name: Supermicro Super Server/H12SSL-NT, BIOS 2.0 02/22/2021
Workqueue: btrfs-endio-write btrfs_work_helper [btrfs]
RIP: 0010:btrfs_finish_ordered_zoned.cold+0x50/0x52 [btrfs]
RSP: 0018:ffffc9002efdbd60 EFLAGS: 00010246
RAX: 000000000000004c RBX: ffff88811923c4e0 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffffffff827e38b1 RDI: 00000000ffffffff
RBP: ffff88810005d000 R08: 00000000ffffdfff R09: ffffffff831051c8
R10: ffffffff83055220 R11: 0000000000000000 R12: ffff8881c2458c00
R13: ffff88811923c540 R14: ffff88811923c5e8 R15: ffff8881c1bd9680
FS: 0000000000000000(0000) GS:ffff88a04acd0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f907c7a918c CR3: 0000000004024000 CR4: 0000000000350ef0
Call Trace:
<TASK>
? srso_return_thunk+0x5/0x5f
btrfs_finish_ordered_io+0x4a/0x60 [btrfs]
btrfs_work_helper+0xf9/0x490 [btrfs]
process_one_work+0x204/0x590
? srso_return_thunk+0x5/0x5f
worker_thread+0x1d6/0x3d0
? __pfx_worker_thread+0x10/0x10
kthread+0x118/0x230
? __pfx_kthread+0x10/0x10
ret_from_fork+0x205/0x260
? __pfx_kthread+0x10/0x10
ret_from_fork_asm+0x1a/0x30
</TASK>
Consider process A calling writepages() with WB_SYNC_NONE. In zoned mode or
for compressed writes, it locks several folios for delalloc and starts
writing them out. Let's call the last locked folio folio X. Suppose the
write range only partially covers folio X, leaving some pages dirty.
Process A calls btrfs_subpage_set_writeback() when building a bio. This
function call clears the TOWRITE tag of folio X, whose size = 8K and
the block size = 4K. It is following state.
0 4K 8K
|/////|/////| (flag: DIRTY, tag: DIRTY)
<-----> Process A will write this range.
Now suppose process B concurrently calls writepages() with WB_SYNC_ALL. It
calls tag_pages_for_writeback() to tag dirty folios with
PAGECACHE_TAG_TOWRITE. Since folio X is still dirty, it gets tagged. Then,
B collects tagged folios using filemap_get_folios_tag() and must wait for
folio X to be written before returning from writepages().
0 4K 8K
|/////|/////| (flag: DIRTY, tag: DIRTY|TOWRITE)
However, between tagging and collecting, process A may call
btrfs_subpage_set_writeback() and clear folio X's TOWRITE tag.
0 4K 8K
| |/////| (flag: DIRTY|WRITEBACK, tag: DIRTY)
As a result, process B won't see folio X in its batch, and returns without
waiting for it. This breaks the WB_SYNC_ALL ordering requirement.
Fix this by using btrfs_subpage_set_writeback_keepwrite(), which retains
the TOWRITE tag. We now manually clear the tag only after the folio becomes
clean, via the xas operation.
In the Linux kernel, the following vulnerability has been resolved:
LoongArch: Optimize module load time by optimizing PLT/GOT counting
When enabling CONFIG_KASAN, CONFIG_PREEMPT_VOLUNTARY_BUILD and
CONFIG_PREEMPT_VOLUNTARY at the same time, there will be soft deadlock,
the relevant logs are as follows:
rcu: INFO: rcu_sched self-detected stall on CPU
...
Call Trace:
[<900000000024f9e4>] show_stack+0x5c/0x180
[<90000000002482f4>] dump_stack_lvl+0x94/0xbc
[<9000000000224544>] rcu_dump_cpu_stacks+0x1fc/0x280
[<900000000037ac80>] rcu_sched_clock_irq+0x720/0xf88
[<9000000000396c34>] update_process_times+0xb4/0x150
[<90000000003b2474>] tick_nohz_handler+0xf4/0x250
[<9000000000397e28>] __hrtimer_run_queues+0x1d0/0x428
[<9000000000399b2c>] hrtimer_interrupt+0x214/0x538
[<9000000000253634>] constant_timer_interrupt+0x64/0x80
[<9000000000349938>] __handle_irq_event_percpu+0x78/0x1a0
[<9000000000349a78>] handle_irq_event_percpu+0x18/0x88
[<9000000000354c00>] handle_percpu_irq+0x90/0xf0
[<9000000000348c74>] handle_irq_desc+0x94/0xb8
[<9000000001012b28>] handle_cpu_irq+0x68/0xa0
[<9000000001def8c0>] handle_loongarch_irq+0x30/0x48
[<9000000001def958>] do_vint+0x80/0xd0
[<9000000000268a0c>] kasan_mem_to_shadow.part.0+0x2c/0x2a0
[<90000000006344f4>] __asan_load8+0x4c/0x120
[<900000000025c0d0>] module_frob_arch_sections+0x5c8/0x6b8
[<90000000003895f0>] load_module+0x9e0/0x2958
[<900000000038b770>] __do_sys_init_module+0x208/0x2d0
[<9000000001df0c34>] do_syscall+0x94/0x190
[<900000000024d6fc>] handle_syscall+0xbc/0x158
After analysis, this is because the slow speed of loading the amdgpu
module leads to the long time occupation of the cpu and then the soft
deadlock.
When loading a module, module_frob_arch_sections() tries to figure out
the number of PLTs/GOTs that will be needed to handle all the RELAs. It
will call the count_max_entries() to find in an out-of-order date which
counting algorithm has O(n^2) complexity.
To make it faster, we sort the relocation list by info and addend. That
way, to check for a duplicate relocation, it just needs to compare with
the previous entry. This reduces the complexity of the algorithm to O(n
log n), as done in commit d4e0340919fb ("arm64/module: Optimize module
load time by optimizing PLT counting"). This gives sinificant reduction
in module load time for modules with large number of relocations.
After applying this patch, the soft deadlock problem has been solved,
and the kernel starts normally without "Call Trace".
Using the default configuration to test some modules, the results are as
follows:
Module Size
ip_tables 36K
fat 143K
radeon 2.5MB
amdgpu 16MB
Without this patch:
Module Module load time (ms) Count(PLTs/GOTs)
ip_tables 18 59/6
fat 0 162/14
radeon 54 1221/84
amdgpu 1411 4525/1098
With this patch:
Module Module load time (ms) Count(PLTs/GOTs)
ip_tables 18 59/6
fat 0 162/14
radeon 22 1221/84
amdgpu 45 4525/1098
In the Linux kernel, the following vulnerability has been resolved:
wifi: ath12k: Decrement TID on RX peer frag setup error handling
Currently, TID is not decremented before peer cleanup, during error
handling path of ath12k_dp_rx_peer_frag_setup(). This could lead to
out-of-bounds access in peer->rx_tid[].
Hence, add a decrement operation for TID, before peer cleanup to
ensures proper cleanup and prevents out-of-bounds access issues when
the RX peer frag setup fails.
Found during code review. Compile tested only.
In the Linux kernel, the following vulnerability has been resolved:
drm/amd/display: add null check
[WHY]
Prevents null pointer dereferences to enhance function robustness
[HOW]
Adds early null check and return false if invalid.
In the Linux kernel, the following vulnerability has been resolved:
ACPI: APEI: send SIGBUS to current task if synchronous memory error not recovered
If a synchronous error is detected as a result of user-space process
triggering a 2-bit uncorrected error, the CPU will take a synchronous
error exception such as Synchronous External Abort (SEA) on Arm64. The
kernel will queue a memory_failure() work which poisons the related
page, unmaps the page, and then sends a SIGBUS to the process, so that
a system wide panic can be avoided.
However, no memory_failure() work will be queued when abnormal
synchronous errors occur. These errors can include situations like
invalid PA, unexpected severity, no memory failure config support,
invalid GUID section, etc. In such a case, the user-space process will
trigger SEA again. This loop can potentially exceed the platform
firmware threshold or even trigger a kernel hard lockup, leading to a
system reboot.
Fix it by performing a force kill if no memory_failure() work is queued
for synchronous errors.
[ rjw: Changelog edits ]
In the Linux kernel, the following vulnerability has been resolved:
netfilter: ctnetlink: remove refcounting in expectation dumpers
Same pattern as previous patch: do not keep the expectation object
alive via refcount, only store a cookie value and then use that
as the skip hint for dump resumption.
AFAICS this has the same issue as the one resolved in the conntrack
dumper, when we do
if (!refcount_inc_not_zero(&exp->use))
to increment the refcount, there is a chance that exp == last, which
causes a double-increment of the refcount and subsequent memory leak.