In the Linux kernel, the following vulnerability has been resolved:
ksmbd: Compare MACs in constant time
To prevent timing attacks, MAC comparisons need to be constant-time.
Replace the memcmp() with the correct function, crypto_memneq().
In the Linux kernel, the following vulnerability has been resolved:
net: usb: kalmia: validate USB endpoints
The kalmia driver should validate that the device it is probing has the
proper number and types of USB endpoints it is expecting before it binds
to it. If a malicious device were to not have the same urbs the driver
will crash later on when it blindly accesses these endpoints.
In the Linux kernel, the following vulnerability has been resolved:
drbd: fix "LOGIC BUG" in drbd_al_begin_io_nonblock()
Even though we check that we "should" be able to do lc_get_cumulative()
while holding the device->al_lock spinlock, it may still fail,
if some other code path decided to do lc_try_lock() with bad timing.
If that happened, we logged "LOGIC BUG for enr=...",
but still did not return an error.
The rest of the code now assumed that this request has references
for the relevant activity log extents.
The implcations are that during an active resync, mutual exclusivity of
resync versus application IO is not guaranteed. And a potential crash
at this point may not realizs that these extents could have been target
of in-flight IO and would need to be resynced just in case.
Also, once the request completes, it will give up activity log references it
does not even hold, which will trigger a BUG_ON(refcnt == 0) in lc_put().
Fix:
Do not crash the kernel for a condition that is harmless during normal
operation: also catch "e->refcnt == 0", not only "e == NULL"
when being noisy about "al_complete_io() called on inactive extent %u\n".
And do not try to be smart and "guess" whether something will work, then
be surprised when it does not.
Deal with the fact that it may or may not work. If it does not, remember a
possible "partially in activity log" state (only possible for requests that
cross extent boundaries), and return an error code from
drbd_al_begin_io_nonblock().
A latter call for the same request will then resume from where we left off.
In the Linux kernel, the following vulnerability has been resolved:
can: mcp251x: fix deadlock in error path of mcp251x_open
The mcp251x_open() function call free_irq() in its error path with the
mpc_lock mutex held. But if an interrupt already occurred the
interrupt handler will be waiting for the mpc_lock and free_irq() will
deadlock waiting for the handler to finish.
This issue is similar to the one fixed in commit 7dd9c26bd6cf ("can:
mcp251x: fix deadlock if an interrupt occurs during mcp251x_open") but
for the error path.
To solve this issue move the call to free_irq() after the lock is
released. Setting `priv->force_quit = 1` beforehand ensure that the IRQ
handler will exit right away once it acquired the lock.
In the Linux kernel, the following vulnerability has been resolved:
bpf: Fix stack-out-of-bounds write in devmap
get_upper_ifindexes() iterates over all upper devices and writes their
indices into an array without checking bounds.
Also the callers assume that the max number of upper devices is
MAX_NEST_DEV and allocate excluded_devices[1+MAX_NEST_DEV] on the stack,
but that assumption is not correct and the number of upper devices could
be larger than MAX_NEST_DEV (e.g., many macvlans), causing a
stack-out-of-bounds write.
Add a max parameter to get_upper_ifindexes() to avoid the issue.
When there are too many upper devices, return -EOVERFLOW and abort the
redirect.
To reproduce, create more than MAX_NEST_DEV(8) macvlans on a device with
an XDP program attached using BPF_F_BROADCAST | BPF_F_EXCLUDE_INGRESS.
Then send a packet to the device to trigger the XDP redirect path.
In the Linux kernel, the following vulnerability has been resolved:
cxl: Fix race of nvdimm_bus object when creating nvdimm objects
Found issue during running of cxl-translate.sh unit test. Adding a 3s
sleep right before the test seems to make the issue reproduce fairly
consistently. The cxl_translate module has dependency on cxl_acpi and
causes orphaned nvdimm objects to reprobe after cxl_acpi is removed.
The nvdimm_bus object is registered by the cxl_nvb object when
cxl_acpi_probe() is called. With the nvdimm_bus object missing,
__nd_device_register() will trigger NULL pointer dereference when
accessing the dev->parent that points to &nvdimm_bus->dev.
[ 192.884510] BUG: kernel NULL pointer dereference, address: 000000000000006c
[ 192.895383] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS edk2-20250812-19.fc42 08/12/2025
[ 192.897721] Workqueue: cxl_port cxl_bus_rescan_queue [cxl_core]
[ 192.899459] RIP: 0010:kobject_get+0xc/0x90
[ 192.924871] Call Trace:
[ 192.925959] <TASK>
[ 192.926976] ? pm_runtime_init+0xb9/0xe0
[ 192.929712] __nd_device_register.part.0+0x4d/0xc0 [libnvdimm]
[ 192.933314] __nvdimm_create+0x206/0x290 [libnvdimm]
[ 192.936662] cxl_nvdimm_probe+0x119/0x1d0 [cxl_pmem]
[ 192.940245] cxl_bus_probe+0x1a/0x60 [cxl_core]
[ 192.943349] really_probe+0xde/0x380
This patch also relies on the previous change where
devm_cxl_add_nvdimm_bridge() is called from drivers/cxl/pmem.c instead
of drivers/cxl/core.c to ensure the dependency of cxl_acpi on cxl_pmem.
1. Set probe_type of cxl_nvb to PROBE_FORCE_SYNCHRONOUS to ensure the
driver is probed synchronously when add_device() is called.
2. Add a check in __devm_cxl_add_nvdimm_bridge() to ensure that the
cxl_nvb driver is attached during cxl_acpi_probe().
3. Take the cxl_root uport_dev lock and the cxl_nvb->dev lock in
devm_cxl_add_nvdimm() before checking nvdimm_bus is valid.
4. Set cxl_nvdimm flag to CXL_NVD_F_INVALIDATED so cxl_nvdimm_probe()
will exit with -EBUSY.
The removal of cxl_nvdimm devices should prevent any orphaned devices
from probing once the nvdimm_bus is gone.
[ dj: Fixed 0-day reported kdoc issue. ]
[ dj: Fix cxl_nvb reference leak on error. Gregory (kreview-0811365) ]
In the Linux kernel, the following vulnerability has been resolved:
netfilter: nft_set_pipapo: split gc into unlink and reclaim phase
Yiming Qian reports Use-after-free in the pipapo set type:
Under a large number of expired elements, commit-time GC can run for a very
long time in a non-preemptible context, triggering soft lockup warnings and
RCU stall reports (local denial of service).
We must split GC in an unlink and a reclaim phase.
We cannot queue elements for freeing until pointers have been swapped.
Expired elements are still exposed to both the packet path and userspace
dumpers via the live copy of the data structure.
call_rcu() does not protect us: dump operations or element lookups starting
after call_rcu has fired can still observe the free'd element, unless the
commit phase has made enough progress to swap the clone and live pointers
before any new reader has picked up the old version.
This a similar approach as done recently for the rbtree backend in commit
35f83a75529a ("netfilter: nft_set_rbtree: don't gc elements on insert").
In the Linux kernel, the following vulnerability has been resolved:
x86/efi: defer freeing of boot services memory
efi_free_boot_services() frees memory occupied by EFI_BOOT_SERVICES_CODE
and EFI_BOOT_SERVICES_DATA using memblock_free_late().
There are two issue with that: memblock_free_late() should be used for
memory allocated with memblock_alloc() while the memory reserved with
memblock_reserve() should be freed with free_reserved_area().
More acutely, with CONFIG_DEFERRED_STRUCT_PAGE_INIT=y
efi_free_boot_services() is called before deferred initialization of the
memory map is complete.
Benjamin Herrenschmidt reports that this causes a leak of ~140MB of
RAM on EC2 t3a.nano instances which only have 512MB or RAM.
If the freed memory resides in the areas that memory map for them is
still uninitialized, they won't be actually freed because
memblock_free_late() calls memblock_free_pages() and the latter skips
uninitialized pages.
Using free_reserved_area() at this point is also problematic because
__free_page() accesses the buddy of the freed page and that again might
end up in uninitialized part of the memory map.
Delaying the entire efi_free_boot_services() could be problematic
because in addition to freeing boot services memory it updates
efi.memmap without any synchronization and that's undesirable late in
boot when there is concurrency.
More robust approach is to only defer freeing of the EFI boot services
memory.
Split efi_free_boot_services() in two. First efi_unmap_boot_services()
collects ranges that should be freed into an array then
efi_free_boot_services() later frees them after deferred init is complete.
In the Linux kernel, the following vulnerability has been resolved:
xdp: produce a warning when calculated tailroom is negative
Many ethernet drivers report xdp Rx queue frag size as being the same as
DMA write size. However, the only user of this field, namely
bpf_xdp_frags_increase_tail(), clearly expects a truesize.
Such difference leads to unspecific memory corruption issues under certain
circumstances, e.g. in ixgbevf maximum DMA write size is 3 KB, so when
running xskxceiver's XDP_ADJUST_TAIL_GROW_MULTI_BUFF, 6K packet fully uses
all DMA-writable space in 2 buffers. This would be fine, if only
rxq->frag_size was properly set to 4K, but value of 3K results in a
negative tailroom, because there is a non-zero page offset.
We are supposed to return -EINVAL and be done with it in such case, but due
to tailroom being stored as an unsigned int, it is reported to be somewhere
near UINT_MAX, resulting in a tail being grown, even if the requested
offset is too much (it is around 2K in the abovementioned test). This later
leads to all kinds of unspecific calltraces.
[ 7340.337579] xskxceiver[1440]: segfault at 1da718 ip 00007f4161aeac9d sp 00007f41615a6a00 error 6
[ 7340.338040] xskxceiver[1441]: segfault at 7f410000000b ip 00000000004042b5 sp 00007f415bffecf0 error 4
[ 7340.338179] in libc.so.6[61c9d,7f4161aaf000+160000]
[ 7340.339230] in xskxceiver[42b5,400000+69000]
[ 7340.340300] likely on CPU 6 (core 0, socket 6)
[ 7340.340302] Code: ff ff 01 e9 f4 fe ff ff 0f 1f 44 00 00 4c 39 f0 74 73 31 c0 ba 01 00 00 00 f0 0f b1 17 0f 85 ba 00 00 00 49 8b 87 88 00 00 00 <4c> 89 70 08 eb cc 0f 1f 44 00 00 48 8d bd f0 fe ff ff 89 85 ec fe
[ 7340.340888] likely on CPU 3 (core 0, socket 3)
[ 7340.345088] Code: 00 00 00 ba 00 00 00 00 be 00 00 00 00 89 c7 e8 31 ca ff ff 89 45 ec 8b 45 ec 85 c0 78 07 b8 00 00 00 00 eb 46 e8 0b c8 ff ff <8b> 00 83 f8 69 74 24 e8 ff c7 ff ff 8b 00 83 f8 0b 74 18 e8 f3 c7
[ 7340.404334] Oops: general protection fault, probably for non-canonical address 0x6d255010bdffc: 0000 [#1] SMP NOPTI
[ 7340.405972] CPU: 7 UID: 0 PID: 1439 Comm: xskxceiver Not tainted 6.19.0-rc1+ #21 PREEMPT(lazy)
[ 7340.408006] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.17.0-5.fc42 04/01/2014
[ 7340.409716] RIP: 0010:lookup_swap_cgroup_id+0x44/0x80
[ 7340.410455] Code: 83 f8 1c 73 39 48 ba ff ff ff ff ff ff ff 03 48 8b 04 c5 20 55 fa bd 48 21 d1 48 89 ca 83 e1 01 48 d1 ea c1 e1 04 48 8d 04 90 <8b> 00 48 83 c4 10 d3 e8 c3 cc cc cc cc 31 c0 e9 98 b7 dd 00 48 89
[ 7340.412787] RSP: 0018:ffffcc5c04f7f6d0 EFLAGS: 00010202
[ 7340.413494] RAX: 0006d255010bdffc RBX: ffff891f477895a8 RCX: 0000000000000010
[ 7340.414431] RDX: 0001c17e3fffffff RSI: 00fa070000000000 RDI: 000382fc7fffffff
[ 7340.415354] RBP: 00fa070000000000 R08: ffffcc5c04f7f8f8 R09: ffffcc5c04f7f7d0
[ 7340.416283] R10: ffff891f4c1a7000 R11: ffffcc5c04f7f9c8 R12: ffffcc5c04f7f7d0
[ 7340.417218] R13: 03ffffffffffffff R14: 00fa06fffffffe00 R15: ffff891f47789500
[ 7340.418229] FS: 0000000000000000(0000) GS:ffff891ffdfaa000(0000) knlGS:0000000000000000
[ 7340.419489] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 7340.420286] CR2: 00007f415bfffd58 CR3: 0000000103f03002 CR4: 0000000000772ef0
[ 7340.421237] PKRU: 55555554
[ 7340.421623] Call Trace:
[ 7340.421987] <TASK>
[ 7340.422309] ? softleaf_from_pte+0x77/0xa0
[ 7340.422855] swap_pte_batch+0xa7/0x290
[ 7340.423363] zap_nonpresent_ptes.constprop.0.isra.0+0xd1/0x270
[ 7340.424102] zap_pte_range+0x281/0x580
[ 7340.424607] zap_pmd_range.isra.0+0xc9/0x240
[ 7340.425177] unmap_page_range+0x24d/0x420
[ 7340.425714] unmap_vmas+0xa1/0x180
[ 7340.426185] exit_mmap+0xe1/0x3b0
[ 7340.426644] __mmput+0x41/0x150
[ 7340.427098] exit_mm+0xb1/0x110
[ 7340.427539] do_exit+0x1b2/0x460
[ 7340.427992] do_group_exit+0x2d/0xc0
[ 7340.428477] get_signal+0x79d/0x7e0
[ 7340.428957] arch_do_signal_or_restart+0x34/0x100
[ 7340.429571] exit_to_user_mode_loop+0x8e/0x4c0
[ 7340.430159] do_syscall_64+0x188/
---truncated---
In the Linux kernel, the following vulnerability has been resolved:
arm64: io: Extract user memory type in ioremap_prot()
The only caller of ioremap_prot() outside of the generic ioremap()
implementation is generic_access_phys(), which passes a 'pgprot_t' value
determined from the user mapping of the target 'pfn' being accessed by
the kernel. On arm64, the 'pgprot_t' contains all of the non-address
bits from the pte, including the permission controls, and so we end up
returning a new user mapping from ioremap_prot() which faults when
accessed from the kernel on systems with PAN:
| Unable to handle kernel read from unreadable memory at virtual address ffff80008ea89000
| ...
| Call trace:
| __memcpy_fromio+0x80/0xf8
| generic_access_phys+0x20c/0x2b8
| __access_remote_vm+0x46c/0x5b8
| access_remote_vm+0x18/0x30
| environ_read+0x238/0x3e8
| vfs_read+0xe4/0x2b0
| ksys_read+0xcc/0x178
| __arm64_sys_read+0x4c/0x68
Extract only the memory type from the user 'pgprot_t' in ioremap_prot()
and assert that we're being passed a user mapping, to protect us against
any changes in future that may require additional handling. To avoid
falsely flagging users of ioremap(), provide our own ioremap() macro
which simply wraps __ioremap_prot().