In the Linux kernel, the following vulnerability has been resolved:
nfc: rawsock: cancel tx_work before socket teardown
In rawsock_release(), cancel any pending tx_work and purge the write
queue before orphaning the socket. rawsock_tx_work runs on the system
workqueue and calls nfc_data_exchange which dereferences the NCI
device. Without synchronization, tx_work can race with socket and
device teardown when a process is killed (e.g. by SIGKILL), leading
to use-after-free or leaked references.
Set SEND_SHUTDOWN first so that if tx_work is already running it will
see the flag and skip transmitting, then use cancel_work_sync to wait
for any in-progress execution to finish, and finally purge any
remaining queued skbs.
In the Linux kernel, the following vulnerability has been resolved:
PCI: dwc: ep: Flush MSI-X write before unmapping its ATU entry
Endpoint drivers use dw_pcie_ep_raise_msix_irq() to raise an MSI-X
interrupt to the host using a writel(), which generates a PCI posted write
transaction. There's no completion for posted writes, so the writel() may
return before the PCI write completes. dw_pcie_ep_raise_msix_irq() also
unmaps the outbound ATU entry used for the PCI write, so the write races
with the unmap.
If the PCI write loses the race with the ATU unmap, the write may corrupt
host memory or cause IOMMU errors, e.g., these when running fio with a
larger queue depth against nvmet-pci-epf:
arm-smmu-v3 fc900000.iommu: 0x0000010000000010
arm-smmu-v3 fc900000.iommu: 0x0000020000000000
arm-smmu-v3 fc900000.iommu: 0x000000090000f040
arm-smmu-v3 fc900000.iommu: 0x0000000000000000
arm-smmu-v3 fc900000.iommu: event: F_TRANSLATION client: 0000:01:00.0 sid: 0x100 ssid: 0x0 iova: 0x90000f040 ipa: 0x0
arm-smmu-v3 fc900000.iommu: unpriv data write s1 "Input address caused fault" stag: 0x0
Flush the write by performing a readl() of the same address to ensure that
the write has reached the destination before the ATU entry is unmapped.
The same problem was solved for dw_pcie_ep_raise_msi_irq() in commit
8719c64e76bf ("PCI: dwc: ep: Cache MSI outbound iATU mapping"), but there
it was solved by dedicating an outbound iATU only for MSI. We can't do the
same for MSI-X because each vector can have a different msg_addr and the
msg_addr may be changed while the vector is masked.
[bhelgaas: commit log]
In the Linux kernel, the following vulnerability has been resolved:
ksmbd: Compare MACs in constant time
To prevent timing attacks, MAC comparisons need to be constant-time.
Replace the memcmp() with the correct function, crypto_memneq().
In the Linux kernel, the following vulnerability has been resolved:
net: usb: kalmia: validate USB endpoints
The kalmia driver should validate that the device it is probing has the
proper number and types of USB endpoints it is expecting before it binds
to it. If a malicious device were to not have the same urbs the driver
will crash later on when it blindly accesses these endpoints.
In the Linux kernel, the following vulnerability has been resolved:
drbd: fix "LOGIC BUG" in drbd_al_begin_io_nonblock()
Even though we check that we "should" be able to do lc_get_cumulative()
while holding the device->al_lock spinlock, it may still fail,
if some other code path decided to do lc_try_lock() with bad timing.
If that happened, we logged "LOGIC BUG for enr=...",
but still did not return an error.
The rest of the code now assumed that this request has references
for the relevant activity log extents.
The implcations are that during an active resync, mutual exclusivity of
resync versus application IO is not guaranteed. And a potential crash
at this point may not realizs that these extents could have been target
of in-flight IO and would need to be resynced just in case.
Also, once the request completes, it will give up activity log references it
does not even hold, which will trigger a BUG_ON(refcnt == 0) in lc_put().
Fix:
Do not crash the kernel for a condition that is harmless during normal
operation: also catch "e->refcnt == 0", not only "e == NULL"
when being noisy about "al_complete_io() called on inactive extent %u\n".
And do not try to be smart and "guess" whether something will work, then
be surprised when it does not.
Deal with the fact that it may or may not work. If it does not, remember a
possible "partially in activity log" state (only possible for requests that
cross extent boundaries), and return an error code from
drbd_al_begin_io_nonblock().
A latter call for the same request will then resume from where we left off.
In the Linux kernel, the following vulnerability has been resolved:
can: mcp251x: fix deadlock in error path of mcp251x_open
The mcp251x_open() function call free_irq() in its error path with the
mpc_lock mutex held. But if an interrupt already occurred the
interrupt handler will be waiting for the mpc_lock and free_irq() will
deadlock waiting for the handler to finish.
This issue is similar to the one fixed in commit 7dd9c26bd6cf ("can:
mcp251x: fix deadlock if an interrupt occurs during mcp251x_open") but
for the error path.
To solve this issue move the call to free_irq() after the lock is
released. Setting `priv->force_quit = 1` beforehand ensure that the IRQ
handler will exit right away once it acquired the lock.
In the Linux kernel, the following vulnerability has been resolved:
bpf: Fix stack-out-of-bounds write in devmap
get_upper_ifindexes() iterates over all upper devices and writes their
indices into an array without checking bounds.
Also the callers assume that the max number of upper devices is
MAX_NEST_DEV and allocate excluded_devices[1+MAX_NEST_DEV] on the stack,
but that assumption is not correct and the number of upper devices could
be larger than MAX_NEST_DEV (e.g., many macvlans), causing a
stack-out-of-bounds write.
Add a max parameter to get_upper_ifindexes() to avoid the issue.
When there are too many upper devices, return -EOVERFLOW and abort the
redirect.
To reproduce, create more than MAX_NEST_DEV(8) macvlans on a device with
an XDP program attached using BPF_F_BROADCAST | BPF_F_EXCLUDE_INGRESS.
Then send a packet to the device to trigger the XDP redirect path.
In the Linux kernel, the following vulnerability has been resolved:
cxl: Fix race of nvdimm_bus object when creating nvdimm objects
Found issue during running of cxl-translate.sh unit test. Adding a 3s
sleep right before the test seems to make the issue reproduce fairly
consistently. The cxl_translate module has dependency on cxl_acpi and
causes orphaned nvdimm objects to reprobe after cxl_acpi is removed.
The nvdimm_bus object is registered by the cxl_nvb object when
cxl_acpi_probe() is called. With the nvdimm_bus object missing,
__nd_device_register() will trigger NULL pointer dereference when
accessing the dev->parent that points to &nvdimm_bus->dev.
[ 192.884510] BUG: kernel NULL pointer dereference, address: 000000000000006c
[ 192.895383] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS edk2-20250812-19.fc42 08/12/2025
[ 192.897721] Workqueue: cxl_port cxl_bus_rescan_queue [cxl_core]
[ 192.899459] RIP: 0010:kobject_get+0xc/0x90
[ 192.924871] Call Trace:
[ 192.925959] <TASK>
[ 192.926976] ? pm_runtime_init+0xb9/0xe0
[ 192.929712] __nd_device_register.part.0+0x4d/0xc0 [libnvdimm]
[ 192.933314] __nvdimm_create+0x206/0x290 [libnvdimm]
[ 192.936662] cxl_nvdimm_probe+0x119/0x1d0 [cxl_pmem]
[ 192.940245] cxl_bus_probe+0x1a/0x60 [cxl_core]
[ 192.943349] really_probe+0xde/0x380
This patch also relies on the previous change where
devm_cxl_add_nvdimm_bridge() is called from drivers/cxl/pmem.c instead
of drivers/cxl/core.c to ensure the dependency of cxl_acpi on cxl_pmem.
1. Set probe_type of cxl_nvb to PROBE_FORCE_SYNCHRONOUS to ensure the
driver is probed synchronously when add_device() is called.
2. Add a check in __devm_cxl_add_nvdimm_bridge() to ensure that the
cxl_nvb driver is attached during cxl_acpi_probe().
3. Take the cxl_root uport_dev lock and the cxl_nvb->dev lock in
devm_cxl_add_nvdimm() before checking nvdimm_bus is valid.
4. Set cxl_nvdimm flag to CXL_NVD_F_INVALIDATED so cxl_nvdimm_probe()
will exit with -EBUSY.
The removal of cxl_nvdimm devices should prevent any orphaned devices
from probing once the nvdimm_bus is gone.
[ dj: Fixed 0-day reported kdoc issue. ]
[ dj: Fix cxl_nvb reference leak on error. Gregory (kreview-0811365) ]
In the Linux kernel, the following vulnerability has been resolved:
netfilter: nft_set_pipapo: split gc into unlink and reclaim phase
Yiming Qian reports Use-after-free in the pipapo set type:
Under a large number of expired elements, commit-time GC can run for a very
long time in a non-preemptible context, triggering soft lockup warnings and
RCU stall reports (local denial of service).
We must split GC in an unlink and a reclaim phase.
We cannot queue elements for freeing until pointers have been swapped.
Expired elements are still exposed to both the packet path and userspace
dumpers via the live copy of the data structure.
call_rcu() does not protect us: dump operations or element lookups starting
after call_rcu has fired can still observe the free'd element, unless the
commit phase has made enough progress to swap the clone and live pointers
before any new reader has picked up the old version.
This a similar approach as done recently for the rbtree backend in commit
35f83a75529a ("netfilter: nft_set_rbtree: don't gc elements on insert").
In the Linux kernel, the following vulnerability has been resolved:
x86/efi: defer freeing of boot services memory
efi_free_boot_services() frees memory occupied by EFI_BOOT_SERVICES_CODE
and EFI_BOOT_SERVICES_DATA using memblock_free_late().
There are two issue with that: memblock_free_late() should be used for
memory allocated with memblock_alloc() while the memory reserved with
memblock_reserve() should be freed with free_reserved_area().
More acutely, with CONFIG_DEFERRED_STRUCT_PAGE_INIT=y
efi_free_boot_services() is called before deferred initialization of the
memory map is complete.
Benjamin Herrenschmidt reports that this causes a leak of ~140MB of
RAM on EC2 t3a.nano instances which only have 512MB or RAM.
If the freed memory resides in the areas that memory map for them is
still uninitialized, they won't be actually freed because
memblock_free_late() calls memblock_free_pages() and the latter skips
uninitialized pages.
Using free_reserved_area() at this point is also problematic because
__free_page() accesses the buddy of the freed page and that again might
end up in uninitialized part of the memory map.
Delaying the entire efi_free_boot_services() could be problematic
because in addition to freeing boot services memory it updates
efi.memmap without any synchronization and that's undesirable late in
boot when there is concurrency.
More robust approach is to only defer freeing of the EFI boot services
memory.
Split efi_free_boot_services() in two. First efi_unmap_boot_services()
collects ranges that should be freed into an array then
efi_free_boot_services() later frees them after deferred init is complete.