In the Linux kernel, the following vulnerability has been resolved:
wifi: brcmfmac: fix use-after-free when rescheduling brcmf_btcoex_info work
The brcmf_btcoex_detach() only shuts down the btcoex timer, if the
flag timer_on is false. However, the brcmf_btcoex_timerfunc(), which
runs as timer handler, sets timer_on to false. This creates critical
race conditions:
1.If brcmf_btcoex_detach() is called while brcmf_btcoex_timerfunc()
is executing, it may observe timer_on as false and skip the call to
timer_shutdown_sync().
2.The brcmf_btcoex_timerfunc() may then reschedule the brcmf_btcoex_info
worker after the cancel_work_sync() has been executed, resulting in
use-after-free bugs.
The use-after-free bugs occur in two distinct scenarios, depending on
the timing of when the brcmf_btcoex_info struct is freed relative to
the execution of its worker thread.
Scenario 1: Freed before the worker is scheduled
The brcmf_btcoex_info is deallocated before the worker is scheduled.
A race condition can occur when schedule_work(&bt_local->work) is
called after the target memory has been freed. The sequence of events
is detailed below:
CPU0 | CPU1
brcmf_btcoex_detach | brcmf_btcoex_timerfunc
| bt_local->timer_on = false;
if (cfg->btcoex->timer_on) |
... |
cancel_work_sync(); |
... |
kfree(cfg->btcoex); // FREE |
| schedule_work(&bt_local->work); // USE
Scenario 2: Freed after the worker is scheduled
The brcmf_btcoex_info is freed after the worker has been scheduled
but before or during its execution. In this case, statements within
the brcmf_btcoex_handler() — such as the container_of macro and
subsequent dereferences of the brcmf_btcoex_info object will cause
a use-after-free access. The following timeline illustrates this
scenario:
CPU0 | CPU1
brcmf_btcoex_detach | brcmf_btcoex_timerfunc
| bt_local->timer_on = false;
if (cfg->btcoex->timer_on) |
... |
cancel_work_sync(); |
... | schedule_work(); // Reschedule
|
kfree(cfg->btcoex); // FREE | brcmf_btcoex_handler() // Worker
/* | btci = container_of(....); // USE
The kfree() above could | ...
also occur at any point | btci-> // USE
during the worker's execution|
*/ |
To resolve the race conditions, drop the conditional check and call
timer_shutdown_sync() directly. It can deactivate the timer reliably,
regardless of its current state. Once stopped, the timer_on state is
then set to false.
In the Linux kernel, the following vulnerability has been resolved:
wifi: cfg80211: fix use-after-free in cmp_bss()
Following bss_free() quirk introduced in commit 776b3580178f
("cfg80211: track hidden SSID networks properly"), adjust
cfg80211_update_known_bss() to free the last beacon frame
elements only if they're not shared via the corresponding
'hidden_beacon_bss' pointer.
In the Linux kernel, the following vulnerability has been resolved:
fs: writeback: fix use-after-free in __mark_inode_dirty()
An use-after-free issue occurred when __mark_inode_dirty() get the
bdi_writeback that was in the progress of switching.
CPU: 1 PID: 562 Comm: systemd-random- Not tainted 6.6.56-gb4403bd46a8e #1
......
pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : __mark_inode_dirty+0x124/0x418
lr : __mark_inode_dirty+0x118/0x418
sp : ffffffc08c9dbbc0
........
Call trace:
__mark_inode_dirty+0x124/0x418
generic_update_time+0x4c/0x60
file_modified+0xcc/0xd0
ext4_buffered_write_iter+0x58/0x124
ext4_file_write_iter+0x54/0x704
vfs_write+0x1c0/0x308
ksys_write+0x74/0x10c
__arm64_sys_write+0x1c/0x28
invoke_syscall+0x48/0x114
el0_svc_common.constprop.0+0xc0/0xe0
do_el0_svc+0x1c/0x28
el0_svc+0x40/0xe4
el0t_64_sync_handler+0x120/0x12c
el0t_64_sync+0x194/0x198
Root cause is:
systemd-random-seed kworker
----------------------------------------------------------------------
___mark_inode_dirty inode_switch_wbs_work_fn
spin_lock(&inode->i_lock);
inode_attach_wb
locked_inode_to_wb_and_lock_list
get inode->i_wb
spin_unlock(&inode->i_lock);
spin_lock(&wb->list_lock)
spin_lock(&inode->i_lock)
inode_io_list_move_locked
spin_unlock(&wb->list_lock)
spin_unlock(&inode->i_lock)
spin_lock(&old_wb->list_lock)
inode_do_switch_wbs
spin_lock(&inode->i_lock)
inode->i_wb = new_wb
spin_unlock(&inode->i_lock)
spin_unlock(&old_wb->list_lock)
wb_put_many(old_wb, nr_switched)
cgwb_release
old wb released
wb_wakeup_delayed() accesses wb,
then trigger the use-after-free
issue
Fix this race condition by holding inode spinlock until
wb_wakeup_delayed() finished.
In the Linux kernel, the following vulnerability has been resolved:
i40e: Fix potential invalid access when MAC list is empty
list_first_entry() never returns NULL - if the list is empty, it still
returns a pointer to an invalid object, leading to potential invalid
memory access when dereferenced.
Fix this by using list_first_entry_or_null instead of list_first_entry.
In the Linux kernel, the following vulnerability has been resolved:
ptp: ocp: fix use-after-free bugs causing by ptp_ocp_watchdog
The ptp_ocp_detach() only shuts down the watchdog timer if it is
pending. However, if the timer handler is already running, the
timer_delete_sync() is not called. This leads to race conditions
where the devlink that contains the ptp_ocp is deallocated while
the timer handler is still accessing it, resulting in use-after-free
bugs. The following details one of the race scenarios.
(thread 1) | (thread 2)
ptp_ocp_remove() |
ptp_ocp_detach() | ptp_ocp_watchdog()
if (timer_pending(&bp->watchdog))| bp = timer_container_of()
timer_delete_sync() |
|
devlink_free(devlink) //free |
| bp-> //use
Resolve this by unconditionally calling timer_delete_sync() to ensure
the timer is reliably deactivated, preventing any access after free.
In the Linux kernel, the following vulnerability has been resolved:
mm: move page table sync declarations to linux/pgtable.h
During our internal testing, we started observing intermittent boot
failures when the machine uses 4-level paging and has a large amount of
persistent memory:
BUG: unable to handle page fault for address: ffffe70000000034
#PF: supervisor write access in kernel mode
#PF: error_code(0x0002) - not-present page
PGD 0 P4D 0
Oops: 0002 [#1] SMP NOPTI
RIP: 0010:__init_single_page+0x9/0x6d
Call Trace:
<TASK>
__init_zone_device_page+0x17/0x5d
memmap_init_zone_device+0x154/0x1bb
pagemap_range+0x2e0/0x40f
memremap_pages+0x10b/0x2f0
devm_memremap_pages+0x1e/0x60
dev_dax_probe+0xce/0x2ec [device_dax]
dax_bus_probe+0x6d/0xc9
[... snip ...]
</TASK>
It turns out that the kernel panics while initializing vmemmap (struct
page array) when the vmemmap region spans two PGD entries, because the new
PGD entry is only installed in init_mm.pgd, but not in the page tables of
other tasks.
And looking at __populate_section_memmap():
if (vmemmap_can_optimize(altmap, pgmap))
// does not sync top level page tables
r = vmemmap_populate_compound_pages(pfn, start, end, nid, pgmap);
else
// sync top level page tables in x86
r = vmemmap_populate(start, end, nid, altmap);
In the normal path, vmemmap_populate() in arch/x86/mm/init_64.c
synchronizes the top level page table (See commit 9b861528a801 ("x86-64,
mem: Update all PGDs for direct mapping and vmemmap mapping changes")) so
that all tasks in the system can see the new vmemmap area.
However, when vmemmap_can_optimize() returns true, the optimized path
skips synchronization of top-level page tables. This is because
vmemmap_populate_compound_pages() is implemented in core MM code, which
does not handle synchronization of the top-level page tables. Instead,
the core MM has historically relied on each architecture to perform this
synchronization manually.
We're not the first party to encounter a crash caused by not-sync'd top
level page tables: earlier this year, Gwan-gyeong Mun attempted to address
the issue [1] [2] after hitting a kernel panic when x86 code accessed the
vmemmap area before the corresponding top-level entries were synced. At
that time, the issue was believed to be triggered only when struct page
was enlarged for debugging purposes, and the patch did not get further
updates.
It turns out that current approach of relying on each arch to handle the
page table sync manually is fragile because 1) it's easy to forget to sync
the top level page table, and 2) it's also easy to overlook that the
kernel should not access the vmemmap and direct mapping areas before the
sync.
# The solution: Make page table sync more code robust and harder to miss
To address this, Dave Hansen suggested [3] [4] introducing
{pgd,p4d}_populate_kernel() for updating kernel portion of the page tables
and allow each architecture to explicitly perform synchronization when
installing top-level entries. With this approach, we no longer need to
worry about missing the sync step, reducing the risk of future
regressions.
The new interface reuses existing ARCH_PAGE_TABLE_SYNC_MASK,
PGTBL_P*D_MODIFIED and arch_sync_kernel_mappings() facility used by
vmalloc and ioremap to synchronize page tables.
pgd_populate_kernel() looks like this:
static inline void pgd_populate_kernel(unsigned long addr, pgd_t *pgd,
p4d_t *p4d)
{
pgd_populate(&init_mm, pgd, p4d);
if (ARCH_PAGE_TABLE_SYNC_MASK & PGTBL_PGD_MODIFIED)
arch_sync_kernel_mappings(addr, addr);
}
It is worth noting that vmalloc() and apply_to_range() carefully
synchronizes page tables by calling p*d_alloc_track() and
arch_sync_kernel_mappings(), and thus they are not affected by
---truncated---
In the Linux kernel, the following vulnerability has been resolved:
x86/mm/64: define ARCH_PAGE_TABLE_SYNC_MASK and arch_sync_kernel_mappings()
Define ARCH_PAGE_TABLE_SYNC_MASK and arch_sync_kernel_mappings() to ensure
page tables are properly synchronized when calling p*d_populate_kernel().
For 5-level paging, synchronization is performed via
pgd_populate_kernel(). In 4-level paging, pgd_populate() is a no-op, so
synchronization is instead performed at the P4D level via
p4d_populate_kernel().
This fixes intermittent boot failures on systems using 4-level paging and
a large amount of persistent memory:
BUG: unable to handle page fault for address: ffffe70000000034
#PF: supervisor write access in kernel mode
#PF: error_code(0x0002) - not-present page
PGD 0 P4D 0
Oops: 0002 [#1] SMP NOPTI
RIP: 0010:__init_single_page+0x9/0x6d
Call Trace:
<TASK>
__init_zone_device_page+0x17/0x5d
memmap_init_zone_device+0x154/0x1bb
pagemap_range+0x2e0/0x40f
memremap_pages+0x10b/0x2f0
devm_memremap_pages+0x1e/0x60
dev_dax_probe+0xce/0x2ec [device_dax]
dax_bus_probe+0x6d/0xc9
[... snip ...]
</TASK>
It also fixes a crash in vmemmap_set_pmd() caused by accessing vmemmap
before sync_global_pgds() [1]:
BUG: unable to handle page fault for address: ffffeb3ff1200000
#PF: supervisor write access in kernel mode
#PF: error_code(0x0002) - not-present page
PGD 0 P4D 0
Oops: Oops: 0002 [#1] PREEMPT SMP NOPTI
Tainted: [W]=WARN
RIP: 0010:vmemmap_set_pmd+0xff/0x230
<TASK>
vmemmap_populate_hugepages+0x176/0x180
vmemmap_populate+0x34/0x80
__populate_section_memmap+0x41/0x90
sparse_add_section+0x121/0x3e0
__add_pages+0xba/0x150
add_pages+0x1d/0x70
memremap_pages+0x3dc/0x810
devm_memremap_pages+0x1c/0x60
xe_devm_add+0x8b/0x100 [xe]
xe_tile_init_noalloc+0x6a/0x70 [xe]
xe_device_probe+0x48c/0x740 [xe]
[... snip ...]
In the Linux kernel, the following vulnerability has been resolved:
pcmcia: Fix a NULL pointer dereference in __iodyn_find_io_region()
In __iodyn_find_io_region(), pcmcia_make_resource() is assigned to
res and used in pci_bus_alloc_resource(). There is a dereference of res
in pci_bus_alloc_resource(), which could lead to a NULL pointer
dereference on failure of pcmcia_make_resource().
Fix this bug by adding a check of res.
In the Linux kernel, the following vulnerability has been resolved:
ppp: fix memory leak in pad_compress_skb
If alloc_skb() fails in pad_compress_skb(), it returns NULL without
releasing the old skb. The caller does:
skb = pad_compress_skb(ppp, skb);
if (!skb)
goto drop;
drop:
kfree_skb(skb);
When pad_compress_skb() returns NULL, the reference to the old skb is
lost and kfree_skb(skb) ends up doing nothing, leading to a memory leak.
Align pad_compress_skb() semantics with realloc(): only free the old
skb if allocation and compression succeed. At the call site, use the
new_skb variable so the original skb is not lost when pad_compress_skb()
fails.