In the Linux kernel, the following vulnerability has been resolved:
parisc: Drop WARN_ON_ONCE() from flush_cache_vmap
I have observed this warning to occasionally trigger.
In the Linux kernel, the following vulnerability has been resolved:
btrfs: subpage: keep TOWRITE tag until folio is cleaned
btrfs_subpage_set_writeback() calls folio_start_writeback() the first time
a folio is written back, and it also clears the PAGECACHE_TAG_TOWRITE tag
even if there are still dirty blocks in the folio. This can break ordering
guarantees, such as those required by btrfs_wait_ordered_extents().
That ordering breakage leads to a real failure. For example, running
generic/464 on a zoned setup will hit the following ASSERT. This happens
because the broken ordering fails to flush existing dirty pages before the
file size is truncated.
assertion failed: !list_empty(&ordered->list) :: 0, in fs/btrfs/zoned.c:1899
------------[ cut here ]------------
kernel BUG at fs/btrfs/zoned.c:1899!
Oops: invalid opcode: 0000 [#1] SMP NOPTI
CPU: 2 UID: 0 PID: 1906169 Comm: kworker/u130:2 Kdump: loaded Not tainted 6.16.0-rc6-BTRFS-ZNS+ #554 PREEMPT(voluntary)
Hardware name: Supermicro Super Server/H12SSL-NT, BIOS 2.0 02/22/2021
Workqueue: btrfs-endio-write btrfs_work_helper [btrfs]
RIP: 0010:btrfs_finish_ordered_zoned.cold+0x50/0x52 [btrfs]
RSP: 0018:ffffc9002efdbd60 EFLAGS: 00010246
RAX: 000000000000004c RBX: ffff88811923c4e0 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffffffff827e38b1 RDI: 00000000ffffffff
RBP: ffff88810005d000 R08: 00000000ffffdfff R09: ffffffff831051c8
R10: ffffffff83055220 R11: 0000000000000000 R12: ffff8881c2458c00
R13: ffff88811923c540 R14: ffff88811923c5e8 R15: ffff8881c1bd9680
FS: 0000000000000000(0000) GS:ffff88a04acd0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f907c7a918c CR3: 0000000004024000 CR4: 0000000000350ef0
Call Trace:
<TASK>
? srso_return_thunk+0x5/0x5f
btrfs_finish_ordered_io+0x4a/0x60 [btrfs]
btrfs_work_helper+0xf9/0x490 [btrfs]
process_one_work+0x204/0x590
? srso_return_thunk+0x5/0x5f
worker_thread+0x1d6/0x3d0
? __pfx_worker_thread+0x10/0x10
kthread+0x118/0x230
? __pfx_kthread+0x10/0x10
ret_from_fork+0x205/0x260
? __pfx_kthread+0x10/0x10
ret_from_fork_asm+0x1a/0x30
</TASK>
Consider process A calling writepages() with WB_SYNC_NONE. In zoned mode or
for compressed writes, it locks several folios for delalloc and starts
writing them out. Let's call the last locked folio folio X. Suppose the
write range only partially covers folio X, leaving some pages dirty.
Process A calls btrfs_subpage_set_writeback() when building a bio. This
call clears the TOWRITE tag of folio X, whose size is 8K with a block
size of 4K. The folio is then in the following state:
0 4K 8K
|/////|/////| (flag: DIRTY, tag: DIRTY)
<-----> Process A will write this range.
Now suppose process B concurrently calls writepages() with WB_SYNC_ALL. It
calls tag_pages_for_writeback() to tag dirty folios with
PAGECACHE_TAG_TOWRITE. Since folio X is still dirty, it gets tagged. Then,
B collects tagged folios using filemap_get_folios_tag() and must wait for
folio X to be written before returning from writepages().
0 4K 8K
|/////|/////| (flag: DIRTY, tag: DIRTY|TOWRITE)
However, between tagging and collecting, process A may call
btrfs_subpage_set_writeback() and clear folio X's TOWRITE tag.
0 4K 8K
| |/////| (flag: DIRTY|WRITEBACK, tag: DIRTY)
As a result, process B won't see folio X in its batch, and returns without
waiting for it. This breaks the WB_SYNC_ALL ordering requirement.
Fix this by using btrfs_subpage_set_writeback_keepwrite(), which retains
the TOWRITE tag. We now manually clear the tag only after the folio becomes
clean, via the xas operation.
In the Linux kernel, the following vulnerability has been resolved:
LoongArch: Optimize module load time by optimizing PLT/GOT counting
When enabling CONFIG_KASAN, CONFIG_PREEMPT_VOLUNTARY_BUILD and
CONFIG_PREEMPT_VOLUNTARY at the same time, there will be a soft
deadlock. The relevant logs are as follows:
rcu: INFO: rcu_sched self-detected stall on CPU
...
Call Trace:
[<900000000024f9e4>] show_stack+0x5c/0x180
[<90000000002482f4>] dump_stack_lvl+0x94/0xbc
[<9000000000224544>] rcu_dump_cpu_stacks+0x1fc/0x280
[<900000000037ac80>] rcu_sched_clock_irq+0x720/0xf88
[<9000000000396c34>] update_process_times+0xb4/0x150
[<90000000003b2474>] tick_nohz_handler+0xf4/0x250
[<9000000000397e28>] __hrtimer_run_queues+0x1d0/0x428
[<9000000000399b2c>] hrtimer_interrupt+0x214/0x538
[<9000000000253634>] constant_timer_interrupt+0x64/0x80
[<9000000000349938>] __handle_irq_event_percpu+0x78/0x1a0
[<9000000000349a78>] handle_irq_event_percpu+0x18/0x88
[<9000000000354c00>] handle_percpu_irq+0x90/0xf0
[<9000000000348c74>] handle_irq_desc+0x94/0xb8
[<9000000001012b28>] handle_cpu_irq+0x68/0xa0
[<9000000001def8c0>] handle_loongarch_irq+0x30/0x48
[<9000000001def958>] do_vint+0x80/0xd0
[<9000000000268a0c>] kasan_mem_to_shadow.part.0+0x2c/0x2a0
[<90000000006344f4>] __asan_load8+0x4c/0x120
[<900000000025c0d0>] module_frob_arch_sections+0x5c8/0x6b8
[<90000000003895f0>] load_module+0x9e0/0x2958
[<900000000038b770>] __do_sys_init_module+0x208/0x2d0
[<9000000001df0c34>] do_syscall+0x94/0x190
[<900000000024d6fc>] handle_syscall+0xbc/0x158
After analysis, this happens because loading the amdgpu module is slow,
occupying the CPU for a long time and eventually triggering the soft
deadlock.
When loading a module, module_frob_arch_sections() tries to figure out
the number of PLTs/GOTs that will be needed to handle all the RELAs. It
calls count_max_entries() to count the unique entries in an unsorted
relocation list, a counting algorithm with O(n^2) complexity.
To make it faster, we sort the relocation list by info and addend. That
way, to check for a duplicate relocation, it just needs to compare with
the previous entry. This reduces the complexity of the algorithm to
O(n log n), as done in commit d4e0340919fb ("arm64/module: Optimize
module load time by optimizing PLT counting"). This gives a significant
reduction in module load time for modules with a large number of
relocations.
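A minimal user-space model of the technique (types and data are
illustrative, and qsort() stands in for the kernel's sort(); the real
code operates on Elf_Rela entries):

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>

struct rela { uint64_t info; int64_t addend; };

static int cmp_rela(const void *a, const void *b)
{
	const struct rela *x = a, *y = b;

	if (x->info != y->info)
		return x->info < y->info ? -1 : 1;
	if (x->addend != y->addend)
		return x->addend < y->addend ? -1 : 1;
	return 0;
}

/* After sorting, a relocation needs a new PLT/GOT slot only if it
 * differs from its predecessor: O(n log n) instead of O(n^2). */
static size_t count_unique(struct rela *r, size_t n)
{
	size_t i, count = n ? 1 : 0;

	qsort(r, n, sizeof(*r), cmp_rela);
	for (i = 1; i < n; i++)
		if (cmp_rela(&r[i - 1], &r[i]) != 0)
			count++;
	return count;
}

int main(void)
{
	struct rela r[] = {
		{ 1, 0 }, { 2, 8 }, { 1, 0 }, { 2, 8 }, { 3, 0 },
	};

	printf("%zu unique entries\n", count_unique(r, 5)); /* prints 3 */
	return 0;
}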
After applying this patch, the soft deadlock problem has been solved,
and the kernel starts normally without "Call Trace".
Using the default configuration to test some modules, the results are as
follows:
Module     Size
ip_tables  36K
fat        143K
radeon     2.5MB
amdgpu     16MB
Without this patch:
Module     Module load time (ms)  Count (PLTs/GOTs)
ip_tables  18                     59/6
fat        0                      162/14
radeon     54                     1221/84
amdgpu     1411                   4525/1098
With this patch:
Module     Module load time (ms)  Count (PLTs/GOTs)
ip_tables  18                     59/6
fat        0                      162/14
radeon     22                     1221/84
amdgpu     45                     4525/1098
In the Linux kernel, the following vulnerability has been resolved:
drm/amd/display: add null check
[WHY]
Prevents null pointer dereferences to enhance function robustness.
[HOW]
Adds an early null check and returns false if the pointer is invalid.
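The guard pattern being added, as an illustrative sketch (the function
name is a hypothetical stand-in for the real display code):

static bool validate_stream(const struct dc_stream_state *stream)
{
	/* Early return instead of dereferencing a NULL pointer below. */
	if (!stream)
		return false;
	/* ...normal validation of the stream continues here... */
	return true;
}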
In the Linux kernel, the following vulnerability has been resolved:
ACPI: APEI: send SIGBUS to current task if synchronous memory error not recovered
If a synchronous error is detected as a result of user-space process
triggering a 2-bit uncorrected error, the CPU will take a synchronous
error exception such as Synchronous External Abort (SEA) on Arm64. The
kernel will queue a memory_failure() work which poisons the related
page, unmaps the page, and then sends a SIGBUS to the process, so that
a system wide panic can be avoided.
However, no memory_failure() work will be queued when abnormal
synchronous errors occur. These errors can include situations like
invalid PA, unexpected severity, no memory failure config support,
invalid GUID section, etc. In such a case, the user-space process will
trigger SEA again. This loop can potentially exceed the platform
firmware threshold or even trigger a kernel hard lockup, leading to a
system reboot.
Fix it by performing a force kill if no memory_failure() work is queued
for synchronous errors.
[ rjw: Changelog edits ]
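Hedged sketch of the idea (simplified, not the exact APEI patch; the
real logic lives in the GHES handling and its helper names differ):

/* If a synchronous error could not be queued to memory_failure(),
 * kill the interrupted task so it cannot retrigger the SEA forever. */
static void ghes_kill_current_if_unrecovered(bool sync, bool queued)
{
	if (sync && !queued) {
		pr_err("Sending SIGBUS to current task due to unrecovered memory error\n");
		force_sig(SIGBUS);
	}
}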
In the Linux kernel, the following vulnerability has been resolved:
netfilter: ctnetlink: remove refcounting in expectation dumpers
Same pattern as previous patch: do not keep the expectation object
alive via refcount, only store a cookie value and then use that
as the skip hint for dump resumption.
AFAICS this has the same issue as the one resolved in the conntrack
dumper: when we do
if (!refcount_inc_not_zero(&exp->use))
to increment the refcount, there is a chance that exp == last, which
causes a double increment of the refcount and a subsequent memory leak.
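A minimal user-space model of cookie-based resumption (illustrative;
the real dumper stores the cookie in the netlink callback args and must
also cope with a stale cookie whose object has been freed, which this
sketch omits):

#include <stdio.h>

struct expect { int id; };

static struct expect table[] = { {1}, {2}, {3}, {4}, {5} };

/* Dump up to 'budget' entries, resuming after 'last_cookie'. Returns a
 * cookie to resume from, or 0 when the table is exhausted. No reference
 * to the last-dumped object is held between calls. */
static unsigned long dump(unsigned long last_cookie, int budget)
{
	unsigned long resume = 0;
	int skipping = last_cookie != 0;
	int i;

	for (i = 0; i < 5; i++) {
		unsigned long cookie = (unsigned long)&table[i];

		if (skipping) {
			if (cookie == last_cookie)
				skipping = 0;	/* resume after this entry */
			continue;
		}
		if (budget-- <= 0)
			return resume;		/* come back later */
		printf("dumped expectation %d\n", table[i].id);
		resume = cookie;
	}
	return 0;				/* done */
}

int main(void)
{
	unsigned long cookie = 0;

	do {
		cookie = dump(cookie, 2);	/* two entries per call */
	} while (cookie);
	return 0;
}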
In the Linux kernel, the following vulnerability has been resolved:
gfs2: Set .migrate_folio in gfs2_{rgrp,meta}_aops
Clears up the warning added in 7ee3647243e5 ("migrate: Remove call to
->writepage") that occurs in various xfstests, causing "something found
in dmesg" failures.
[ 341.136573] gfs2_meta_aops does not implement migrate_folio
[ 341.136953] WARNING: CPU: 1 PID: 36 at mm/migrate.c:944 move_to_new_folio+0x2f8/0x300
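The shape of the fix, as a sketch (hedged: only the .migrate_folio line
is the point; the other fields stand in for whatever the real gfs2 aops
already contain):

static const struct address_space_operations gfs2_meta_aops = {
	.dirty_folio		= block_dirty_folio,
	.invalidate_folio	= block_invalidate_folio,
	/* Give the migration core a migrate_folio implementation so
	 * move_to_new_folio() stops warning. */
	.migrate_folio		= buffer_migrate_folio_norefs,
};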
In the Linux kernel, the following vulnerability has been resolved:
mm/smaps: fix race between smaps_hugetlb_range and migration
smaps_hugetlb_range() handles the pte without holding the ptl, and may
run concurrently with migration, leading to a BUG_ON in
pfn_swap_entry_to_page(). The race is as follows:
smaps_hugetlb_range                  migrate_pages
  huge_ptep_get
                                     remove_migration_ptes
                                     folio_unlock
  pfn_swap_entry_folio
    BUG_ON
To fix it, hold ptl lock in smaps_hugetlb_range().
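Sketch of the fixed walk callback (simplified; the real function also
accumulates the smaps statistics, and the three-argument
huge_ptep_get() shown here is the recent form of that helper):

static int smaps_hugetlb_range(pte_t *pte, unsigned long hmask,
			       unsigned long addr, unsigned long end,
			       struct mm_walk *walk)
{
	struct vm_area_struct *vma = walk->vma;
	spinlock_t *ptl;
	pte_t ptent;

	/* Hold the huge-page ptl so a concurrent migration cannot
	 * rewrite the entry between inspection and use. */
	ptl = huge_pte_lock(hstate_vma(vma), walk->mm, pte);
	ptent = huge_ptep_get(walk->mm, addr, pte);
	/* ...inspect ptent and update the counters... */
	spin_unlock(ptl);
	return 0;
}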
In the Linux kernel, the following vulnerability has been resolved:
drm/msm: Add error handling for krealloc in metadata setup
Function msm_ioctl_gem_info_set_metadata() now checks for krealloc()
failure and returns -ENOMEM, avoiding a potential NULL pointer
dereference. It explicitly avoids __GFP_NOFAIL due to deadlock risks
and allocation constraints.
Patchwork: https://patchwork.freedesktop.org/patch/661235/
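The error-handling pattern, sketched (field and parameter names are
illustrative): assign krealloc()'s result through a temporary so the
original buffer is neither leaked nor replaced by NULL.

static int set_metadata(struct msm_gem_object *msm_obj,
			const void *userdata, u32 metadata_size)
{
	void *new;

	new = krealloc(msm_obj->metadata, metadata_size, GFP_KERNEL);
	if (!new)
		return -ENOMEM;	/* no __GFP_NOFAIL: fail cleanly */

	msm_obj->metadata = new;
	msm_obj->metadata_size = metadata_size;
	memcpy(msm_obj->metadata, userdata, metadata_size);
	return 0;
}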
In the Linux kernel, the following vulnerability has been resolved:
bpf: Forget ranges when refining tnum after JSET
Syzbot reported a kernel warning due to a range invariant violation on
the following BPF program.
0: call bpf_get_netns_cookie
1: if r0 == 0 goto <exit>
2: if r0 & 0xffffffff goto <exit>
The issue is on the path where we fall through both jumps.
That path is unreachable at runtime: after insn 1, we know r0 != 0, but
with the sign extension on the jset, we would only fall through insn 2
if r0 == 0. Unfortunately, is_branch_taken() isn't currently able to
figure this out, so the verifier walks all branches. The verifier then
refines the register bounds using the second condition and we end
up with inconsistent bounds on this unreachable path:
1: if r0 == 0 goto <exit>
r0: u64=[0x1, 0xffffffffffffffff] var_off=(0, 0xffffffffffffffff)
2: if r0 & 0xffffffff goto <exit>
r0 before reg_bounds_sync: u64=[0x1, 0xffffffffffffffff] var_off=(0, 0)
r0 after reg_bounds_sync: u64=[0x1, 0] var_off=(0, 0)
Improving the range refinement for JSET to cover all cases is tricky. We
also don't expect many users to rely on JSET given LLVM doesn't generate
those instructions. So instead of improving the range refinement for
JSETs, Eduard suggested we forget the ranges whenever we're narrowing
tnums after a JSET. This patch implements that approach.
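A user-space model of the invariant breakage and the fix (hedged: the
verifier's real logic lives in reg_bounds_sync() and the tnum helpers;
the names and the sync rule here are simplified):

#include <stdio.h>
#include <stdint.h>

struct bounds { uint64_t umin, umax; };

/* tnum (value=0, mask=0) means "the value is known to be exactly 0". */
struct tnum { uint64_t value, mask; };

static void sync_bounds(struct bounds *b, struct tnum t)
{
	/* Known bits tighten the range, as reg_bounds_sync() does:
	 * unknown bits all-zero gives the min, all-one gives the max. */
	uint64_t known_min = t.value;
	uint64_t known_max = t.value | t.mask;

	if (known_min > b->umin)
		b->umin = known_min;
	if (known_max < b->umax)
		b->umax = known_max;
}

int main(void)
{
	/* After "if r0 == 0": r0 in [1, U64_MAX]. */
	struct bounds b = { 1, UINT64_MAX };
	/* The JSET fallthrough refines the tnum to exactly 0. */
	struct tnum t = { 0, 0 };

	/* Buggy order: keep the old range, then sync -> [1, 0]. */
	struct bounds buggy = b;
	sync_bounds(&buggy, t);
	printf("buggy: [%llu, %llu] %s\n",
	       (unsigned long long)buggy.umin,
	       (unsigned long long)buggy.umax,
	       buggy.umin > buggy.umax ? "(invariant violated!)" : "");

	/* Fix: forget the range first, then derive it from the tnum. */
	struct bounds fixed = { 0, UINT64_MAX };
	sync_bounds(&fixed, t);
	printf("fixed: [%llu, %llu]\n",
	       (unsigned long long)fixed.umin,
	       (unsigned long long)fixed.umax);
	return 0;
}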