In the Linux kernel, the following vulnerability has been resolved:
bpf: Forget ranges when refining tnum after JSET
Syzbot reported a kernel warning due to a range invariant violation on
the following BPF program.
0: call bpf_get_netns_cookie
1: if r0 == 0 goto <exit>
2: if r0 & Oxffffffff goto <exit>
The issue is on the path where we fall through both jumps.
That path is unreachable at runtime: after insn 1, we know r0 != 0, but
with the sign extension on the jset, we would only fallthrough insn 2
if r0 == 0. Unfortunately, is_branch_taken() isn't currently able to
figure this out, so the verifier walks all branches. The verifier then
refines the register bounds using the second condition and we end
up with inconsistent bounds on this unreachable path:
1: if r0 == 0 goto <exit>
r0: u64=[0x1, 0xffffffffffffffff] var_off=(0, 0xffffffffffffffff)
2: if r0 & 0xffffffff goto <exit>
r0 before reg_bounds_sync: u64=[0x1, 0xffffffffffffffff] var_off=(0, 0)
r0 after reg_bounds_sync: u64=[0x1, 0] var_off=(0, 0)
Improving the range refinement for JSET to cover all cases is tricky. We
also don't expect many users to rely on JSET given LLVM doesn't generate
those instructions. So instead of improving the range refinement for
JSETs, Eduard suggested we forget the ranges whenever we're narrowing
tnums after a JSET. This patch implements that approach.
In the Linux kernel, the following vulnerability has been resolved:
jfs: truncate good inode pages when hard link is 0
The fileset value of the inode copy from the disk by the reproducer is
AGGR_RESERVED_I. When executing evict, its hard link number is 0, so its
inode pages are not truncated. This causes the bugon to be triggered when
executing clear_inode() because nrpages is greater than 0.
In the Linux kernel, the following vulnerability has been resolved:
rcu: Fix rcu_read_unlock() deadloop due to IRQ work
During rcu_read_unlock_special(), if this happens during irq_exit(), we
can lockup if an IPI is issued. This is because the IPI itself triggers
the irq_exit() path causing a recursive lock up.
This is precisely what Xiongfeng found when invoking a BPF program on
the trace_tick_stop() tracepoint As shown in the trace below. Fix by
managing the irq_work state correctly.
irq_exit()
__irq_exit_rcu()
/* in_hardirq() returns false after this */
preempt_count_sub(HARDIRQ_OFFSET)
tick_irq_exit()
tick_nohz_irq_exit()
tick_nohz_stop_sched_tick()
trace_tick_stop() /* a bpf prog is hooked on this trace point */
__bpf_trace_tick_stop()
bpf_trace_run2()
rcu_read_unlock_special()
/* will send a IPI to itself */
irq_work_queue_on(&rdp->defer_qs_iw, rdp->cpu);
A simple reproducer can also be obtained by doing the following in
tick_irq_exit(). It will hang on boot without the patch:
static inline void tick_irq_exit(void)
{
+ rcu_read_lock();
+ WRITE_ONCE(current->rcu_read_unlock_special.b.need_qs, true);
+ rcu_read_unlock();
+
[neeraj: Apply Frederic's suggested fix for PREEMPT_RT]
In the Linux kernel, the following vulnerability has been resolved:
rcutorture: Fix rcutorture_one_extend_check() splat in RT kernels
For built with CONFIG_PREEMPT_RT=y kernels, running rcutorture
tests resulted in the following splat:
[ 68.797425] rcutorture_one_extend_check during change: Current 0x1 To add 0x1 To remove 0x0 preempt_count() 0x0
[ 68.797533] WARNING: CPU: 2 PID: 512 at kernel/rcu/rcutorture.c:1993 rcutorture_one_extend_check+0x419/0x560 [rcutorture]
[ 68.797601] Call Trace:
[ 68.797602] <TASK>
[ 68.797619] ? lockdep_softirqs_off+0xa5/0x160
[ 68.797631] rcutorture_one_extend+0x18e/0xcc0 [rcutorture 2466dbd2ff34dbaa36049cb323a80c3306ac997c]
[ 68.797646] ? local_clock+0x19/0x40
[ 68.797659] rcu_torture_one_read+0xf0/0x280 [rcutorture 2466dbd2ff34dbaa36049cb323a80c3306ac997c]
[ 68.797678] ? __pfx_rcu_torture_one_read+0x10/0x10 [rcutorture 2466dbd2ff34dbaa36049cb323a80c3306ac997c]
[ 68.797804] ? __pfx_rcu_torture_timer+0x10/0x10 [rcutorture 2466dbd2ff34dbaa36049cb323a80c3306ac997c]
[ 68.797815] rcu-torture: rcu_torture_reader task started
[ 68.797824] rcu-torture: Creating rcu_torture_reader task
[ 68.797824] rcu_torture_reader+0x238/0x580 [rcutorture 2466dbd2ff34dbaa36049cb323a80c3306ac997c]
[ 68.797836] ? kvm_sched_clock_read+0x15/0x30
Disable BH does not change the SOFTIRQ corresponding bits in
preempt_count() for RT kernels, this commit therefore use
softirq_count() to check the if BH is disabled.
In the Linux kernel, the following vulnerability has been resolved:
parisc: Revise __get_user() to probe user read access
Because of the way read access support is implemented, read access
interruptions are only triggered at privilege levels 2 and 3. The
kernel executes at privilege level 0, so __get_user() never triggers
a read access interruption (code 26). Thus, it is currently possible
for user code to access a read protected address via a system call.
Fix this by probing read access rights at privilege level 3 (PRIV_USER)
and setting __gu_err to -EFAULT (-14) if access isn't allowed.
Note the cmpiclr instruction does a 32-bit compare because COND macro
doesn't work inside asm.
In the Linux kernel, the following vulnerability has been resolved:
comedi: Fix use of uninitialized memory in do_insn_ioctl() and do_insnlist_ioctl()
syzbot reports a KMSAN kernel-infoleak in `do_insn_ioctl()`. A kernel
buffer is allocated to hold `insn->n` samples (each of which is an
`unsigned int`). For some instruction types, `insn->n` samples are
copied back to user-space, unless an error code is being returned. The
problem is that not all the instruction handlers that need to return
data to userspace fill in the whole `insn->n` samples, so that there is
an information leak. There is a similar syzbot report for
`do_insnlist_ioctl()`, although it does not have a reproducer for it at
the time of writing.
One culprit is `insn_rw_emulate_bits()` which is used as the handler for
`INSN_READ` or `INSN_WRITE` instructions for subdevices that do not have
a specific handler for that instruction, but do have an `INSN_BITS`
handler. For `INSN_READ` it only fills in at most 1 sample, so if
`insn->n` is greater than 1, the remaining `insn->n - 1` samples copied
to userspace will be uninitialized kernel data.
Another culprit is `vm80xx_ai_insn_read()` in the "vm80xx" driver. It
never returns an error, even if it fails to fill the buffer.
Fix it in `do_insn_ioctl()` and `do_insnlist_ioctl()` by making sure
that uninitialized parts of the allocated buffer are zeroed before
handling each instruction.
Thanks to Arnaud Lecomte for their fix to `do_insn_ioctl()`. That fix
replaced the call to `kmalloc_array()` with `kcalloc()`, but it is not
always necessary to clear the whole buffer.
In the Linux kernel, the following vulnerability has been resolved:
comedi: Make insn_rw_emulate_bits() do insn->n samples
The `insn_rw_emulate_bits()` function is used as a default handler for
`INSN_READ` instructions for subdevices that have a handler for
`INSN_BITS` but not for `INSN_READ`. Similarly, it is used as a default
handler for `INSN_WRITE` instructions for subdevices that have a handler
for `INSN_BITS` but not for `INSN_WRITE`. It works by emulating the
`INSN_READ` or `INSN_WRITE` instruction handling with a constructed
`INSN_BITS` instruction. However, `INSN_READ` and `INSN_WRITE`
instructions are supposed to be able read or write multiple samples,
indicated by the `insn->n` value, but `insn_rw_emulate_bits()` currently
only handles a single sample. For `INSN_READ`, the comedi core will
copy `insn->n` samples back to user-space. (That triggered KASAN
kernel-infoleak errors when `insn->n` was greater than 1, but that is
being fixed more generally elsewhere in the comedi core.)
Make `insn_rw_emulate_bits()` either handle `insn->n` samples, or return
an error, to conform to the general expectation for `INSN_READ` and
`INSN_WRITE` handlers.
In the Linux kernel, the following vulnerability has been resolved:
fs/buffer: fix use-after-free when call bh_read() helper
There's issue as follows:
BUG: KASAN: stack-out-of-bounds in end_buffer_read_sync+0xe3/0x110
Read of size 8 at addr ffffc9000168f7f8 by task swapper/3/0
CPU: 3 UID: 0 PID: 0 Comm: swapper/3 Not tainted 6.16.0-862.14.0.6.x86_64
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996)
Call Trace:
<IRQ>
dump_stack_lvl+0x55/0x70
print_address_description.constprop.0+0x2c/0x390
print_report+0xb4/0x270
kasan_report+0xb8/0xf0
end_buffer_read_sync+0xe3/0x110
end_bio_bh_io_sync+0x56/0x80
blk_update_request+0x30a/0x720
scsi_end_request+0x51/0x2b0
scsi_io_completion+0xe3/0x480
? scsi_device_unbusy+0x11e/0x160
blk_complete_reqs+0x7b/0x90
handle_softirqs+0xef/0x370
irq_exit_rcu+0xa5/0xd0
sysvec_apic_timer_interrupt+0x6e/0x90
</IRQ>
Above issue happens when do ntfs3 filesystem mount, issue may happens
as follows:
mount IRQ
ntfs_fill_super
read_cache_page
do_read_cache_folio
filemap_read_folio
mpage_read_folio
do_mpage_readpage
ntfs_get_block_vbo
bh_read
submit_bh
wait_on_buffer(bh);
blk_complete_reqs
scsi_io_completion
scsi_end_request
blk_update_request
end_bio_bh_io_sync
end_buffer_read_sync
__end_buffer_read_notouch
unlock_buffer
wait_on_buffer(bh);--> return will return to caller
put_bh
--> trigger stack-out-of-bounds
In the mpage_read_folio() function, the stack variable 'map_bh' is
passed to ntfs_get_block_vbo(). Once unlock_buffer() unlocks and
wait_on_buffer() returns to continue processing, the stack variable
is likely to be reclaimed. Consequently, during the end_buffer_read_sync()
process, calling put_bh() may result in stack overrun.
If the bh is not allocated on the stack, it belongs to a folio. Freeing
a buffer head which belongs to a folio is done by drop_buffers() which
will fail to free buffers which are still locked. So it is safe to call
put_bh() before __end_buffer_read_notouch().
In the Linux kernel, the following vulnerability has been resolved:
netfilter: ctnetlink: fix refcount leak on table dump
There is a reference count leak in ctnetlink_dump_table():
if (res < 0) {
nf_conntrack_get(&ct->ct_general); // HERE
cb->args[1] = (unsigned long)ct;
...
While its very unlikely, its possible that ct == last.
If this happens, then the refcount of ct was already incremented.
This 2nd increment is never undone.
This prevents the conntrack object from being released, which in turn
keeps prevents cnet->count from dropping back to 0.
This will then block the netns dismantle (or conntrack rmmod) as
nf_conntrack_cleanup_net_list() will wait forever.
This can be reproduced by running conntrack_resize.sh selftest in a loop.
It takes ~20 minutes for me on a preemptible kernel on average before
I see a runaway kworker spinning in nf_conntrack_cleanup_net_list.
One fix would to change this to:
if (res < 0) {
if (ct != last)
nf_conntrack_get(&ct->ct_general);
But this reference counting isn't needed in the first place.
We can just store a cookie value instead.
A followup patch will do the same for ctnetlink_exp_dump_table,
it looks to me as if this has the same problem and like
ctnetlink_dump_table, we only need a 'skip hint', not the actual
object so we can apply the same cookie strategy there as well.