In the Linux kernel, the following vulnerability has been resolved:
rcu: Fix rcu_read_unlock() deadloop due to IRQ work
During rcu_read_unlock_special(), if this happens during irq_exit(), we
can lockup if an IPI is issued. This is because the IPI itself triggers
the irq_exit() path causing a recursive lock up.
This is precisely what Xiongfeng found when invoking a BPF program on
the trace_tick_stop() tracepoint As shown in the trace below. Fix by
managing the irq_work state correctly.
irq_exit()
__irq_exit_rcu()
/* in_hardirq() returns false after this */
preempt_count_sub(HARDIRQ_OFFSET)
tick_irq_exit()
tick_nohz_irq_exit()
tick_nohz_stop_sched_tick()
trace_tick_stop() /* a bpf prog is hooked on this trace point */
__bpf_trace_tick_stop()
bpf_trace_run2()
rcu_read_unlock_special()
/* will send a IPI to itself */
irq_work_queue_on(&rdp->defer_qs_iw, rdp->cpu);
A simple reproducer can also be obtained by doing the following in
tick_irq_exit(). It will hang on boot without the patch:
static inline void tick_irq_exit(void)
{
+ rcu_read_lock();
+ WRITE_ONCE(current->rcu_read_unlock_special.b.need_qs, true);
+ rcu_read_unlock();
+
[neeraj: Apply Frederic's suggested fix for PREEMPT_RT]
In the Linux kernel, the following vulnerability has been resolved:
rcutorture: Fix rcutorture_one_extend_check() splat in RT kernels
For built with CONFIG_PREEMPT_RT=y kernels, running rcutorture
tests resulted in the following splat:
[ 68.797425] rcutorture_one_extend_check during change: Current 0x1 To add 0x1 To remove 0x0 preempt_count() 0x0
[ 68.797533] WARNING: CPU: 2 PID: 512 at kernel/rcu/rcutorture.c:1993 rcutorture_one_extend_check+0x419/0x560 [rcutorture]
[ 68.797601] Call Trace:
[ 68.797602] <TASK>
[ 68.797619] ? lockdep_softirqs_off+0xa5/0x160
[ 68.797631] rcutorture_one_extend+0x18e/0xcc0 [rcutorture 2466dbd2ff34dbaa36049cb323a80c3306ac997c]
[ 68.797646] ? local_clock+0x19/0x40
[ 68.797659] rcu_torture_one_read+0xf0/0x280 [rcutorture 2466dbd2ff34dbaa36049cb323a80c3306ac997c]
[ 68.797678] ? __pfx_rcu_torture_one_read+0x10/0x10 [rcutorture 2466dbd2ff34dbaa36049cb323a80c3306ac997c]
[ 68.797804] ? __pfx_rcu_torture_timer+0x10/0x10 [rcutorture 2466dbd2ff34dbaa36049cb323a80c3306ac997c]
[ 68.797815] rcu-torture: rcu_torture_reader task started
[ 68.797824] rcu-torture: Creating rcu_torture_reader task
[ 68.797824] rcu_torture_reader+0x238/0x580 [rcutorture 2466dbd2ff34dbaa36049cb323a80c3306ac997c]
[ 68.797836] ? kvm_sched_clock_read+0x15/0x30
Disable BH does not change the SOFTIRQ corresponding bits in
preempt_count() for RT kernels, this commit therefore use
softirq_count() to check the if BH is disabled.
In the Linux kernel, the following vulnerability has been resolved:
parisc: Revise __get_user() to probe user read access
Because of the way read access support is implemented, read access
interruptions are only triggered at privilege levels 2 and 3. The
kernel executes at privilege level 0, so __get_user() never triggers
a read access interruption (code 26). Thus, it is currently possible
for user code to access a read protected address via a system call.
Fix this by probing read access rights at privilege level 3 (PRIV_USER)
and setting __gu_err to -EFAULT (-14) if access isn't allowed.
Note the cmpiclr instruction does a 32-bit compare because COND macro
doesn't work inside asm.
In the Linux kernel, the following vulnerability has been resolved:
comedi: Fix use of uninitialized memory in do_insn_ioctl() and do_insnlist_ioctl()
syzbot reports a KMSAN kernel-infoleak in `do_insn_ioctl()`. A kernel
buffer is allocated to hold `insn->n` samples (each of which is an
`unsigned int`). For some instruction types, `insn->n` samples are
copied back to user-space, unless an error code is being returned. The
problem is that not all the instruction handlers that need to return
data to userspace fill in the whole `insn->n` samples, so that there is
an information leak. There is a similar syzbot report for
`do_insnlist_ioctl()`, although it does not have a reproducer for it at
the time of writing.
One culprit is `insn_rw_emulate_bits()` which is used as the handler for
`INSN_READ` or `INSN_WRITE` instructions for subdevices that do not have
a specific handler for that instruction, but do have an `INSN_BITS`
handler. For `INSN_READ` it only fills in at most 1 sample, so if
`insn->n` is greater than 1, the remaining `insn->n - 1` samples copied
to userspace will be uninitialized kernel data.
Another culprit is `vm80xx_ai_insn_read()` in the "vm80xx" driver. It
never returns an error, even if it fails to fill the buffer.
Fix it in `do_insn_ioctl()` and `do_insnlist_ioctl()` by making sure
that uninitialized parts of the allocated buffer are zeroed before
handling each instruction.
Thanks to Arnaud Lecomte for their fix to `do_insn_ioctl()`. That fix
replaced the call to `kmalloc_array()` with `kcalloc()`, but it is not
always necessary to clear the whole buffer.
In the Linux kernel, the following vulnerability has been resolved:
comedi: Make insn_rw_emulate_bits() do insn->n samples
The `insn_rw_emulate_bits()` function is used as a default handler for
`INSN_READ` instructions for subdevices that have a handler for
`INSN_BITS` but not for `INSN_READ`. Similarly, it is used as a default
handler for `INSN_WRITE` instructions for subdevices that have a handler
for `INSN_BITS` but not for `INSN_WRITE`. It works by emulating the
`INSN_READ` or `INSN_WRITE` instruction handling with a constructed
`INSN_BITS` instruction. However, `INSN_READ` and `INSN_WRITE`
instructions are supposed to be able read or write multiple samples,
indicated by the `insn->n` value, but `insn_rw_emulate_bits()` currently
only handles a single sample. For `INSN_READ`, the comedi core will
copy `insn->n` samples back to user-space. (That triggered KASAN
kernel-infoleak errors when `insn->n` was greater than 1, but that is
being fixed more generally elsewhere in the comedi core.)
Make `insn_rw_emulate_bits()` either handle `insn->n` samples, or return
an error, to conform to the general expectation for `INSN_READ` and
`INSN_WRITE` handlers.
In the Linux kernel, the following vulnerability has been resolved:
fs/buffer: fix use-after-free when call bh_read() helper
There's issue as follows:
BUG: KASAN: stack-out-of-bounds in end_buffer_read_sync+0xe3/0x110
Read of size 8 at addr ffffc9000168f7f8 by task swapper/3/0
CPU: 3 UID: 0 PID: 0 Comm: swapper/3 Not tainted 6.16.0-862.14.0.6.x86_64
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996)
Call Trace:
<IRQ>
dump_stack_lvl+0x55/0x70
print_address_description.constprop.0+0x2c/0x390
print_report+0xb4/0x270
kasan_report+0xb8/0xf0
end_buffer_read_sync+0xe3/0x110
end_bio_bh_io_sync+0x56/0x80
blk_update_request+0x30a/0x720
scsi_end_request+0x51/0x2b0
scsi_io_completion+0xe3/0x480
? scsi_device_unbusy+0x11e/0x160
blk_complete_reqs+0x7b/0x90
handle_softirqs+0xef/0x370
irq_exit_rcu+0xa5/0xd0
sysvec_apic_timer_interrupt+0x6e/0x90
</IRQ>
Above issue happens when do ntfs3 filesystem mount, issue may happens
as follows:
mount IRQ
ntfs_fill_super
read_cache_page
do_read_cache_folio
filemap_read_folio
mpage_read_folio
do_mpage_readpage
ntfs_get_block_vbo
bh_read
submit_bh
wait_on_buffer(bh);
blk_complete_reqs
scsi_io_completion
scsi_end_request
blk_update_request
end_bio_bh_io_sync
end_buffer_read_sync
__end_buffer_read_notouch
unlock_buffer
wait_on_buffer(bh);--> return will return to caller
put_bh
--> trigger stack-out-of-bounds
In the mpage_read_folio() function, the stack variable 'map_bh' is
passed to ntfs_get_block_vbo(). Once unlock_buffer() unlocks and
wait_on_buffer() returns to continue processing, the stack variable
is likely to be reclaimed. Consequently, during the end_buffer_read_sync()
process, calling put_bh() may result in stack overrun.
If the bh is not allocated on the stack, it belongs to a folio. Freeing
a buffer head which belongs to a folio is done by drop_buffers() which
will fail to free buffers which are still locked. So it is safe to call
put_bh() before __end_buffer_read_notouch().
In the Linux kernel, the following vulnerability has been resolved:
netfilter: ctnetlink: fix refcount leak on table dump
There is a reference count leak in ctnetlink_dump_table():
if (res < 0) {
nf_conntrack_get(&ct->ct_general); // HERE
cb->args[1] = (unsigned long)ct;
...
While its very unlikely, its possible that ct == last.
If this happens, then the refcount of ct was already incremented.
This 2nd increment is never undone.
This prevents the conntrack object from being released, which in turn
keeps prevents cnet->count from dropping back to 0.
This will then block the netns dismantle (or conntrack rmmod) as
nf_conntrack_cleanup_net_list() will wait forever.
This can be reproduced by running conntrack_resize.sh selftest in a loop.
It takes ~20 minutes for me on a preemptible kernel on average before
I see a runaway kworker spinning in nf_conntrack_cleanup_net_list.
One fix would to change this to:
if (res < 0) {
if (ct != last)
nf_conntrack_get(&ct->ct_general);
But this reference counting isn't needed in the first place.
We can just store a cookie value instead.
A followup patch will do the same for ctnetlink_exp_dump_table,
it looks to me as if this has the same problem and like
ctnetlink_dump_table, we only need a 'skip hint', not the actual
object so we can apply the same cookie strategy there as well.
In the Linux kernel, the following vulnerability has been resolved:
gfs2: Validate i_depth for exhash directories
A fuzzer test introduced corruption that ends up with a depth of 0 in
dir_e_read(), causing an undefined shift by 32 at:
index = hash >> (32 - dip->i_depth);
As calculated in an open-coded way in dir_make_exhash(), the minimum
depth for an exhash directory is ilog2(sdp->sd_hash_ptrs) and 0 is
invalid as sdp->sd_hash_ptrs is fixed as sdp->bsize / 16 at mount time.
So we can avoid the undefined behaviour by checking for depth values
lower than the minimum in gfs2_dinode_in(). Values greater than the
maximum are already being checked for there.
Also switch the calculation in dir_make_exhash() to use ilog2() to
clarify how the depth is calculated.
Tested with the syzkaller repro.c and xfstests '-g quick'.