In the Linux kernel, the following vulnerability has been resolved:
rcu: Fix rcu_read_unlock() deadloop due to IRQ work
During rcu_read_unlock_special(), if this happens during irq_exit(), we
can lockup if an IPI is issued. This is because the IPI itself triggers
the irq_exit() path causing a recursive lock up.
This is precisely what Xiongfeng found when invoking a BPF program on
the trace_tick_stop() tracepoint As shown in the trace below. Fix by
managing the irq_work state correctly.
irq_exit()
__irq_exit_rcu()
/* in_hardirq() returns false after this */
preempt_count_sub(HARDIRQ_OFFSET)
tick_irq_exit()
tick_nohz_irq_exit()
tick_nohz_stop_sched_tick()
trace_tick_stop() /* a bpf prog is hooked on this trace point */
__bpf_trace_tick_stop()
bpf_trace_run2()
rcu_read_unlock_special()
/* will send a IPI to itself */
irq_work_queue_on(&rdp->defer_qs_iw, rdp->cpu);
A simple reproducer can also be obtained by doing the following in
tick_irq_exit(). It will hang on boot without the patch:
static inline void tick_irq_exit(void)
{
+ rcu_read_lock();
+ WRITE_ONCE(current->rcu_read_unlock_special.b.need_qs, true);
+ rcu_read_unlock();
+
[neeraj: Apply Frederic's suggested fix for PREEMPT_RT]
In the Linux kernel, the following vulnerability has been resolved:
rcutorture: Fix rcutorture_one_extend_check() splat in RT kernels
For built with CONFIG_PREEMPT_RT=y kernels, running rcutorture
tests resulted in the following splat:
[ 68.797425] rcutorture_one_extend_check during change: Current 0x1 To add 0x1 To remove 0x0 preempt_count() 0x0
[ 68.797533] WARNING: CPU: 2 PID: 512 at kernel/rcu/rcutorture.c:1993 rcutorture_one_extend_check+0x419/0x560 [rcutorture]
[ 68.797601] Call Trace:
[ 68.797602] <TASK>
[ 68.797619] ? lockdep_softirqs_off+0xa5/0x160
[ 68.797631] rcutorture_one_extend+0x18e/0xcc0 [rcutorture 2466dbd2ff34dbaa36049cb323a80c3306ac997c]
[ 68.797646] ? local_clock+0x19/0x40
[ 68.797659] rcu_torture_one_read+0xf0/0x280 [rcutorture 2466dbd2ff34dbaa36049cb323a80c3306ac997c]
[ 68.797678] ? __pfx_rcu_torture_one_read+0x10/0x10 [rcutorture 2466dbd2ff34dbaa36049cb323a80c3306ac997c]
[ 68.797804] ? __pfx_rcu_torture_timer+0x10/0x10 [rcutorture 2466dbd2ff34dbaa36049cb323a80c3306ac997c]
[ 68.797815] rcu-torture: rcu_torture_reader task started
[ 68.797824] rcu-torture: Creating rcu_torture_reader task
[ 68.797824] rcu_torture_reader+0x238/0x580 [rcutorture 2466dbd2ff34dbaa36049cb323a80c3306ac997c]
[ 68.797836] ? kvm_sched_clock_read+0x15/0x30
Disable BH does not change the SOFTIRQ corresponding bits in
preempt_count() for RT kernels, this commit therefore use
softirq_count() to check the if BH is disabled.
In the Linux kernel, the following vulnerability has been resolved:
wifi: ath10k: shutdown driver when hardware is unreliable
In rare cases, ath10k may lose connection with the PCIe bus due to
some unknown reasons, which could further lead to system crashes during
resuming due to watchdog timeout:
ath10k_pci 0000:01:00.0: wmi command 20486 timeout, restarting hardware
ath10k_pci 0000:01:00.0: already restarting
ath10k_pci 0000:01:00.0: failed to stop WMI vdev 0: -11
ath10k_pci 0000:01:00.0: failed to stop vdev 0: -11
ieee80211 phy0: PM: **** DPM device timeout ****
Call Trace:
panic+0x125/0x315
dpm_watchdog_set+0x54/0x54
dpm_watchdog_handler+0x57/0x57
call_timer_fn+0x31/0x13c
At this point, all WMI commands will timeout and attempt to restart
device. So set a threshold for consecutive restart failures. If the
threshold is exceeded, consider the hardware is unreliable and all
ath10k operations should be skipped to avoid system crash.
fail_cont_count and pending_recovery are atomic variables, and
do not involve complex conditional logic. Therefore, even if recovery
check and reconfig complete are executed concurrently, the recovery
mechanism will not be broken.
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00288-QCARMSWPZ-1
In the Linux kernel, the following vulnerability has been resolved:
RDMA: hfi1: fix possible divide-by-zero in find_hw_thread_mask()
The function divides number of online CPUs by num_core_siblings, and
later checks the divider by zero. This implies a possibility to get
and divide-by-zero runtime error. Fix it by moving the check prior to
division. This also helps to save one indentation level.
In the Linux kernel, the following vulnerability has been resolved:
NFS: Fix filehandle bounds checking in nfs_fh_to_dentry()
The function needs to check the minimal filehandle length before it can
access the embedded filehandle.
In the Linux kernel, the following vulnerability has been resolved:
serial: 8250: fix panic due to PSLVERR
When the PSLVERR_RESP_EN parameter is set to 1, the device generates
an error response if an attempt is made to read an empty RBR (Receive
Buffer Register) while the FIFO is enabled.
In serial8250_do_startup(), calling serial_port_out(port, UART_LCR,
UART_LCR_WLEN8) triggers dw8250_check_lcr(), which invokes
dw8250_force_idle() and serial8250_clear_and_reinit_fifos(). The latter
function enables the FIFO via serial_out(p, UART_FCR, p->fcr).
Execution proceeds to the serial_port_in(port, UART_RX).
This satisfies the PSLVERR trigger condition.
When another CPU (e.g., using printk()) is accessing the UART (UART
is busy), the current CPU fails the check (value & ~UART_LCR_SPAR) ==
(lcr & ~UART_LCR_SPAR) in dw8250_check_lcr(), causing it to enter
dw8250_force_idle().
Put serial_port_out(port, UART_LCR, UART_LCR_WLEN8) under the port->lock
to fix this issue.
Panic backtrace:
[ 0.442336] Oops - unknown exception [#1]
[ 0.442343] epc : dw8250_serial_in32+0x1e/0x4a
[ 0.442351] ra : serial8250_do_startup+0x2c8/0x88e
...
[ 0.442416] console_on_rootfs+0x26/0x70
In the Linux kernel, the following vulnerability has been resolved:
parisc: Revise __get_user() to probe user read access
Because of the way read access support is implemented, read access
interruptions are only triggered at privilege levels 2 and 3. The
kernel executes at privilege level 0, so __get_user() never triggers
a read access interruption (code 26). Thus, it is currently possible
for user code to access a read protected address via a system call.
Fix this by probing read access rights at privilege level 3 (PRIV_USER)
and setting __gu_err to -EFAULT (-14) if access isn't allowed.
Note the cmpiclr instruction does a 32-bit compare because COND macro
doesn't work inside asm.
In the Linux kernel, the following vulnerability has been resolved:
media: venus: protect against spurious interrupts during probe
Make sure the interrupt handler is initialized before the interrupt is
registered.
If the IRQ is registered before hfi_create(), it's possible that an
interrupt fires before the handler setup is complete, leading to a NULL
dereference.
This error condition has been observed during system boot on Rb3Gen2.
In the Linux kernel, the following vulnerability has been resolved:
media: venus: Add a check for packet size after reading from shared memory
Add a check to ensure that the packet size does not exceed the number of
available words after reading the packet header from shared memory. This
ensures that the size provided by the firmware is safe to process and
prevent potential out-of-bounds memory access.
In the Linux kernel, the following vulnerability has been resolved:
media: rainshadow-cec: fix TOCTOU race condition in rain_interrupt()
In the interrupt handler rain_interrupt(), the buffer full check on
rain->buf_len is performed before acquiring rain->buf_lock. This
creates a Time-of-Check to Time-of-Use (TOCTOU) race condition, as
rain->buf_len is concurrently accessed and modified in the work
handler rain_irq_work_handler() under the same lock.
Multiple interrupt invocations can race, with each reading buf_len
before it becomes full and then proceeding. This can lead to both
interrupts attempting to write to the buffer, incrementing buf_len
beyond its capacity (DATA_SIZE) and causing a buffer overflow.
Fix this bug by moving the spin_lock() to before the buffer full
check. This ensures that the check and the subsequent buffer modification
are performed atomically, preventing the race condition. An corresponding
spin_unlock() is added to the overflow path to correctly release the
lock.
This possible bug was found by an experimental static analysis tool
developed by our team.