In the Linux kernel, the following vulnerability has been resolved:
media: pvrusb2: fix URB leak in pvr2_send_request_ex
When pvr2_send_request_ex() submits a write URB successfully but fails to
submit the read URB (e.g. returns -ENOMEM), it returns immediately without
waiting for the write URB to complete. Since the driver reuses the same
URB structure, a subsequent call to pvr2_send_request_ex() attempts to
submit the still-active write URB, triggering a 'URB submitted while
active' warning in usb_submit_urb().
Fix this by ensuring the write URB is unlinked and waited upon if the read
URB submission fails.
In the Linux kernel, the following vulnerability has been resolved:
net/rds: No shortcut out of RDS_CONN_ERROR
RDS connections carry a state "rds_conn_path::cp_state"
and transitions from one state to another and are conditional
upon an expected state: "rds_conn_path_transition."
There is one exception to this conditionality, which is
"RDS_CONN_ERROR" that can be enforced by "rds_conn_path_drop"
regardless of what state the condition is currently in.
But as soon as a connection enters state "RDS_CONN_ERROR",
the connection handling code expects it to go through the
shutdown-path.
The RDS/TCP multipath changes added a shortcut out of
"RDS_CONN_ERROR" straight back to "RDS_CONN_CONNECTING"
via "rds_tcp_accept_one_path" (e.g. after "rds_tcp_state_change").
A subsequent "rds_tcp_reset_callbacks" can then transition
the state to "RDS_CONN_RESETTING" with a shutdown-worker queued.
That'll trip up "rds_conn_init_shutdown", which was
never adjusted to handle "RDS_CONN_RESETTING" and subsequently
drops the connection with the dreaded "DR_INV_CONN_STATE",
which leaves "RDS_SHUTDOWN_WORK_QUEUED" on forever.
So we do two things here:
a) Don't shortcut "RDS_CONN_ERROR", but take the longer
path through the shutdown code.
b) Add "RDS_CONN_RESETTING" to the expected states in
"rds_conn_init_shutdown" so that we won't error out
and get stuck, if we ever hit weird state transitions
like this again."
In the Linux kernel, the following vulnerability has been resolved:
clocksource/drivers/sh_tmu: Always leave device running after probe
The TMU device can be used as both a clocksource and a clockevent
provider. The driver tries to be smart and power itself on and off, as
well as enabling and disabling its clock when it's not in operation.
This behavior is slightly altered if the TMU is used as an early
platform device in which case the device is left powered on after probe,
but the clock is still enabled and disabled at runtime.
This has worked for a long time, but recent improvements in PREEMPT_RT
and PROVE_LOCKING have highlighted an issue. As the TMU registers itself
as a clockevent provider, clockevents_register_device(), it needs to use
raw spinlocks internally as this is the context of which the clockevent
framework interacts with the TMU driver. However in the context of
holding a raw spinlock the TMU driver can't really manage its power
state or clock with calls to pm_runtime_*() and clk_*() as these calls
end up in other platform drivers using regular spinlocks to control
power and clocks.
This mix of spinlock contexts trips a lockdep warning.
=============================
[ BUG: Invalid wait context ]
6.18.0-arm64-renesas-09926-gee959e7c5e34 #1 Not tainted
-----------------------------
swapper/0/0 is trying to lock:
ffff000008c9e180 (&dev->power.lock){-...}-{3:3}, at: __pm_runtime_resume+0x38/0x88
other info that might help us debug this:
context-{5:5}
1 lock held by swapper/0/0:
ccree e6601000.crypto: ARM CryptoCell 630P Driver: HW version 0xAF400001/0xDCC63000, Driver version 5.0
#0: ffff8000817ec298
ccree e6601000.crypto: ARM ccree device initialized
(tick_broadcast_lock){-...}-{2:2}, at: __tick_broadcast_oneshot_control+0xa4/0x3a8
stack backtrace:
CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Not tainted 6.18.0-arm64-renesas-09926-gee959e7c5e34 #1 PREEMPT
Hardware name: Renesas Salvator-X 2nd version board based on r8a77965 (DT)
Call trace:
show_stack+0x14/0x1c (C)
dump_stack_lvl+0x6c/0x90
dump_stack+0x14/0x1c
__lock_acquire+0x904/0x1584
lock_acquire+0x220/0x34c
_raw_spin_lock_irqsave+0x58/0x80
__pm_runtime_resume+0x38/0x88
sh_tmu_clock_event_set_oneshot+0x84/0xd4
clockevents_switch_state+0xfc/0x13c
tick_broadcast_set_event+0x30/0xa4
__tick_broadcast_oneshot_control+0x1e0/0x3a8
tick_broadcast_oneshot_control+0x30/0x40
cpuidle_enter_state+0x40c/0x680
cpuidle_enter+0x30/0x40
do_idle+0x1f4/0x280
cpu_startup_entry+0x34/0x40
kernel_init+0x0/0x130
do_one_initcall+0x0/0x230
__primary_switched+0x88/0x90
For non-PREEMPT_RT builds this is not really an issue, but for
PREEMPT_RT builds where normal spinlocks can sleep this might be an
issue. Be cautious and always leave the power and clock running after
probe.
In the Linux kernel, the following vulnerability has been resolved:
net/rds: Clear reconnect pending bit
When canceling the reconnect worker, care must be taken to reset the
reconnect-pending bit. If the reconnect worker has not yet been
scheduled before it is canceled, the reconnect-pending bit will stay
on forever.
In the Linux kernel, the following vulnerability has been resolved:
net: Drop the lock in skb_may_tx_timestamp()
skb_may_tx_timestamp() may acquire sock::sk_callback_lock. The lock must
not be taken in IRQ context, only softirq is okay. A few drivers receive
the timestamp via a dedicated interrupt and complete the TX timestamp
from that handler. This will lead to a deadlock if the lock is already
write-locked on the same CPU.
Taking the lock can be avoided. The socket (pointed by the skb) will
remain valid until the skb is released. The ->sk_socket and ->file
member will be set to NULL once the user closes the socket which may
happen before the timestamp arrives.
If we happen to observe the pointer while the socket is closing but
before the pointer is set to NULL then we may use it because both
pointer (and the file's cred member) are RCU freed.
Drop the lock. Use READ_ONCE() to obtain the individual pointer. Add a
matching WRITE_ONCE() where the pointer are cleared.
In the Linux kernel, the following vulnerability has been resolved:
media: i2c/tw9903: Fix potential memory leak in tw9903_probe()
In one of the error paths in tw9903_probe(), the memory allocated in
v4l2_ctrl_handler_init() and v4l2_ctrl_new_std() is not freed. Fix that
by calling v4l2_ctrl_handler_free() on the handler in that error path.
In the Linux kernel, the following vulnerability has been resolved:
minix: Add required sanity checking to minix_check_superblock()
The fs/minix implementation of the minix filesystem does not currently
support any other value for s_log_zone_size than 0. This is also the
only value supported in util-linux; see mkfs.minix.c line 511. In
addition, this patch adds some sanity checking for the other minix
superblock fields, and moves the minix_blocks_needed() checks for the
zmap and imap also to minix_check_super_block().
This also closes a related syzbot bug report.
In the Linux kernel, the following vulnerability has been resolved:
fbdev: vt8500lcdfb: fix missing dma_free_coherent()
fbi->fb.screen_buffer is allocated with dma_alloc_coherent() but is not
freed if the error path is reached.
In the Linux kernel, the following vulnerability has been resolved:
atm: fore200e: fix use-after-free in tasklets during device removal
When the PCA-200E or SBA-200E adapter is being detached, the fore200e
is deallocated. However, the tx_tasklet or rx_tasklet may still be running
or pending, leading to use-after-free bug when the already freed fore200e
is accessed again in fore200e_tx_tasklet() or fore200e_rx_tasklet().
One of the race conditions can occur as follows:
CPU 0 (cleanup) | CPU 1 (tasklet)
fore200e_pca_remove_one() | fore200e_interrupt()
fore200e_shutdown() | tasklet_schedule()
kfree(fore200e) | fore200e_tx_tasklet()
| fore200e-> // UAF
Fix this by ensuring tx_tasklet or rx_tasklet is properly canceled before
the fore200e is released. Add tasklet_kill() in fore200e_shutdown() to
synchronize with any pending or running tasklets. Moreover, since
fore200e_reset() could prevent further interrupts or data transfers,
the tasklet_kill() should be placed after fore200e_reset() to prevent
the tasklet from being rescheduled in fore200e_interrupt(). Finally,
it only needs to do tasklet_kill() when the fore200e state is greater
than or equal to FORE200E_STATE_IRQ, since tasklets are uninitialized
in earlier states. In a word, the tasklet_kill() should be placed in
the FORE200E_STATE_IRQ branch within the switch...case structure.
This bug was identified through static analysis.
In the Linux kernel, the following vulnerability has been resolved:
net: consume xmit errors of GSO frames
udpgro_frglist.sh and udpgro_bench.sh are the flakiest tests
currently in NIPA. They fail in the same exact way, TCP GRO
test stalls occasionally and the test gets killed after 10min.
These tests use veth to simulate GRO. They attach a trivial
("return XDP_PASS;") XDP program to the veth to force TSO off
and NAPI on.
Digging into the failure mode we can see that the connection
is completely stuck after a burst of drops. The sender's snd_nxt
is at sequence number N [1], but the receiver claims to have
received (rcv_nxt) up to N + 3 * MSS [2]. Last piece of the puzzle
is that senders rtx queue is not empty (let's say the block in
the rtx queue is at sequence number N - 4 * MSS [3]).
In this state, sender sends a retransmission from the rtx queue
with a single segment, and sequence numbers N-4*MSS:N-3*MSS [3].
Receiver sees it and responds with an ACK all the way up to
N + 3 * MSS [2]. But sender will reject this ack as TCP_ACK_UNSENT_DATA
because it has no recollection of ever sending data that far out [1].
And we are stuck.
The root cause is the mess of the xmit return codes. veth returns
an error when it can't xmit a frame. We end up with a loss event
like this:
-------------------------------------------------
| GSO super frame 1 | GSO super frame 2 |
|-----------------------------------------------|
| seg | seg | seg | seg | seg | seg | seg | seg |
| 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
-------------------------------------------------
x ok ok <ok>| ok ok ok <x>
\\
snd_nxt
"x" means packet lost by veth, and "ok" means it went thru.
Since veth has TSO disabled in this test it sees individual segments.
Segment 1 is on the retransmit queue and will be resent.
So why did the sender not advance snd_nxt even tho it clearly did
send up to seg 8? tcp_write_xmit() interprets the return code
from the core to mean that data has not been sent at all. Since
TCP deals with GSO super frames, not individual segment the crux
of the problem is that loss of a single segment can be interpreted
as loss of all. TCP only sees the last return code for the last
segment of the GSO frame (in <> brackets in the diagram above).
Of course for the problem to occur we need a setup or a device
without a Qdisc. Otherwise Qdisc layer disconnects the protocol
layer from the device errors completely.
We have multiple ways to fix this.
1) make veth not return an error when it lost a packet.
While this is what I think we did in the past, the issue keeps
reappearing and it's annoying to debug. The game of whack
a mole is not great.
2) fix the damn return codes
We only talk about NETDEV_TX_OK and NETDEV_TX_BUSY in the
documentation, so maybe we should make the return code from
ndo_start_xmit() a boolean. I like that the most, but perhaps
some ancient, not-really-networking protocol would suffer.
3) make TCP ignore the errors
It is not entirely clear to me what benefit TCP gets from
interpreting the result of ip_queue_xmit()? Specifically once
the connection is established and we're pushing data - packet
loss is just packet loss?
4) this fix
Ignore the rc in the Qdisc-less+GSO case, since it's unreliable.
We already always return OK in the TCQ_F_CAN_BYPASS case.
In the Qdisc-less case let's be a bit more conservative and only
mask the GSO errors. This path is taken by non-IP-"networks"
like CAN, MCTP etc, so we could regress some ancient thing.
This is the simplest, but also maybe the hackiest fix?
Similar fix has been proposed by Eric in the past but never committed
because original reporter was working with an OOT driver and wasn't
providing feedback (see Link).