In the Linux kernel, the following vulnerability has been resolved:
atm: fore200e: fix use-after-free in tasklets during device removal
When the PCA-200E or SBA-200E adapter is being detached, the fore200e
is deallocated. However, the tx_tasklet or rx_tasklet may still be running
or pending, leading to use-after-free bug when the already freed fore200e
is accessed again in fore200e_tx_tasklet() or fore200e_rx_tasklet().
One of the race conditions can occur as follows:
CPU 0 (cleanup) | CPU 1 (tasklet)
fore200e_pca_remove_one() | fore200e_interrupt()
fore200e_shutdown() | tasklet_schedule()
kfree(fore200e) | fore200e_tx_tasklet()
| fore200e-> // UAF
Fix this by ensuring tx_tasklet or rx_tasklet is properly canceled before
the fore200e is released. Add tasklet_kill() in fore200e_shutdown() to
synchronize with any pending or running tasklets. Moreover, since
fore200e_reset() could prevent further interrupts or data transfers,
the tasklet_kill() should be placed after fore200e_reset() to prevent
the tasklet from being rescheduled in fore200e_interrupt(). Finally,
it only needs to do tasklet_kill() when the fore200e state is greater
than or equal to FORE200E_STATE_IRQ, since tasklets are uninitialized
in earlier states. In a word, the tasklet_kill() should be placed in
the FORE200E_STATE_IRQ branch within the switch...case structure.
This bug was identified through static analysis.
In the Linux kernel, the following vulnerability has been resolved:
net: consume xmit errors of GSO frames
udpgro_frglist.sh and udpgro_bench.sh are the flakiest tests
currently in NIPA. They fail in the same exact way, TCP GRO
test stalls occasionally and the test gets killed after 10min.
These tests use veth to simulate GRO. They attach a trivial
("return XDP_PASS;") XDP program to the veth to force TSO off
and NAPI on.
Digging into the failure mode we can see that the connection
is completely stuck after a burst of drops. The sender's snd_nxt
is at sequence number N [1], but the receiver claims to have
received (rcv_nxt) up to N + 3 * MSS [2]. Last piece of the puzzle
is that senders rtx queue is not empty (let's say the block in
the rtx queue is at sequence number N - 4 * MSS [3]).
In this state, sender sends a retransmission from the rtx queue
with a single segment, and sequence numbers N-4*MSS:N-3*MSS [3].
Receiver sees it and responds with an ACK all the way up to
N + 3 * MSS [2]. But sender will reject this ack as TCP_ACK_UNSENT_DATA
because it has no recollection of ever sending data that far out [1].
And we are stuck.
The root cause is the mess of the xmit return codes. veth returns
an error when it can't xmit a frame. We end up with a loss event
like this:
-------------------------------------------------
| GSO super frame 1 | GSO super frame 2 |
|-----------------------------------------------|
| seg | seg | seg | seg | seg | seg | seg | seg |
| 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
-------------------------------------------------
x ok ok <ok>| ok ok ok <x>
\\
snd_nxt
"x" means packet lost by veth, and "ok" means it went thru.
Since veth has TSO disabled in this test it sees individual segments.
Segment 1 is on the retransmit queue and will be resent.
So why did the sender not advance snd_nxt even tho it clearly did
send up to seg 8? tcp_write_xmit() interprets the return code
from the core to mean that data has not been sent at all. Since
TCP deals with GSO super frames, not individual segment the crux
of the problem is that loss of a single segment can be interpreted
as loss of all. TCP only sees the last return code for the last
segment of the GSO frame (in <> brackets in the diagram above).
Of course for the problem to occur we need a setup or a device
without a Qdisc. Otherwise Qdisc layer disconnects the protocol
layer from the device errors completely.
We have multiple ways to fix this.
1) make veth not return an error when it lost a packet.
While this is what I think we did in the past, the issue keeps
reappearing and it's annoying to debug. The game of whack
a mole is not great.
2) fix the damn return codes
We only talk about NETDEV_TX_OK and NETDEV_TX_BUSY in the
documentation, so maybe we should make the return code from
ndo_start_xmit() a boolean. I like that the most, but perhaps
some ancient, not-really-networking protocol would suffer.
3) make TCP ignore the errors
It is not entirely clear to me what benefit TCP gets from
interpreting the result of ip_queue_xmit()? Specifically once
the connection is established and we're pushing data - packet
loss is just packet loss?
4) this fix
Ignore the rc in the Qdisc-less+GSO case, since it's unreliable.
We already always return OK in the TCQ_F_CAN_BYPASS case.
In the Qdisc-less case let's be a bit more conservative and only
mask the GSO errors. This path is taken by non-IP-"networks"
like CAN, MCTP etc, so we could regress some ancient thing.
This is the simplest, but also maybe the hackiest fix?
Similar fix has been proposed by Eric in the past but never committed
because original reporter was working with an OOT driver and wasn't
providing feedback (see Link).
In the Linux kernel, the following vulnerability has been resolved:
tcp: fix potential race in tcp_v6_syn_recv_sock()
Code in tcp_v6_syn_recv_sock() after the call to tcp_v4_syn_recv_sock()
is done too late.
After tcp_v4_syn_recv_sock(), the child socket is already visible
from TCP ehash table and other cpus might use it.
Since newinet->pinet6 is still pointing to the listener ipv6_pinfo
bad things can happen as syzbot found.
Move the problematic code in tcp_v6_mapped_child_init()
and call this new helper from tcp_v4_syn_recv_sock() before
the ehash insertion.
This allows the removal of one tcp_sync_mss(), since
tcp_v4_syn_recv_sock() will call it with the correct
context.
In the Linux kernel, the following vulnerability has been resolved:
xfs: delete attr leaf freemap entries when empty
Back in commit 2a2b5932db6758 ("xfs: fix attr leaf header freemap.size
underflow"), Brian Foster observed that it's possible for a small
freemap at the end of the end of the xattr entries array to experience
a size underflow when subtracting the space consumed by an expansion of
the entries array. There are only three freemap entries, which means
that it is not a complete index of all free space in the leaf block.
This code can leave behind a zero-length freemap entry with a nonzero
base. Subsequent setxattr operations can increase the base up to the
point that it overlaps with another freemap entry. This isn't in and of
itself a problem because the code in _leaf_add that finds free space
ignores any freemap entry with zero size.
However, there's another bug in the freemap update code in _leaf_add,
which is that it fails to update a freemap entry that begins midway
through the xattr entry that was just appended to the array. That can
result in the freemap containing two entries with the same base but
different sizes (0 for the "pushed-up" entry, nonzero for the entry
that's actually tracking free space). A subsequent _leaf_add can then
allocate xattr namevalue entries on top of the entries array, leading to
data loss. But fixing that is for later.
For now, eliminate the possibility of confusion by zeroing out the base
of any freemap entry that has zero size. Because the freemap is not
intended to be a complete index of free space, a subsequent failure to
find any free space for a new xattr will trigger block compaction, which
regenerates the freemap.
It looks like this bug has been in the codebase for quite a long time.
In the Linux kernel, the following vulnerability has been resolved:
netfilter: xt_tcpmss: check remaining length before reading optlen
Quoting reporter:
In net/netfilter/xt_tcpmss.c (lines 53-68), the TCP option parser reads
op[i+1] directly without validating the remaining option length.
If the last byte of the option field is not EOL/NOP (0/1), the code attempts
to index op[i+1]. In the case where i + 1 == optlen, this causes an
out-of-bounds read, accessing memory past the optlen boundary
(either reading beyond the stack buffer _opt or the
following payload).
In the Linux kernel, the following vulnerability has been resolved:
net: usb: kaweth: remove TX queue manipulation in kaweth_set_rx_mode
kaweth_set_rx_mode(), the ndo_set_rx_mode callback, calls
netif_stop_queue() and netif_wake_queue(). These are TX queue flow
control functions unrelated to RX multicast configuration.
The premature netif_wake_queue() can re-enable TX while tx_urb is still
in-flight, leading to a double usb_submit_urb() on the same URB:
kaweth_start_xmit() {
netif_stop_queue();
usb_submit_urb(kaweth->tx_urb);
}
kaweth_set_rx_mode() {
netif_stop_queue();
netif_wake_queue(); // wakes TX queue before URB is done
}
kaweth_start_xmit() {
netif_stop_queue();
usb_submit_urb(kaweth->tx_urb); // URB submitted while active
}
This triggers the WARN in usb_submit_urb():
"URB submitted while active"
This is a similar class of bug fixed in rtl8150 by
- commit 958baf5eaee3 ("net: usb: Remove disruptive netif_wake_queue in rtl8150_set_multicast").
Also kaweth_set_rx_mode() is already functionally broken, the
real set_rx_mode action is performed by kaweth_async_set_rx_mode(),
which in turn is not a no-op only at ndo_open() time.
In the Linux kernel, the following vulnerability has been resolved:
media: ccs: Avoid possible division by zero
Calculating maximum M for scaler configuration involves dividing by
MIN_X_OUTPUT_SIZE limit register's value. Albeit the value is presumably
non-zero, the driver was missing the check it in fact was. Fix this.
In the Linux kernel, the following vulnerability has been resolved:
media: cx25821: Fix a resource leak in cx25821_dev_setup()
Add release_mem_region() if ioremap() fails to release the memory
region obtained by cx25821_get_resources().
In the Linux kernel, the following vulnerability has been resolved:
ocfs2: fix reflink preserve cleanup issue
commit c06c303832ec ("ocfs2: fix xattr array entry __counted_by error")
doesn't handle all cases and the cleanup job for preserved xattr entries
still has bug:
- the 'last' pointer should be shifted by one unit after cleanup
an array entry.
- current code logic doesn't cleanup the first entry when xh_count is 1.
Note, commit c06c303832ec is also a bug fix for 0fe9b66c65f3.
In the Linux kernel, the following vulnerability has been resolved:
md/bitmap: fix GPF in write_page caused by resize race
A General Protection Fault occurs in write_page() during array resize:
RIP: 0010:write_page+0x22b/0x3c0 [md_mod]
This is a use-after-free race between bitmap_daemon_work() and
__bitmap_resize(). The daemon iterates over `bitmap->storage.filemap`
without locking, while the resize path frees that storage via
md_bitmap_file_unmap(). `quiesce()` does not stop the md thread,
allowing concurrent access to freed pages.
Fix by holding `mddev->bitmap_info.mutex` during the bitmap update.