Security Vulnerabilities
- CVEs Published In March 2024
In the Linux kernel, the following vulnerability has been resolved:
ice: track AF_XDP ZC enabled queues in bitmap
Commit c7a219048e45 ("ice: Remove xsk_buff_pool from VSI structure")
silently introduced a regression and broke the Tx side of AF_XDP in copy
mode. xsk_pool on ice_ring is set only based on the existence of the XDP
prog on the VSI which in turn picks ice_clean_tx_irq_zc to be executed.
That is not something that should happen for copy mode as it should use
the regular data path ice_clean_tx_irq.
This results in a following splat when xdpsock is run in txonly or l2fwd
scenarios in copy mode:
<snip>
[ 106.050195] BUG: kernel NULL pointer dereference, address: 0000000000000030
[ 106.057269] #PF: supervisor read access in kernel mode
[ 106.062493] #PF: error_code(0x0000) - not-present page
[ 106.067709] PGD 0 P4D 0
[ 106.070293] Oops: 0000 [#1] PREEMPT SMP NOPTI
[ 106.074721] CPU: 61 PID: 0 Comm: swapper/61 Not tainted 5.12.0-rc2+ #45
[ 106.081436] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0008.031920191559 03/19/2019
[ 106.092027] RIP: 0010:xp_raw_get_dma+0x36/0x50
[ 106.096551] Code: 74 14 48 b8 ff ff ff ff ff ff 00 00 48 21 f0 48 c1 ee 30 48 01 c6 48 8b 87 90 00 00 00 48 89 f2 81 e6 ff 0f 00 00 48 c1 ea 0c <48> 8b 04 d0 48 83 e0 fe 48 01 f0 c3 66 66 2e 0f 1f 84 00 00 00 00
[ 106.115588] RSP: 0018:ffffc9000d694e50 EFLAGS: 00010206
[ 106.120893] RAX: 0000000000000000 RBX: ffff88984b8c8a00 RCX: ffff889852581800
[ 106.128137] RDX: 0000000000000006 RSI: 0000000000000000 RDI: ffff88984cd8b800
[ 106.135383] RBP: ffff888123b50001 R08: ffff889896800000 R09: 0000000000000800
[ 106.142628] R10: 0000000000000000 R11: ffffffff826060c0 R12: 00000000000000ff
[ 106.149872] R13: 0000000000000000 R14: 0000000000000040 R15: ffff888123b50018
[ 106.157117] FS: 0000000000000000(0000) GS:ffff8897e0f40000(0000) knlGS:0000000000000000
[ 106.165332] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 106.171163] CR2: 0000000000000030 CR3: 000000000560a004 CR4: 00000000007706e0
[ 106.178408] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 106.185653] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 106.192898] PKRU: 55555554
[ 106.195653] Call Trace:
[ 106.198143] <IRQ>
[ 106.200196] ice_clean_tx_irq_zc+0x183/0x2a0 [ice]
[ 106.205087] ice_napi_poll+0x3e/0x590 [ice]
[ 106.209356] __napi_poll+0x2a/0x160
[ 106.212911] net_rx_action+0xd6/0x200
[ 106.216634] __do_softirq+0xbf/0x29b
[ 106.220274] irq_exit_rcu+0x88/0xc0
[ 106.223819] common_interrupt+0x7b/0xa0
[ 106.227719] </IRQ>
[ 106.229857] asm_common_interrupt+0x1e/0x40
</snip>
Fix this by introducing the bitmap of queues that are zero-copy enabled,
where each bit, corresponding to a queue id that xsk pool is being
configured on, will be set/cleared within ice_xsk_pool_{en,dis}able and
checked within ice_xsk_pool(). The latter is a function used for
deciding which napi poll routine is executed.
Idea is being taken from our other drivers such as i40e and ixgbe.
In the Linux kernel, the following vulnerability has been resolved:
bpf, lockdown, audit: Fix buggy SELinux lockdown permission checks
Commit 59438b46471a ("security,lockdown,selinux: implement SELinux lockdown")
added an implementation of the locked_down LSM hook to SELinux, with the aim
to restrict which domains are allowed to perform operations that would breach
lockdown. This is indirectly also getting audit subsystem involved to report
events. The latter is problematic, as reported by Ondrej and Serhei, since it
can bring down the whole system via audit:
1) The audit events that are triggered due to calls to security_locked_down()
can OOM kill a machine, see below details [0].
2) It also seems to be causing a deadlock via avc_has_perm()/slow_avc_audit()
when trying to wake up kauditd, for example, when using trace_sched_switch()
tracepoint, see details in [1]. Triggering this was not via some hypothetical
corner case, but with existing tools like runqlat & runqslower from bcc, for
example, which make use of this tracepoint. Rough call sequence goes like:
rq_lock(rq) -> -------------------------+
trace_sched_switch() -> |
bpf_prog_xyz() -> +-> deadlock
selinux_lockdown() -> |
audit_log_end() -> |
wake_up_interruptible() -> |
try_to_wake_up() -> |
rq_lock(rq) --------------+
What's worse is that the intention of 59438b46471a to further restrict lockdown
settings for specific applications in respect to the global lockdown policy is
completely broken for BPF. The SELinux policy rule for the current lockdown check
looks something like this:
allow <who> <who> : lockdown { <reason> };
However, this doesn't match with the 'current' task where the security_locked_down()
is executed, example: httpd does a syscall. There is a tracing program attached
to the syscall which triggers a BPF program to run, which ends up doing a
bpf_probe_read_kernel{,_str}() helper call. The selinux_lockdown() hook does
the permission check against 'current', that is, httpd in this example. httpd
has literally zero relation to this tracing program, and it would be nonsensical
having to write an SELinux policy rule against httpd to let the tracing helper
pass. The policy in this case needs to be against the entity that is installing
the BPF program. For example, if bpftrace would generate a histogram of syscall
counts by user space application:
bpftrace -e 'tracepoint:raw_syscalls:sys_enter { @[comm] = count(); }'
bpftrace would then go and generate a BPF program from this internally. One way
of doing it [for the sake of the example] could be to call bpf_get_current_task()
helper and then access current->comm via one of bpf_probe_read_kernel{,_str}()
helpers. So the program itself has nothing to do with httpd or any other random
app doing a syscall here. The BPF program _explicitly initiated_ the lockdown
check. The allow/deny policy belongs in the context of bpftrace: meaning, you
want to grant bpftrace access to use these helpers, but other tracers on the
system like my_random_tracer _not_.
Therefore fix all three issues at the same time by taking a completely different
approach for the security_locked_down() hook, that is, move the check into the
program verification phase where we actually retrieve the BPF func proto. This
also reliably gets the task (current) that is trying to install the BPF tracing
program, e.g. bpftrace/bcc/perf/systemtap/etc, and it also fixes the OOM since
we're moving this out of the BPF helper's fast-path which can be called several
millions of times per second.
The check is then also in line with other security_locked_down() hooks in the
system where the enforcement is performed at open/load time, for example,
open_kcore() for /proc/kcore access or module_sig_check() for module signatures
just to pick f
---truncated---
In the Linux kernel, the following vulnerability has been resolved:
netfilter: nft_ct: skip expectations for confirmed conntrack
nft_ct_expect_obj_eval() calls nf_ct_ext_add() for a confirmed
conntrack entry. However, nf_ct_ext_add() can only be called for
!nf_ct_is_confirmed().
[ 1825.349056] WARNING: CPU: 0 PID: 1279 at net/netfilter/nf_conntrack_extend.c:48 nf_ct_xt_add+0x18e/0x1a0 [nf_conntrack]
[ 1825.351391] RIP: 0010:nf_ct_ext_add+0x18e/0x1a0 [nf_conntrack]
[ 1825.351493] Code: 41 5c 41 5d 41 5e 41 5f c3 41 bc 0a 00 00 00 e9 15 ff ff ff ba 09 00 00 00 31 f6 4c 89 ff e8 69 6c 3d e9 eb 96 45 31 ed eb cd <0f> 0b e9 b1 fe ff ff e8 86 79 14 e9 eb bf 0f 1f 40 00 0f 1f 44 00
[ 1825.351721] RSP: 0018:ffffc90002e1f1e8 EFLAGS: 00010202
[ 1825.351790] RAX: 000000000000000e RBX: ffff88814f5783c0 RCX: ffffffffc0e4f887
[ 1825.351881] RDX: dffffc0000000000 RSI: 0000000000000008 RDI: ffff88814f578440
[ 1825.351971] RBP: 0000000000000000 R08: 0000000000000000 R09: ffff88814f578447
[ 1825.352060] R10: ffffed1029eaf088 R11: 0000000000000001 R12: ffff88814f578440
[ 1825.352150] R13: ffff8882053f3a00 R14: 0000000000000000 R15: 0000000000000a20
[ 1825.352240] FS: 00007f992261c900(0000) GS:ffff889faec00000(0000) knlGS:0000000000000000
[ 1825.352343] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1825.352417] CR2: 000056070a4d1158 CR3: 000000015efe0000 CR4: 0000000000350ee0
[ 1825.352508] Call Trace:
[ 1825.352544] nf_ct_helper_ext_add+0x10/0x60 [nf_conntrack]
[ 1825.352641] nft_ct_expect_obj_eval+0x1b8/0x1e0 [nft_ct]
[ 1825.352716] nft_do_chain+0x232/0x850 [nf_tables]
Add the ct helper extension only for unconfirmed conntrack. Skip rule
evaluation if the ct helper extension does not exist. Thus, you can
only create expectations from the first packet.
It should be possible to remove this limitation by adding a new action
to attach a generic ct helper to the first packet. Then, use this ct
helper extension from follow up packets to create the ct expectation.
While at it, add a missing check to skip the template conntrack too
and remove check for IPCT_UNTRACK which is implicit to !ct.
In the Linux kernel, the following vulnerability has been resolved:
nvmet: fix freeing unallocated p2pmem
In case p2p device was found but the p2p pool is empty, the nvme target
is still trying to free the sgl from the p2p pool instead of the
regular sgl pool and causing a crash (BUG() is called). Instead, assign
the p2p_dev for the request only if it was allocated from p2p pool.
This is the crash that was caused:
[Sun May 30 19:13:53 2021] ------------[ cut here ]------------
[Sun May 30 19:13:53 2021] kernel BUG at lib/genalloc.c:518!
[Sun May 30 19:13:53 2021] invalid opcode: 0000 [#1] SMP PTI
...
[Sun May 30 19:13:53 2021] kernel BUG at lib/genalloc.c:518!
...
[Sun May 30 19:13:53 2021] RIP: 0010:gen_pool_free_owner+0xa8/0xb0
...
[Sun May 30 19:13:53 2021] Call Trace:
[Sun May 30 19:13:53 2021] ------------[ cut here ]------------
[Sun May 30 19:13:53 2021] pci_free_p2pmem+0x2b/0x70
[Sun May 30 19:13:53 2021] pci_p2pmem_free_sgl+0x4f/0x80
[Sun May 30 19:13:53 2021] nvmet_req_free_sgls+0x1e/0x80 [nvmet]
[Sun May 30 19:13:53 2021] kernel BUG at lib/genalloc.c:518!
[Sun May 30 19:13:53 2021] nvmet_rdma_release_rsp+0x4e/0x1f0 [nvmet_rdma]
[Sun May 30 19:13:53 2021] nvmet_rdma_send_done+0x1c/0x60 [nvmet_rdma]
In the Linux kernel, the following vulnerability has been resolved:
net/tls: Fix use-after-free after the TLS device goes down and up
When a netdev with active TLS offload goes down, tls_device_down is
called to stop the offload and tear down the TLS context. However, the
socket stays alive, and it still points to the TLS context, which is now
deallocated. If a netdev goes up, while the connection is still active,
and the data flow resumes after a number of TCP retransmissions, it will
lead to a use-after-free of the TLS context.
This commit addresses this bug by keeping the context alive until its
normal destruction, and implements the necessary fallbacks, so that the
connection can resume in software (non-offloaded) kTLS mode.
On the TX side tls_sw_fallback is used to encrypt all packets. The RX
side already has all the necessary fallbacks, because receiving
non-decrypted packets is supported. The thing needed on the RX side is
to block resync requests, which are normally produced after receiving
non-decrypted packets.
The necessary synchronization is implemented for a graceful teardown:
first the fallbacks are deployed, then the driver resources are released
(it used to be possible to have a tls_dev_resync after tls_dev_del).
A new flag called TLS_RX_DEV_DEGRADED is added to indicate the fallback
mode. It's used to skip the RX resync logic completely, as it becomes
useless, and some objects may be released (for example, resync_async,
which is allocated and freed by the driver).
In the Linux kernel, the following vulnerability has been resolved:
mptcp: fix sk_forward_memory corruption on retransmission
MPTCP sk_forward_memory handling is a bit special, as such field
is protected by the msk socket spin_lock, instead of the plain
socket lock.
Currently we have a code path updating such field without handling
the relevant lock:
__mptcp_retrans() -> __mptcp_clean_una_wakeup()
Several helpers in __mptcp_clean_una_wakeup() will update
sk_forward_alloc, possibly causing such field corruption, as reported
by Matthieu.
Address the issue providing and using a new variant of blamed function
which explicitly acquires the msk spin lock.
In the Linux kernel, the following vulnerability has been resolved:
HID: amd_sfh: Fix memory leak in amd_sfh_work
Kmemleak tool detected a memory leak in the amd_sfh driver.
====================
unreferenced object 0xffff88810228ada0 (size 32):
comm "insmod", pid 3968, jiffies 4295056001 (age 775.792s)
hex dump (first 32 bytes):
00 20 73 1f 81 88 ff ff 00 01 00 00 00 00 ad de . s.............
22 01 00 00 00 00 ad de 01 00 02 00 00 00 00 00 "...............
backtrace:
[<000000007b4c8799>] kmem_cache_alloc_trace+0x163/0x4f0
[<0000000005326893>] amd_sfh_get_report+0xa4/0x1d0 [amd_sfh]
[<000000002a9e5ec4>] amdtp_hid_request+0x62/0x80 [amd_sfh]
[<00000000b8a95807>] sensor_hub_get_feature+0x145/0x270 [hid_sensor_hub]
[<00000000fda054ee>] hid_sensor_parse_common_attributes+0x215/0x460 [hid_sensor_iio_common]
[<0000000021279ecf>] hid_accel_3d_probe+0xff/0x4a0 [hid_sensor_accel_3d]
[<00000000915760ce>] platform_probe+0x6a/0xd0
[<0000000060258a1f>] really_probe+0x192/0x620
[<00000000fa812f2d>] driver_probe_device+0x14a/0x1d0
[<000000005e79f7fd>] __device_attach_driver+0xbd/0x110
[<0000000070d15018>] bus_for_each_drv+0xfd/0x160
[<0000000013a3c312>] __device_attach+0x18b/0x220
[<000000008c7b4afc>] device_initial_probe+0x13/0x20
[<00000000e6e99665>] bus_probe_device+0xfe/0x120
[<00000000833fa90b>] device_add+0x6a6/0xe00
[<00000000fa901078>] platform_device_add+0x180/0x380
====================
The fix is to freeing request_list entry once the processed entry is
removed from the request_list.
In the Linux kernel, the following vulnerability has been resolved:
efi/fdt: fix panic when no valid fdt found
setup_arch() would invoke efi_init()->efi_get_fdt_params(). If no
valid fdt found then initial_boot_params will be null. So we
should stop further fdt processing here. I encountered this
issue on risc-v.
In the Linux kernel, the following vulnerability has been resolved:
mt76: mt7921: fix possible AOOB issue in mt7921_mcu_tx_rate_report
Fix possible array out of bound access in mt7921_mcu_tx_rate_report.
Remove unnecessary varibable in mt7921_mcu_tx_rate_report
In the Linux kernel, the following vulnerability has been resolved:
neighbour: allow NUD_NOARP entries to be forced GCed
IFF_POINTOPOINT interfaces use NUD_NOARP entries for IPv6. It's possible to
fill up the neighbour table with enough entries that it will overflow for
valid connections after that.
This behaviour is more prevalent after commit 58956317c8de ("neighbor:
Improve garbage collection") is applied, as it prevents removal from
entries that are not NUD_FAILED, unless they are more than 5s old.