In the Linux kernel, the following vulnerability has been resolved:
net: add xmit recursion limit to tunnel xmit functions
Tunnel xmit functions (iptunnel_xmit, ip6tunnel_xmit) lack their own
recursion limit. When a bond device in broadcast mode has GRE tap
interfaces as slaves, and those GRE tunnels route back through the
bond, multicast/broadcast traffic triggers infinite recursion between
bond_xmit_broadcast() and ip_tunnel_xmit()/ip6_tnl_xmit(), causing
kernel stack overflow.
The existing XMIT_RECURSION_LIMIT (8) in the no-qdisc path is not
sufficient because tunnel recursion involves route lookups and full IP
output, consuming much more stack per level. Use a lower limit of 4
(IP_TUNNEL_RECURSION_LIMIT) to prevent overflow.
Add recursion detection using dev_xmit_recursion helpers directly in
iptunnel_xmit() and ip6tunnel_xmit() to cover all IPv4/IPv6 tunnel
paths including UDP encapsulated tunnels (VXLAN, Geneve, etc.).
Move dev_xmit_recursion helpers from net/core/dev.h to public header
include/linux/netdevice.h so they can be used by tunnel code.
BUG: KASAN: stack-out-of-bounds in blake2s.constprop.0+0xe7/0x160
Write of size 32 at addr ffff88810033fed0 by task kworker/0:1/11
Workqueue: mld mld_ifc_work
Call Trace:
<TASK>
__build_flow_key.constprop.0 (net/ipv4/route.c:515)
ip_rt_update_pmtu (net/ipv4/route.c:1073)
iptunnel_xmit (net/ipv4/ip_tunnel_core.c:84)
ip_tunnel_xmit (net/ipv4/ip_tunnel.c:847)
gre_tap_xmit (net/ipv4/ip_gre.c:779)
dev_hard_start_xmit (net/core/dev.c:3887)
sch_direct_xmit (net/sched/sch_generic.c:347)
__dev_queue_xmit (net/core/dev.c:4802)
bond_dev_queue_xmit (drivers/net/bonding/bond_main.c:312)
bond_xmit_broadcast (drivers/net/bonding/bond_main.c:5279)
bond_start_xmit (drivers/net/bonding/bond_main.c:5530)
dev_hard_start_xmit (net/core/dev.c:3887)
__dev_queue_xmit (net/core/dev.c:4841)
ip_finish_output2 (net/ipv4/ip_output.c:237)
ip_output (net/ipv4/ip_output.c:438)
iptunnel_xmit (net/ipv4/ip_tunnel_core.c:86)
gre_tap_xmit (net/ipv4/ip_gre.c:779)
dev_hard_start_xmit (net/core/dev.c:3887)
sch_direct_xmit (net/sched/sch_generic.c:347)
__dev_queue_xmit (net/core/dev.c:4802)
bond_dev_queue_xmit (drivers/net/bonding/bond_main.c:312)
bond_xmit_broadcast (drivers/net/bonding/bond_main.c:5279)
bond_start_xmit (drivers/net/bonding/bond_main.c:5530)
dev_hard_start_xmit (net/core/dev.c:3887)
__dev_queue_xmit (net/core/dev.c:4841)
ip_finish_output2 (net/ipv4/ip_output.c:237)
ip_output (net/ipv4/ip_output.c:438)
iptunnel_xmit (net/ipv4/ip_tunnel_core.c:86)
ip_tunnel_xmit (net/ipv4/ip_tunnel.c:847)
gre_tap_xmit (net/ipv4/ip_gre.c:779)
dev_hard_start_xmit (net/core/dev.c:3887)
sch_direct_xmit (net/sched/sch_generic.c:347)
__dev_queue_xmit (net/core/dev.c:4802)
bond_dev_queue_xmit (drivers/net/bonding/bond_main.c:312)
bond_xmit_broadcast (drivers/net/bonding/bond_main.c:5279)
bond_start_xmit (drivers/net/bonding/bond_main.c:5530)
dev_hard_start_xmit (net/core/dev.c:3887)
__dev_queue_xmit (net/core/dev.c:4841)
mld_sendpack
mld_ifc_work
process_one_work
worker_thread
</TASK>
In the Linux kernel, the following vulnerability has been resolved:
net/sched: teql: fix NULL pointer dereference in iptunnel_xmit on TEQL slave xmit
teql_master_xmit() calls netdev_start_xmit(skb, slave) to transmit
through slave devices, but does not update skb->dev to the slave device
beforehand.
When a gretap tunnel is a TEQL slave, the transmit path reaches
iptunnel_xmit() which saves dev = skb->dev (still pointing to teql0
master) and later calls iptunnel_xmit_stats(dev, pkt_len). This
function does:
get_cpu_ptr(dev->tstats)
Since teql_master_setup() does not set dev->pcpu_stat_type to
NETDEV_PCPU_STAT_TSTATS, the core network stack never allocates tstats
for teql0, so dev->tstats is NULL. get_cpu_ptr(NULL) computes
NULL + __per_cpu_offset[cpu], resulting in a page fault.
BUG: unable to handle page fault for address: ffff8880e6659018
#PF: supervisor write access in kernel mode
#PF: error_code(0x0002) - not-present page
PGD 68bc067 P4D 68bc067 PUD 0
Oops: Oops: 0002 [#1] SMP KASAN PTI
RIP: 0010:iptunnel_xmit (./include/net/ip_tunnels.h:664 net/ipv4/ip_tunnel_core.c:89)
Call Trace:
<TASK>
ip_tunnel_xmit (net/ipv4/ip_tunnel.c:847)
__gre_xmit (net/ipv4/ip_gre.c:478)
gre_tap_xmit (net/ipv4/ip_gre.c:779)
teql_master_xmit (net/sched/sch_teql.c:319)
dev_hard_start_xmit (net/core/dev.c:3887)
sch_direct_xmit (net/sched/sch_generic.c:347)
__dev_queue_xmit (net/core/dev.c:4802)
neigh_direct_output (net/core/neighbour.c:1660)
ip_finish_output2 (net/ipv4/ip_output.c:237)
__ip_finish_output.part.0 (net/ipv4/ip_output.c:315)
ip_mc_output (net/ipv4/ip_output.c:369)
ip_send_skb (net/ipv4/ip_output.c:1508)
udp_send_skb (net/ipv4/udp.c:1195)
udp_sendmsg (net/ipv4/udp.c:1485)
inet_sendmsg (net/ipv4/af_inet.c:859)
__sys_sendto (net/socket.c:2206)
Fix this by setting skb->dev = slave before calling
netdev_start_xmit(), so that tunnel xmit functions see the correct
slave device with properly allocated tstats.
In the Linux kernel, the following vulnerability has been resolved:
netfilter: nf_tables: unconditionally bump set->nelems before insertion
In case that the set is full, a new element gets published then removed
without waiting for the RCU grace period, while RCU reader can be
walking over it already.
To address this issue, add the element transaction even if set is full,
but toggle the set_full flag to report -ENFILE so the abort path safely
unwinds the set to its previous state.
As for element updates, decrement set->nelems to restore it.
A simpler fix is to call synchronize_rcu() in the error path.
However, with a large batch adding elements to already maxed-out set,
this could cause noticeable slowdown of such batches.
In the Linux kernel, the following vulnerability has been resolved:
perf: Fix __perf_event_overflow() vs perf_remove_from_context() race
Make sure that __perf_event_overflow() runs with IRQs disabled for all
possible callchains. Specifically the software events can end up running
it with only preemption disabled.
This opens up a race vs perf_event_exit_event() and friends that will go
and free various things the overflow path expects to be present, like
the BPF program.
In the Linux kernel, the following vulnerability has been resolved:
f2fs: fix to do sanity check on node footer in {read,write}_end_io
-----------[ cut here ]------------
kernel BUG at fs/f2fs/data.c:358!
Call Trace:
<IRQ>
blk_update_request+0x5eb/0xe70 block/blk-mq.c:987
blk_mq_end_request+0x3e/0x70 block/blk-mq.c:1149
blk_complete_reqs block/blk-mq.c:1224 [inline]
blk_done_softirq+0x107/0x160 block/blk-mq.c:1229
handle_softirqs+0x283/0x870 kernel/softirq.c:579
__do_softirq kernel/softirq.c:613 [inline]
invoke_softirq kernel/softirq.c:453 [inline]
__irq_exit_rcu+0xca/0x1f0 kernel/softirq.c:680
irq_exit_rcu+0x9/0x30 kernel/softirq.c:696
instr_sysvec_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1050 [inline]
sysvec_apic_timer_interrupt+0xa6/0xc0 arch/x86/kernel/apic/apic.c:1050
</IRQ>
In f2fs_write_end_io(), it detects there is inconsistency in between
node page index (nid) and footer.nid of node page.
If footer of node page is corrupted in fuzzed image, then we load corrupted
node page w/ async method, e.g. f2fs_ra_node_pages() or f2fs_ra_node_page(),
in where we won't do sanity check on node footer, once node page becomes
dirty, we will encounter this bug after node page writeback.
In the Linux kernel, the following vulnerability has been resolved:
fbdev: rivafb: fix divide error in nv3_arb()
A userspace program can trigger the RIVA NV3 arbitration code by calling
the FBIOPUT_VSCREENINFO ioctl on /dev/fb*. When doing so, the driver
recomputes FIFO arbitration parameters in nv3_arb(), using state->mclk_khz
(derived from the PRAMDAC MCLK PLL) as a divisor without validating it
first.
In a normal setup, state->mclk_khz is provided by the real hardware and is
non-zero. However, an attacker can construct a malicious or misconfigured
device (e.g. a crafted/emulated PCI device) that exposes a bogus PLL
configuration, causing state->mclk_khz to become zero. Once
nv3_get_param() calls nv3_arb(), the division by state->mclk_khz in the gns
calculation causes a divide error and crashes the kernel.
Fix this by checking whether state->mclk_khz is zero and bailing out before
doing the division.
The following log reveals it:
rivafb: setting virtual Y resolution to 2184
divide error: 0000 [#1] PREEMPT SMP KASAN PTI
CPU: 0 PID: 2187 Comm: syz-executor.0 Not tainted 5.18.0-rc1+ #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/01/2014
RIP: 0010:nv3_arb drivers/video/fbdev/riva/riva_hw.c:439 [inline]
RIP: 0010:nv3_get_param+0x3ab/0x13b0 drivers/video/fbdev/riva/riva_hw.c:546
Call Trace:
nv3CalcArbitration.constprop.0+0x255/0x460 drivers/video/fbdev/riva/riva_hw.c:603
nv3UpdateArbitrationSettings drivers/video/fbdev/riva/riva_hw.c:637 [inline]
CalcStateExt+0x447/0x1b90 drivers/video/fbdev/riva/riva_hw.c:1246
riva_load_video_mode+0x8a9/0xea0 drivers/video/fbdev/riva/fbdev.c:779
rivafb_set_par+0xc0/0x5f0 drivers/video/fbdev/riva/fbdev.c:1196
fb_set_var+0x604/0xeb0 drivers/video/fbdev/core/fbmem.c:1033
do_fb_ioctl+0x234/0x670 drivers/video/fbdev/core/fbmem.c:1109
fb_ioctl+0xdd/0x130 drivers/video/fbdev/core/fbmem.c:1188
__x64_sys_ioctl+0x122/0x190 fs/ioctl.c:856
In the Linux kernel, the following vulnerability has been resolved:
f2fs: fix IS_CHECKPOINTED flag inconsistency issue caused by concurrent atomic commit and checkpoint writes
During SPO tests, when mounting F2FS, an -EINVAL error was returned from
f2fs_recover_inode_page. The issue occurred under the following scenario
Thread A Thread B
f2fs_ioc_commit_atomic_write
- f2fs_do_sync_file // atomic = true
- f2fs_fsync_node_pages
: last_folio = inode folio
: schedule before folio_lock(last_folio) f2fs_write_checkpoint
- block_operations// writeback last_folio
- schedule before f2fs_flush_nat_entries
: set_fsync_mark(last_folio, 1)
: set_dentry_mark(last_folio, 1)
: folio_mark_dirty(last_folio)
- __write_node_folio(last_folio)
: f2fs_down_read(&sbi->node_write)//block
- f2fs_flush_nat_entries
: {struct nat_entry}->flag |= BIT(IS_CHECKPOINTED)
- unblock_operations
: f2fs_up_write(&sbi->node_write)
f2fs_write_checkpoint//return
: f2fs_do_write_node_page()
f2fs_ioc_commit_atomic_write//return
SPO
Thread A calls f2fs_need_dentry_mark(sbi, ino), and the last_folio has
already been written once. However, the {struct nat_entry}->flag did not
have the IS_CHECKPOINTED set, causing set_dentry_mark(last_folio, 1) and
write last_folio again after Thread B finishes f2fs_write_checkpoint.
After SPO and reboot, it was detected that {struct node_info}->blk_addr
was not NULL_ADDR because Thread B successfully write the checkpoint.
This issue only occurs in atomic write scenarios. For regular file
fsync operations, the folio must be dirty. If
block_operations->f2fs_sync_node_pages successfully submit the folio
write, this path will not be executed. Otherwise, the
f2fs_write_checkpoint will need to wait for the folio write submission
to complete, as sbi->nr_pages[F2FS_DIRTY_NODES] > 0. Therefore, the
situation where f2fs_need_dentry_mark checks that the {struct
nat_entry}->flag /wo the IS_CHECKPOINTED flag, but the folio write has
already been submitted, will not occur.
Therefore, for atomic file fsync, sbi->node_write should be acquired
through __write_node_folio to ensure that the IS_CHECKPOINTED flag
correctly indicates that the checkpoint write has been completed.
In the Linux kernel, the following vulnerability has been resolved:
apparmor: fix unprivileged local user can do privileged policy management
An unprivileged local user can load, replace, and remove profiles by
opening the apparmorfs interfaces, via a confused deputy attack, by
passing the opened fd to a privileged process, and getting the
privileged process to write to the interface.
This does require a privileged target that can be manipulated to do
the write for the unprivileged process, but once such access is
achieved full policy management is possible and all the possible
implications that implies: removing confinement, DoS of system or
target applications by denying all execution, by-passing the
unprivileged user namespace restriction, to exploiting kernel bugs for
a local privilege escalation.
The policy management interface can not have its permissions simply
changed from 0666 to 0600 because non-root processes need to be able
to load policy to different policy namespaces.
Instead ensure the task writing the interface has privileges that
are a subset of the task that opened the interface. This is already
done via policy for confined processes, but unconfined can delegate
access to the opened fd, by-passing the usual policy check.
In the Linux kernel, the following vulnerability has been resolved:
apparmor: validate DFA start states are in bounds in unpack_pdb
Start states are read from untrusted data and used as indexes into the
DFA state tables. The aa_dfa_next() function call in unpack_pdb() will
access dfa->tables[YYTD_ID_BASE][start], and if the start state exceeds
the number of states in the DFA, this results in an out-of-bound read.
==================================================================
BUG: KASAN: slab-out-of-bounds in aa_dfa_next+0x2a1/0x360
Read of size 4 at addr ffff88811956fb90 by task su/1097
...
Reject policies with out-of-bounds start states during unpacking
to prevent the issue.
In the Linux kernel, the following vulnerability has been resolved:
net: liquidio: Initialize netdev pointer before queue setup
In setup_nic_devices(), the netdev is allocated using alloc_etherdev_mq().
However, the pointer to this structure is stored in oct->props[i].netdev
only after the calls to netif_set_real_num_rx_queues() and
netif_set_real_num_tx_queues().
If either of these functions fails, setup_nic_devices() returns an error
without freeing the allocated netdev. Since oct->props[i].netdev is still
NULL at this point, the cleanup function liquidio_destroy_nic_device()
will fail to find and free the netdev, resulting in a memory leak.
Fix this by initializing oct->props[i].netdev before calling the queue
setup functions. This ensures that the netdev is properly accessible for
cleanup in case of errors.
Compile tested only. Issue found using a prototype static analysis tool
and code review.