In the Linux kernel, the following vulnerability has been resolved:
nilfs2: do not force clear folio if buffer is referenced
Patch series "nilfs2: protect busy buffer heads from being force-cleared".
This series fixes the buffer head state inconsistency issues reported by
syzbot that occur when the filesystem is corrupted and falls back to
read-only, and the associated buffer head use-after-free issue.
This patch (of 2):
Syzbot has reported that after nilfs2 detects filesystem corruption and
falls back to read-only, inconsistencies in the buffer state may occur.
One of the inconsistencies is that when nilfs2 calls mark_buffer_dirty()
to set a data or metadata buffer as dirty, it detects that the buffer
is not in the uptodate state:
WARNING: CPU: 0 PID: 6049 at fs/buffer.c:1177 mark_buffer_dirty+0x2e5/0x520
fs/buffer.c:1177
...
Call Trace:
<TASK>
nilfs_palloc_commit_alloc_entry+0x4b/0x160 fs/nilfs2/alloc.c:598
nilfs_ifile_create_inode+0x1dd/0x3a0 fs/nilfs2/ifile.c:73
nilfs_new_inode+0x254/0x830 fs/nilfs2/inode.c:344
nilfs_mkdir+0x10d/0x340 fs/nilfs2/namei.c:218
vfs_mkdir+0x2f9/0x4f0 fs/namei.c:4257
do_mkdirat+0x264/0x3a0 fs/namei.c:4280
__do_sys_mkdirat fs/namei.c:4295 [inline]
__se_sys_mkdirat fs/namei.c:4293 [inline]
__x64_sys_mkdirat+0x87/0xa0 fs/namei.c:4293
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f
The other is when nilfs_btree_propagate(), which propagates the dirty
state to the ancestor nodes of a b-tree that point to a dirty buffer,
detects that the origin buffer is not dirty, even though it should be:
WARNING: CPU: 0 PID: 5245 at fs/nilfs2/btree.c:2089
nilfs_btree_propagate+0xc79/0xdf0 fs/nilfs2/btree.c:2089
...
Call Trace:
<TASK>
nilfs_bmap_propagate+0x75/0x120 fs/nilfs2/bmap.c:345
nilfs_collect_file_data+0x4d/0xd0 fs/nilfs2/segment.c:587
nilfs_segctor_apply_buffers+0x184/0x340 fs/nilfs2/segment.c:1006
nilfs_segctor_scan_file+0x28c/0xa50 fs/nilfs2/segment.c:1045
nilfs_segctor_collect_blocks fs/nilfs2/segment.c:1216 [inline]
nilfs_segctor_collect fs/nilfs2/segment.c:1540 [inline]
nilfs_segctor_do_construct+0x1c28/0x6b90 fs/nilfs2/segment.c:2115
nilfs_segctor_construct+0x181/0x6b0 fs/nilfs2/segment.c:2479
nilfs_segctor_thread_construct fs/nilfs2/segment.c:2587 [inline]
nilfs_segctor_thread+0x69e/0xe80 fs/nilfs2/segment.c:2701
kthread+0x2f0/0x390 kernel/kthread.c:389
ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147
ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244
</TASK>
Both of these issues are caused by the callbacks that handle page/folio
write requests forcibly clearing various states, including the working
state of the buffers they hold, at unexpected times when they detect
read-only fallback.
Fix these issues by checking if the buffer is referenced before clearing
the page/folio state, and skipping the clear if it is.
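For illustration only, here is a minimal sketch of the guard described above, built on the standard buffer_head helpers; the function name and the exact set of flags cleared are assumptions, not the actual nilfs2 patch:

#include <linux/buffer_head.h>

static void clear_folio_buffers_if_unused(struct folio *folio)
{
	struct buffer_head *bh, *head;

	head = folio_buffers(folio);
	if (!head)
		return;

	bh = head;
	do {
		/* Skip buffers that still have users (b_count != 0). */
		if (atomic_read(&bh->b_count))
			continue;

		lock_buffer(bh);
		clear_buffer_dirty(bh);
		clear_buffer_uptodate(bh);
		unlock_buffer(bh);
	} while (bh = bh->b_this_page, bh != head);
}

The key point is the reference check: a buffer that is still held by another context keeps its state, so mark_buffer_dirty() and nilfs_btree_propagate() never see it half-cleared.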
In the Linux kernel, the following vulnerability has been resolved:
net/rose: prevent integer overflows in rose_setsockopt()
In case of possible unpredictably large arguments passed to
rose_setsockopt() and multiplied by extra values on top of that,
integer overflows may occur.
Do the safest minimum and fix these issues by checking the contents of
'opt' and returning -EINVAL if they are too large. Also, switch to
unsigned int and remove the now-useless check for a negative 'opt' in
the ROSE_IDLE case.
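As a rough illustration (the helper name and exact bounds are assumptions, not the verbatim net/rose patch), the check amounts to rejecting any value that would wrap once scaled by HZ:

#include <linux/jiffies.h>
#include <linux/limits.h>

static int rose_opt_to_jiffies(unsigned int opt, unsigned long *res)
{
	/* opt * HZ must fit in an unsigned int, otherwise reject it. */
	if (opt < 1 || opt > UINT_MAX / HZ)
		return -EINVAL;

	*res = opt * HZ;
	return 0;
}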
In the Linux kernel, the following vulnerability has been resolved:
md/md-bitmap: Synchronize bitmap_get_stats() with bitmap lifetime
After commit ec6bb299c7c3 ("md/md-bitmap: add 'sync_size' into struct
md_bitmap_stats"), the following panic is reported:
Oops: general protection fault, probably for non-canonical address
RIP: 0010:bitmap_get_stats+0x2b/0xa0
Call Trace:
<TASK>
md_seq_show+0x2d2/0x5b0
seq_read_iter+0x2b9/0x470
seq_read+0x12f/0x180
proc_reg_read+0x57/0xb0
vfs_read+0xf6/0x380
ksys_read+0x6c/0xf0
do_syscall_64+0x82/0x170
entry_SYSCALL_64_after_hwframe+0x76/0x7e
Root cause is that bitmap_get_stats() can be called at any time if mddev
is still there, even if the bitmap is destroyed or not fully initialized.
Dereferencing the bitmap in this case can crash the kernel. Meanwhile,
the above commit started dereferencing bitmap->storage, making the
problem easier to trigger.
Fix the problem by protecting bitmap_get_stats() with bitmap_info.mutex.
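A minimal sketch of the locking described above; the wrapper name and the bitmap_get_stats() signature are assumptions, only mddev->bitmap_info.mutex comes from the fix description:

#include <linux/mutex.h>

static int md_bitmap_stats_locked(struct mddev *mddev,
				  struct md_bitmap_stats *stats)
{
	int ret;

	/* The mutex keeps the bitmap from being destroyed or re-created
	 * while its fields (e.g. bitmap->storage) are being read. */
	mutex_lock(&mddev->bitmap_info.mutex);
	ret = bitmap_get_stats(mddev->bitmap, stats);
	mutex_unlock(&mddev->bitmap_info.mutex);

	return ret;
}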
In the Linux kernel, the following vulnerability has been resolved:
net_sched: sch_sfq: don't allow 1 packet limit
The current implementation does not work correctly with a limit of
1. iproute2 actually checks for this, and this patch adds the same check
in the kernel as well.
This fixes the following syzkaller reported crash:
UBSAN: array-index-out-of-bounds in net/sched/sch_sfq.c:210:6
index 65535 is out of range for type 'struct sfq_head[128]'
CPU: 0 PID: 2569 Comm: syz-executor101 Not tainted 5.10.0-smp-DEV #1
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/13/2024
Call Trace:
__dump_stack lib/dump_stack.c:79 [inline]
dump_stack+0x125/0x19f lib/dump_stack.c:120
ubsan_epilogue lib/ubsan.c:148 [inline]
__ubsan_handle_out_of_bounds+0xed/0x120 lib/ubsan.c:347
sfq_link net/sched/sch_sfq.c:210 [inline]
sfq_dec+0x528/0x600 net/sched/sch_sfq.c:238
sfq_dequeue+0x39b/0x9d0 net/sched/sch_sfq.c:500
sfq_reset+0x13/0x50 net/sched/sch_sfq.c:525
qdisc_reset+0xfe/0x510 net/sched/sch_generic.c:1026
tbf_reset+0x3d/0x100 net/sched/sch_tbf.c:319
qdisc_reset+0xfe/0x510 net/sched/sch_generic.c:1026
dev_reset_queue+0x8c/0x140 net/sched/sch_generic.c:1296
netdev_for_each_tx_queue include/linux/netdevice.h:2350 [inline]
dev_deactivate_many+0x6dc/0xc20 net/sched/sch_generic.c:1362
__dev_close_many+0x214/0x350 net/core/dev.c:1468
dev_close_many+0x207/0x510 net/core/dev.c:1506
unregister_netdevice_many+0x40f/0x16b0 net/core/dev.c:10738
unregister_netdevice_queue+0x2be/0x310 net/core/dev.c:10695
unregister_netdevice include/linux/netdevice.h:2893 [inline]
__tun_detach+0x6b6/0x1600 drivers/net/tun.c:689
tun_detach drivers/net/tun.c:705 [inline]
tun_chr_close+0x104/0x1b0 drivers/net/tun.c:3640
__fput+0x203/0x840 fs/file_table.c:280
task_work_run+0x129/0x1b0 kernel/task_work.c:185
exit_task_work include/linux/task_work.h:33 [inline]
do_exit+0x5ce/0x2200 kernel/exit.c:931
do_group_exit+0x144/0x310 kernel/exit.c:1046
__do_sys_exit_group kernel/exit.c:1057 [inline]
__se_sys_exit_group kernel/exit.c:1055 [inline]
__x64_sys_exit_group+0x3b/0x40 kernel/exit.c:1055
do_syscall_64+0x6c/0xd0
entry_SYSCALL_64_after_hwframe+0x61/0xcb
RIP: 0033:0x7fe5e7b52479
Code: Unable to access opcode bytes at RIP 0x7fe5e7b5244f.
RSP: 002b:00007ffd3c800398 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fe5e7b52479
RDX: 000000000000003c RSI: 00000000000000e7 RDI: 0000000000000000
RBP: 00007fe5e7bcd2d0 R08: ffffffffffffffb8 R09: 0000000000000014
R10: 0000000000000000 R11: 0000000000000246 R12: 00007fe5e7bcd2d0
R13: 0000000000000000 R14: 00007fe5e7bcdd20 R15: 00007fe5e7b24270
The crash can also be reproduced with the following (with a tc
recompiled to allow for sfq limits of 1):
tc qdisc add dev dummy0 handle 1: root tbf rate 1Kbit burst 100b lat 1s
../iproute2-6.9.0/tc/tc qdisc add dev dummy0 handle 2: parent 1:10 sfq limit 1
ifconfig dummy0 up
ping -I dummy0 -f -c2 -W0.1 8.8.8.8
sleep 1
Scenario that triggers the crash:
* the first packet is sent and queued in TBF and SFQ; qdisc qlen is 1
* TBF dequeues: it peeks from SFQ which moves the packet to the
gso_skb list and keeps qdisc qlen set to 1. TBF is out of tokens so
it schedules itself for later.
* the second packet is sent and TBF tries to queue it to SFQ. qdisc
qlen is now 2 and because the SFQ limit is 1 the packet is dropped
by SFQ. At this point qlen is 1, and all of the SFQ slots are empty,
however q->tail is not NULL.
At this point, assuming no more packets are queued, when sch_dequeue
runs again it will decrement the qlen for the current empty slot
causing an underflow and the subsequent out of bounds access.
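A hedged sketch of the configuration-time check the patch description implies (the helper name is hypothetical; tc_sfq_qopt and NL_SET_ERR_MSG_MOD are existing kernel interfaces):

#include <linux/pkt_sched.h>
#include <linux/netlink.h>

static int sfq_validate_limit(const struct tc_sfq_qopt *ctl,
			      struct netlink_ext_ack *extack)
{
	/* A limit of 1 lets qlen underflow once the gso_skb peek and a
	 * subsequent drop leave an empty slot with q->tail still set. */
	if (ctl->limit == 1) {
		NL_SET_ERR_MSG_MOD(extack, "invalid limit");
		return -EINVAL;
	}
	return 0;
}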
In the Linux kernel, the following vulnerability has been resolved:
pps: Fix a use-after-free
On a board running ntpd and gpsd, I'm seeing a consistent use-after-free
in sys_exit() from gpsd when rebooting:
pps pps1: removed
------------[ cut here ]------------
kobject: '(null)' (00000000db4bec24): is not initialized, yet kobject_put() is being called.
WARNING: CPU: 2 PID: 440 at lib/kobject.c:734 kobject_put+0x120/0x150
CPU: 2 UID: 299 PID: 440 Comm: gpsd Not tainted 6.11.0-rc6-00308-gb31c44928842 #1
Hardware name: Raspberry Pi 4 Model B Rev 1.1 (DT)
pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : kobject_put+0x120/0x150
lr : kobject_put+0x120/0x150
sp : ffffffc0803d3ae0
x29: ffffffc0803d3ae0 x28: ffffff8042dc9738 x27: 0000000000000001
x26: 0000000000000000 x25: ffffff8042dc9040 x24: ffffff8042dc9440
x23: ffffff80402a4620 x22: ffffff8042ef4bd0 x21: ffffff80405cb600
x20: 000000000008001b x19: ffffff8040b3b6e0 x18: 0000000000000000
x17: 0000000000000000 x16: 0000000000000000 x15: 696e6920746f6e20
x14: 7369203a29343263 x13: 205d303434542020 x12: 0000000000000000
x11: 0000000000000000 x10: 0000000000000000 x9 : 0000000000000000
x8 : 0000000000000000 x7 : 0000000000000000 x6 : 0000000000000000
x5 : 0000000000000000 x4 : 0000000000000000 x3 : 0000000000000000
x2 : 0000000000000000 x1 : 0000000000000000 x0 : 0000000000000000
Call trace:
kobject_put+0x120/0x150
cdev_put+0x20/0x3c
__fput+0x2c4/0x2d8
____fput+0x1c/0x38
task_work_run+0x70/0xfc
do_exit+0x2a0/0x924
do_group_exit+0x34/0x90
get_signal+0x7fc/0x8c0
do_signal+0x128/0x13b4
do_notify_resume+0xdc/0x160
el0_svc+0xd4/0xf8
el0t_64_sync_handler+0x140/0x14c
el0t_64_sync+0x190/0x194
---[ end trace 0000000000000000 ]---
...followed by more symptoms of corruption, with similar stacks:
refcount_t: underflow; use-after-free.
kernel BUG at lib/list_debug.c:62!
Kernel panic - not syncing: Oops - BUG: Fatal exception
This happens because pps_device_destruct() frees the pps_device with the
embedded cdev immediately after calling cdev_del(), but, as the comment
above cdev_del() notes, fops for previously opened cdevs are still
callable even after cdev_del() returns. I think this bug has always
been there: I can't explain why it suddenly started happening every time
I reboot this particular board.
In commit d953e0e837e6 ("pps: Fix a use-after free bug when
unregistering a source."), George Spelvin suggested removing the
embedded cdev. That seems like the simplest way to fix this, so I've
implemented his suggestion, using __register_chrdev() with pps_idr
becoming the source of truth for which minor corresponds to which
device.
But now that pps_idr defines userspace visibility instead of cdev_add(),
we need to be sure the pps->dev refcount can't reach zero while
userspace can still find it again. So, the idr_remove() call moves to
pps_unregister_cdev(), and pps_idr now holds a reference to pps->dev.
pps_core: source serial1 got cdev (251:1)
<...>
pps pps1: removed
pps_core: unregistering pps1
pps_core: deallocating pps1
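A heavily hedged sketch of the lifetime rule described above (the function name is hypothetical and the real patch differs in detail): while a device is visible through pps_idr, the idr holds a reference to it, and unregistering removes the idr entry before that reference is dropped:

static void pps_unregister_cdev_sketch(struct pps_device *pps)
{
	/* Hide the device from new opens first ... */
	mutex_lock(&pps_idr_lock);
	idr_remove(&pps_idr, pps->id);
	mutex_unlock(&pps_idr_lock);

	/* ... then drop the reference the idr was holding; the pps_device
	 * is only freed once the last previously opened fd is closed. */
	put_device(pps->dev);
}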
In the Linux kernel, the following vulnerability has been resolved:
media: uvcvideo: Fix double free in error path
If the uvc_status_init() function fails to allocate the int_urb, it will
free the dev->status pointer but doesn't reset the pointer to NULL. This
results in the kfree() call in uvc_status_cleanup() trying to
double-free the memory. Fix it by resetting the dev->status pointer to
NULL after freeing it.
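A minimal sketch of the fix described above, using the field names from the text (the real error path in uvc_status_init() differs in detail):

	dev->int_urb = usb_alloc_urb(0, GFP_KERNEL);
	if (!dev->int_urb) {
		kfree(dev->status);
		/* Reset the pointer so uvc_status_cleanup() cannot free it
		 * a second time. */
		dev->status = NULL;
		return -ENOMEM;
	}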
Reviewed-by: Ricardo Ribalda <ribalda@chromium.org>
In the Linux kernel, the following vulnerability has been resolved:
rdma/cxgb4: Prevent potential integer overflow on 32bit
The "gl->tot_len" variable is controlled by the user. It comes from
process_responses(). On 32bit systems, the "gl->tot_len + sizeof(struct
cpl_pass_accept_req) + sizeof(struct rss_header)" addition could have an
integer wrapping bug. Use size_add() to prevent this.
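A short sketch of the overflow-safe sizing described above, assuming the surrounding allocation looks roughly like this (names are taken from the text, not the verbatim driver code):

#include <linux/overflow.h>
#include <linux/skbuff.h>

	/* size_add() saturates at SIZE_MAX instead of wrapping on 32-bit,
	 * so alloc_skb() fails cleanly rather than under-allocating. */
	skb = alloc_skb(size_add(gl->tot_len,
				 size_add(sizeof(struct cpl_pass_accept_req),
					  sizeof(struct rss_header))),
			GFP_ATOMIC);
	if (!skb)
		return NULL;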
In the Linux kernel, the following vulnerability has been resolved:
udp: Deal with race between UDP socket address change and rehash
If a UDP socket changes its local address while it's receiving
datagrams, as a result of connect(), there is a period during which
a lookup operation might fail to find it, after the address is changed
but before the secondary hash (port and address) and the four-tuple
hash (local and remote ports and addresses) are updated.
Secondary hash chains were introduced by commit 30fff9231fad ("udp:
bind() optimisation") and, as a result, a rehash operation became
needed to make a bound socket reachable again after a connect().
This operation was introduced by commit 719f835853a9 ("udp: add
rehash on connect()") which isn't however a complete fix: the
socket will be found once the rehashing completes, but not while
it's pending.
This is noticeable with a socat(1) server in UDP4-LISTEN mode, and a
client sending datagrams to it. After the server receives the first
datagram (cf. _xioopen_ipdgram_listen()), it issues a connect() to
the address of the sender, in order to set up a directed flow.
Now, if the client, running on a different CPU thread, happens to
send a (subsequent) datagram while the server's socket changes its
address, but is not rehashed yet, this will result in a failed
lookup and a port unreachable error delivered to the client, as
apparent from the following reproducer:
LEN=$(($(cat /proc/sys/net/core/wmem_default) / 4))
dd if=/dev/urandom bs=1 count=${LEN} of=tmp.in
while :; do
	taskset -c 1 socat UDP4-LISTEN:1337,null-eof OPEN:tmp.out,create,trunc &
	sleep 0.1 || sleep 1
	taskset -c 2 socat OPEN:tmp.in UDP4:localhost:1337,shut-null
	wait
done
where the client will eventually get ECONNREFUSED on a write()
(typically the second or third one of a given iteration):
2024/11/13 21:28:23 socat[46901] E write(6, 0x556db2e3c000, 8192): Connection refused
This issue was first observed as a seldom failure in Podman's tests
checking UDP functionality while using pasta(1) to connect the
container's network namespace, which leads us to a reproducer with
the lookup error resulting in an ICMP packet on a tap device:
LOCAL_ADDR="$(ip -j -4 addr show|jq -rM '.[] | .addr_info[0] | select(.scope == "global").local')"
while :; do
	./pasta --config-net -p pasta.pcap -u 1337 socat UDP4-LISTEN:1337,null-eof OPEN:tmp.out,create,trunc &
	sleep 0.2 || sleep 1
	socat OPEN:tmp.in UDP4:${LOCAL_ADDR}:1337,shut-null
	wait
	cmp tmp.in tmp.out
done
Once this fails:
tmp.in tmp.out differ: char 8193, line 29
we can finally have a look at what's going on:
$ tshark -r pasta.pcap
1 0.000000 :: → ff02::16 ICMPv6 110 Multicast Listener Report Message v2
2 0.168690 88.198.0.161 → 88.198.0.164 UDP 8234 60260 → 1337 Len=8192
3 0.168767 88.198.0.161 → 88.198.0.164 UDP 8234 60260 → 1337 Len=8192
4 0.168806 88.198.0.161 → 88.198.0.164 UDP 8234 60260 → 1337 Len=8192
5 0.168827 c6:47:05:8d:dc:04 → Broadcast ARP 42 Who has 88.198.0.161? Tell 88.198.0.164
6 0.168851 9a:55:9a:55:9a:55 → c6:47:05:8d:dc:04 ARP 42 88.198.0.161 is at 9a:55:9a:55:9a:55
7 0.168875 88.198.0.161 → 88.198.0.164 UDP 8234 60260 → 1337 Len=8192
8 0.168896 88.198.0.164 → 88.198.0.161 ICMP 590 Destination unreachable (Port unreachable)
9 0.168926 88.198.0.161 → 88.198.0.164 UDP 8234 60260 → 1337 Len=8192
10 0.168959 88.198.0.161 → 88.198.0.164 UDP 8234 60260 → 1337 Len=8192
11 0.168989 88.198.0.161 → 88.198.0.164 UDP 4138 60260 → 1337 Len=4096
12 0.169010 88.198.0.161 → 88.198.0.164 UDP 42 60260 → 1337 Len=0
On the third datagram received, the network namespace of the container
initiates an ARP lookup to deliver the ICMP message.
In another variant of this reproducer, starting the client with:
strace -f pasta --config-net -u 1337 socat UDP4-LISTEN:1337,null-eof OPEN:tmp.out,create,tru
---truncated---
In the Linux kernel, the following vulnerability has been resolved:
btrfs: do proper folio cleanup when run_delalloc_nocow() failed
[BUG]
With CONFIG_DEBUG_VM set, test case generic/476 has some chance to crash
with the following VM_BUG_ON_FOLIO():
BTRFS error (device dm-3): cow_file_range failed, start 1146880 end 1253375 len 106496 ret -28
BTRFS error (device dm-3): run_delalloc_nocow failed, start 1146880 end 1253375 len 106496 ret -28
page: refcount:4 mapcount:0 mapping:00000000592787cc index:0x12 pfn:0x10664
aops:btrfs_aops [btrfs] ino:101 dentry name(?):"f1774"
flags: 0x2fffff80004028(uptodate|lru|private|node=0|zone=2|lastcpupid=0xfffff)
page dumped because: VM_BUG_ON_FOLIO(!folio_test_locked(folio))
------------[ cut here ]------------
kernel BUG at mm/page-writeback.c:2992!
Internal error: Oops - BUG: 00000000f2000800 [#1] SMP
CPU: 2 UID: 0 PID: 3943513 Comm: kworker/u24:15 Tainted: G OE 6.12.0-rc7-custom+ #87
Tainted: [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
Hardware name: QEMU KVM Virtual Machine, BIOS unknown 2/2/2022
Workqueue: events_unbound btrfs_async_reclaim_data_space [btrfs]
pc : folio_clear_dirty_for_io+0x128/0x258
lr : folio_clear_dirty_for_io+0x128/0x258
Call trace:
folio_clear_dirty_for_io+0x128/0x258
btrfs_folio_clamp_clear_dirty+0x80/0xd0 [btrfs]
__process_folios_contig+0x154/0x268 [btrfs]
extent_clear_unlock_delalloc+0x5c/0x80 [btrfs]
run_delalloc_nocow+0x5f8/0x760 [btrfs]
btrfs_run_delalloc_range+0xa8/0x220 [btrfs]
writepage_delalloc+0x230/0x4c8 [btrfs]
extent_writepage+0xb8/0x358 [btrfs]
extent_write_cache_pages+0x21c/0x4e8 [btrfs]
btrfs_writepages+0x94/0x150 [btrfs]
do_writepages+0x74/0x190
filemap_fdatawrite_wbc+0x88/0xc8
start_delalloc_inodes+0x178/0x3a8 [btrfs]
btrfs_start_delalloc_roots+0x174/0x280 [btrfs]
shrink_delalloc+0x114/0x280 [btrfs]
flush_space+0x250/0x2f8 [btrfs]
btrfs_async_reclaim_data_space+0x180/0x228 [btrfs]
process_one_work+0x164/0x408
worker_thread+0x25c/0x388
kthread+0x100/0x118
ret_from_fork+0x10/0x20
Code: 910a8021 a90363f7 a9046bf9 94012379 (d4210000)
---[ end trace 0000000000000000 ]---
[CAUSE]
The first two lines of extra debug messages show the problem is caused
by the error handling of run_delalloc_nocow().
E.g. we have the following dirtied range (4K blocksize 4K page size):
0                   16K                  32K
|////////////////////////////////////////|
|   Pre-allocated   |
And the range [0, 16K) has a preallocated extent.
- Enter run_delalloc_nocow() for range [0, 16K)
  Which found that range [0, 16K) is preallocated, so it can do a proper
  NOCOW write.
- Enter fallback_to_cow() for range [16K, 32K)
  Since the range [16K, 32K) is not backed by a preallocated extent, we
  have to go COW.
- cow_file_range() failed for range [16K, 32K)
  So cow_file_range() will do the cleanup by clearing the folio dirty
  flag and unlocking the folios.
  Now the folios in range [16K, 32K) are unlocked.
- Enter extent_clear_unlock_delalloc() from run_delalloc_nocow()
  Which is called with PAGE_START_WRITEBACK to start page writeback.
  But folios can only be marked writeback when they are properly locked,
  thus this triggered the VM_BUG_ON_FOLIO().
Furthermore there is another hidden but common bug: run_delalloc_nocow()
is not clearing the folio dirty flags in its error handling path. This
bug is shared between run_delalloc_nocow() and cow_file_range().
[FIX]
- Clear folio dirty for range [@start, @cur_offset)
  Introduce a helper, cleanup_dirty_folios(), which will find and lock
  the folio in the range, clear the dirty flag and start/end the
  writeback, with the extra handling for the @locked_folio.
- Introduce a helper to clear folio dirty, start and end writeback
- Introduce a helper to record the last failed COW range end
  This is to trace which range we should skip, to avoid double
  unlocking.
- Skip the failed COW range for the e
---truncated---
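A hedged sketch of what the cleanup_dirty_folios() helper described above might look like, built only on generic folio helpers; the real btrfs helper also handles subpage state and differs in detail, and large folios are ignored here for brevity:

#include <linux/pagemap.h>

static void cleanup_dirty_folios(struct inode *inode,
				 struct folio *locked_folio,
				 u64 start, u64 end)
{
	struct address_space *mapping = inode->i_mapping;
	pgoff_t index = start >> PAGE_SHIFT;
	pgoff_t end_index = end >> PAGE_SHIFT;

	for (; index <= end_index; index++) {
		struct folio *folio = filemap_get_folio(mapping, index);

		if (IS_ERR(folio))
			continue;	/* already released */

		if (folio != locked_folio)
			folio_lock(folio);

		/* Clear dirty and cycle writeback so later writeback never
		 * sees a dirty folio that nobody is going to write. */
		folio_clear_dirty_for_io(folio);
		folio_start_writeback(folio);
		folio_end_writeback(folio);

		if (folio != locked_folio)
			folio_unlock(folio);
		folio_put(folio);
	}
}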