In the Linux kernel, the following vulnerability has been resolved:
mm/hugetlb: unshare page tables during VMA split, not before
Currently, __split_vma() triggers hugetlb page table unsharing through
vm_ops->may_split(). This happens before the VMA lock and rmap locks are
taken - which is too early, it allows racing VMA-locked page faults in our
process and racing rmap walks from other processes to cause page tables to
be shared again before we actually perform the split.
Fix it by explicitly calling into the hugetlb unshare logic from
__split_vma() in the same place where THP splitting also happens. At that
point, both the VMA and the rmap(s) are write-locked.
An annoying detail is that we can now call into the helper
hugetlb_unshare_pmds() from two different locking contexts:
1. from hugetlb_split(), holding:
- mmap lock (exclusively)
- VMA lock
- file rmap lock (exclusively)
2. hugetlb_unshare_all_pmds(), which I think is designed to be able to
call us with only the mmap lock held (in shared mode), but currently
only runs while holding mmap lock (exclusively) and VMA lock
Backporting note:
This commit fixes a racy protection that was introduced in commit
b30c14cd6102 ("hugetlb: unshare some PMDs when splitting VMAs"); that
commit claimed to fix an issue introduced in 5.13, but it should actually
also go all the way back.
[jannh@google.com: v2]
In the Linux kernel, the following vulnerability has been resolved:
net_sched: prio: fix a race in prio_tune()
Gerrard Tai reported a race condition in PRIO, whenever SFQ perturb timer
fires at the wrong time.
The race is as follows:
CPU 0 CPU 1
[1]: lock root
[2]: qdisc_tree_flush_backlog()
[3]: unlock root
|
| [5]: lock root
| [6]: rehash
| [7]: qdisc_tree_reduce_backlog()
|
[4]: qdisc_put()
This can be abused to underflow a parent's qlen.
Calling qdisc_purge_queue() instead of qdisc_tree_flush_backlog()
should fix the race, because all packets will be purged from the qdisc
before releasing the lock.
In the Linux kernel, the following vulnerability has been resolved:
drm/amd/display: Increase block_sequence array size
[Why]
It's possible to generate more than 50 steps in hwss_build_fast_sequence,
for example with a 6-pipe asic where all pipes are in one MPC chain. This
overflows the block_sequence buffer and corrupts block_sequence_steps,
causing a crash.
[How]
Expand block_sequence to 100 items. A naive upper bound on the possible
number of steps for a 6-pipe asic, ignoring the potential for steps to be
mutually exclusive, is 91 with current code, therefore 100 is sufficient.
In the Linux kernel, the following vulnerability has been resolved:
spi-rockchip: Fix register out of bounds access
Do not write native chip select stuff for GPIO chip selects.
GPIOs can be numbered much higher than native CS.
Also, it makes no sense.
In the Linux kernel, the following vulnerability has been resolved:
PCI: endpoint: pci-epf-test: Fix double free that causes kernel to oops
Fix a kernel oops found while testing the stm32_pcie Endpoint driver
with handling of PERST# deassertion:
During EP initialization, pci_epf_test_alloc_space() allocates all BARs,
which are further freed if epc_set_bar() fails (for instance, due to no
free inbound window).
However, when pci_epc_set_bar() fails, the error path:
pci_epc_set_bar() ->
pci_epf_free_space()
does not clear the previous assignment to epf_test->reg[bar].
Then, if the host reboots, the PERST# deassertion restarts the BAR
allocation sequence with the same allocation failure (no free inbound
window), creating a double free situation since epf_test->reg[bar] was
deallocated and is still non-NULL.
Thus, make sure that pci_epf_alloc_space() and pci_epf_free_space()
invocations are symmetric, and as such, set epf_test->reg[bar] to NULL
when memory is freed.
[kwilczynski: commit log]
In the Linux kernel, the following vulnerability has been resolved:
vhost-scsi: protect vq->log_used with vq->mutex
The vhost-scsi completion path may access vq->log_base when vq->log_used is
already set to false.
vhost-thread QEMU-thread
vhost_scsi_complete_cmd_work()
-> vhost_add_used()
-> vhost_add_used_n()
if (unlikely(vq->log_used))
QEMU disables vq->log_used
via VHOST_SET_VRING_ADDR.
mutex_lock(&vq->mutex);
vq->log_used = false now!
mutex_unlock(&vq->mutex);
QEMU gfree(vq->log_base)
log_used()
-> log_write(vq->log_base)
Assuming the VMM is QEMU. The vq->log_base is from QEMU userpace and can be
reclaimed via gfree(). As a result, this causes invalid memory writes to
QEMU userspace.
The control queue path has the same issue.
In the Linux kernel, the following vulnerability has been resolved:
virtio: break and reset virtio devices on device_shutdown()
Hongyu reported a hang on kexec in a VM. QEMU reported invalid memory
accesses during the hang.
Invalid read at addr 0x102877002, size 2, region '(null)', reason: rejected
Invalid write at addr 0x102877A44, size 2, region '(null)', reason: rejected
...
It was traced down to virtio-console. Kexec works fine if virtio-console
is not in use.
The issue is that virtio-console continues to write to the MMIO even after
underlying virtio-pci device is reset.
Additionally, Eric noticed that IOMMUs are reset before devices, if
devices are not reset on shutdown they continue to poke at guest memory
and get errors from the IOMMU. Some devices get wedged then.
The problem can be solved by breaking all virtio devices on virtio
bus shutdown, then resetting them.
In the Linux kernel, the following vulnerability has been resolved:
rseq: Fix segfault on registration when rseq_cs is non-zero
The rseq_cs field is documented as being set to 0 by user-space prior to
registration, however this is not currently enforced by the kernel. This
can result in a segfault on return to user-space if the value stored in
the rseq_cs field doesn't point to a valid struct rseq_cs.
The correct solution to this would be to fail the rseq registration when
the rseq_cs field is non-zero. However, some older versions of glibc
will reuse the rseq area of previous threads without clearing the
rseq_cs field and will also terminate the process if the rseq
registration fails in a secondary thread. This wasn't caught in testing
because in this case the leftover rseq_cs does point to a valid struct
rseq_cs.
What we can do is clear the rseq_cs field on registration when it's
non-zero which will prevent segfaults on registration and won't break
the glibc versions that reuse rseq areas on thread creation.
In the Linux kernel, the following vulnerability has been resolved:
btrfs: avoid NULL pointer dereference if no valid csum tree
[BUG]
When trying read-only scrub on a btrfs with rescue=idatacsums mount
option, it will crash with the following call trace:
BUG: kernel NULL pointer dereference, address: 0000000000000208
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
CPU: 1 UID: 0 PID: 835 Comm: btrfs Tainted: G O 6.15.0-rc3-custom+ #236 PREEMPT(full)
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS unknown 02/02/2022
RIP: 0010:btrfs_lookup_csums_bitmap+0x49/0x480 [btrfs]
Call Trace:
<TASK>
scrub_find_fill_first_stripe+0x35b/0x3d0 [btrfs]
scrub_simple_mirror+0x175/0x290 [btrfs]
scrub_stripe+0x5f7/0x6f0 [btrfs]
scrub_chunk+0x9a/0x150 [btrfs]
scrub_enumerate_chunks+0x333/0x660 [btrfs]
btrfs_scrub_dev+0x23e/0x600 [btrfs]
btrfs_ioctl+0x1dcf/0x2f80 [btrfs]
__x64_sys_ioctl+0x97/0xc0
do_syscall_64+0x4f/0x120
entry_SYSCALL_64_after_hwframe+0x76/0x7e
[CAUSE]
Mount option "rescue=idatacsums" will completely skip loading the csum
tree, so that any data read will not find any data csum thus we will
ignore data checksum verification.
Normally call sites utilizing csum tree will check the fs state flag
NO_DATA_CSUMS bit, but unfortunately scrub does not check that bit at all.
This results in scrub to call btrfs_search_slot() on a NULL pointer
and triggered above crash.
[FIX]
Check both extent and csum tree root before doing any tree search.