In the Linux kernel, the following vulnerability has been resolved:
mm/hugetlb: unshare page tables during VMA split, not before
Currently, __split_vma() triggers hugetlb page table unsharing through
vm_ops->may_split(). This happens before the VMA lock and rmap locks are
taken - which is too early, it allows racing VMA-locked page faults in our
process and racing rmap walks from other processes to cause page tables to
be shared again before we actually perform the split.
Fix it by explicitly calling into the hugetlb unshare logic from
__split_vma() in the same place where THP splitting also happens. At that
point, both the VMA and the rmap(s) are write-locked.
An annoying detail is that we can now call into the helper
hugetlb_unshare_pmds() from two different locking contexts:
1. from hugetlb_split(), holding:
- mmap lock (exclusively)
- VMA lock
- file rmap lock (exclusively)
2. hugetlb_unshare_all_pmds(), which I think is designed to be able to
call us with only the mmap lock held (in shared mode), but currently
only runs while holding mmap lock (exclusively) and VMA lock
Backporting note:
This commit fixes a racy protection that was introduced in commit
b30c14cd6102 ("hugetlb: unshare some PMDs when splitting VMAs"); that
commit claimed to fix an issue introduced in 5.13, but it should actually
also go all the way back.
[jannh@google.com: v2]
In the Linux kernel, the following vulnerability has been resolved:
net_sched: prio: fix a race in prio_tune()
Gerrard Tai reported a race condition in PRIO, whenever SFQ perturb timer
fires at the wrong time.
The race is as follows:
CPU 0 CPU 1
[1]: lock root
[2]: qdisc_tree_flush_backlog()
[3]: unlock root
|
| [5]: lock root
| [6]: rehash
| [7]: qdisc_tree_reduce_backlog()
|
[4]: qdisc_put()
This can be abused to underflow a parent's qlen.
Calling qdisc_purge_queue() instead of qdisc_tree_flush_backlog()
should fix the race, because all packets will be purged from the qdisc
before releasing the lock.
In the Linux kernel, the following vulnerability has been resolved:
KVM: x86/mmu: Treat NX as a valid SPTE bit for NPT
Treat the NX bit as valid when using NPT, as KVM will set the NX bit when
the NX huge page mitigation is enabled (mindblowing) and trigger the WARN
that fires on reserved SPTE bits being set.
KVM has required NX support for SVM since commit b26a71a1a5b9 ("KVM: SVM:
Refuse to load kvm_amd if NX support is not available") for exactly this
reason, but apparently it never occurred to anyone to actually test NPT
with the mitigation enabled.
------------[ cut here ]------------
spte = 0x800000018a600ee7, level = 2, rsvd bits = 0x800f0000001fe000
WARNING: CPU: 152 PID: 15966 at arch/x86/kvm/mmu/spte.c:215 make_spte+0x327/0x340 [kvm]
Hardware name: Google, Inc. Arcadia_IT_80/Arcadia_IT_80, BIOS 10.48.0 01/27/2022
RIP: 0010:make_spte+0x327/0x340 [kvm]
Call Trace:
<TASK>
tdp_mmu_map_handle_target_level+0xc3/0x230 [kvm]
kvm_tdp_mmu_map+0x343/0x3b0 [kvm]
direct_page_fault+0x1ae/0x2a0 [kvm]
kvm_tdp_page_fault+0x7d/0x90 [kvm]
kvm_mmu_page_fault+0xfb/0x2e0 [kvm]
npf_interception+0x55/0x90 [kvm_amd]
svm_invoke_exit_handler+0x31/0xf0 [kvm_amd]
svm_handle_exit+0xf6/0x1d0 [kvm_amd]
vcpu_enter_guest+0xb6d/0xee0 [kvm]
? kvm_pmu_trigger_event+0x6d/0x230 [kvm]
vcpu_run+0x65/0x2c0 [kvm]
kvm_arch_vcpu_ioctl_run+0x355/0x610 [kvm]
kvm_vcpu_ioctl+0x551/0x610 [kvm]
__se_sys_ioctl+0x77/0xc0
__x64_sys_ioctl+0x1d/0x20
do_syscall_64+0x44/0xa0
entry_SYSCALL_64_after_hwframe+0x46/0xb0
</TASK>
---[ end trace 0000000000000000 ]---
In the Linux kernel, the following vulnerability has been resolved:
crypto: ccp - Use kzalloc for sev ioctl interfaces to prevent kernel memory leak
For some sev ioctl interfaces, input may be passed that is less than or
equal to SEV_FW_BLOB_MAX_SIZE, but larger than the data that PSP
firmware returns. In this case, kmalloc will allocate memory that is the
size of the input rather than the size of the data. Since PSP firmware
doesn't fully overwrite the buffer, the sev ioctl interfaces with the
issue may return uninitialized slab memory.
Currently, all of the ioctl interfaces in the ccp driver are safe, but
to prevent future problems, change all ioctl interfaces that allocate
memory with kmalloc to use kzalloc and memset the data buffer to zero
in sev_ioctl_do_platform_status.
In the Linux kernel, the following vulnerability has been resolved:
ALSA: bcd2000: Fix a UAF bug on the error path of probing
When the driver fails in snd_card_register() at probe time, it will free
the 'bcd2k->midi_out_urb' before killing it, which may cause a UAF bug.
The following log can reveal it:
[ 50.727020] BUG: KASAN: use-after-free in bcd2000_input_complete+0x1f1/0x2e0 [snd_bcd2000]
[ 50.727623] Read of size 8 at addr ffff88810fab0e88 by task swapper/4/0
[ 50.729530] Call Trace:
[ 50.732899] bcd2000_input_complete+0x1f1/0x2e0 [snd_bcd2000]
Fix this by adding usb_kill_urb() before usb_free_urb().
In the Linux kernel, the following vulnerability has been resolved:
netfilter: nf_tables: do not allow SET_ID to refer to another table
When doing lookups for sets on the same batch by using its ID, a set from a
different table can be used.
Then, when the table is removed, a reference to the set may be kept after
the set is freed, leading to a potential use-after-free.
When looking for sets by ID, use the table that was used for the lookup by
name, and only return sets belonging to that same table.
This fixes CVE-2022-2586, also reported as ZDI-CAN-17470.
In the Linux kernel, the following vulnerability has been resolved:
coresight: Clear the connection field properly
coresight devices track their connections (output connections) and
hold a reference to the fwnode. When a device goes away, we walk through
the devices on the coresight bus and make sure that the references
are dropped. This happens both ways:
a) For all output connections from the device, drop the reference to
the target device via coresight_release_platform_data()
b) Iterate over all the devices on the coresight bus and drop the
reference to fwnode if *this* device is the target of the output
connection, via coresight_remove_conns()->coresight_remove_match().
However, the coresight_remove_match() doesn't clear the fwnode field,
after dropping the reference, this causes use-after-free and
additional refcount drops on the fwnode.
e.g., if we have two devices, A and B, with a connection, A -> B.
If we remove B first, B would clear the reference on B, from A
via coresight_remove_match(). But when A is removed, it still has
a connection with fwnode still pointing to B. Thus it tries to drops
the reference in coresight_release_platform_data(), raising the bells
like :
[ 91.990153] ------------[ cut here ]------------
[ 91.990163] refcount_t: addition on 0; use-after-free.
[ 91.990212] WARNING: CPU: 0 PID: 461 at lib/refcount.c:25 refcount_warn_saturate+0xa0/0x144
[ 91.990260] Modules linked in: coresight_funnel coresight_replicator coresight_etm4x(-)
crct10dif_ce coresight ip_tables x_tables ipv6 [last unloaded: coresight_cpu_debug]
[ 91.990398] CPU: 0 PID: 461 Comm: rmmod Tainted: G W T 5.19.0-rc2+ #53
[ 91.990418] Hardware name: ARM LTD ARM Juno Development Platform/ARM Juno Development Platform, BIOS EDK II Feb 1 2019
[ 91.990434] pstate: 600000c5 (nZCv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 91.990454] pc : refcount_warn_saturate+0xa0/0x144
[ 91.990476] lr : refcount_warn_saturate+0xa0/0x144
[ 91.990496] sp : ffff80000c843640
[ 91.990509] x29: ffff80000c843640 x28: ffff800009957c28 x27: ffff80000c8439a8
[ 91.990560] x26: ffff00097eff1990 x25: ffff8000092b6ad8 x24: ffff00097eff19a8
[ 91.990610] x23: ffff80000c8439a8 x22: 0000000000000000 x21: ffff80000c8439c2
[ 91.990659] x20: 0000000000000000 x19: ffff00097eff1a10 x18: ffff80000ab99c40
[ 91.990708] x17: 0000000000000000 x16: 0000000000000000 x15: ffff80000abf6fa0
[ 91.990756] x14: 000000000000001d x13: 0a2e656572662d72 x12: 657466612d657375
[ 91.990805] x11: 203b30206e6f206e x10: 6f69746964646120 x9 : ffff8000081aba28
[ 91.990854] x8 : 206e6f206e6f6974 x7 : 69646461203a745f x6 : 746e756f63666572
[ 91.990903] x5 : ffff00097648ec58 x4 : 0000000000000000 x3 : 0000000000000027
[ 91.990952] x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff00080260ba00
[ 91.991000] Call trace:
[ 91.991012] refcount_warn_saturate+0xa0/0x144
[ 91.991034] kobject_get+0xac/0xb0
[ 91.991055] of_node_get+0x2c/0x40
[ 91.991076] of_fwnode_get+0x40/0x60
[ 91.991094] fwnode_handle_get+0x3c/0x60
[ 91.991116] fwnode_get_nth_parent+0xf4/0x110
[ 91.991137] fwnode_full_name_string+0x48/0xc0
[ 91.991158] device_node_string+0x41c/0x530
[ 91.991178] pointer+0x320/0x3ec
[ 91.991198] vsnprintf+0x23c/0x750
[ 91.991217] vprintk_store+0x104/0x4b0
[ 91.991238] vprintk_emit+0x8c/0x360
[ 91.991257] vprintk_default+0x44/0x50
[ 91.991276] vprintk+0xcc/0xf0
[ 91.991295] _printk+0x68/0x90
[ 91.991315] of_node_release+0x13c/0x14c
[ 91.991334] kobject_put+0x98/0x114
[ 91.991354] of_node_put+0x24/0x34
[ 91.991372] of_fwnode_put+0x40/0x5c
[ 91.991390] fwnode_handle_put+0x38/0x50
[ 91.991411] coresight_release_platform_data+0x74/0xb0 [coresight]
[ 91.991472] coresight_unregister+0x64/0xcc [coresight]
[ 91.991525] etm4_remove_dev+0x64/0x78 [coresight_etm4x]
[ 91.991563] etm4_remove_amba+0x1c/0x2c [coresight_etm4x]
[ 91.991598] amba_remove+0x3c/0x19c
---truncated---
In the Linux kernel, the following vulnerability has been resolved:
scsi: sg: Allow waiting for commands to complete on removed device
When a SCSI device is removed while in active use, currently sg will
immediately return -ENODEV on any attempt to wait for active commands that
were sent before the removal. This is problematic for commands that use
SG_FLAG_DIRECT_IO since the data buffer may still be in use by the kernel
when userspace frees or reuses it after getting ENODEV, leading to
corrupted userspace memory (in the case of READ-type commands) or corrupted
data being sent to the device (in the case of WRITE-type commands). This
has been seen in practice when logging out of a iscsi_tcp session, where
the iSCSI driver may still be processing commands after the device has been
marked for removal.
Change the policy to allow userspace to wait for active sg commands even
when the device is being removed. Return -ENODEV only when there are no
more responses to read.