In the Linux kernel, the following vulnerability has been resolved:
sched/fair: Fix zero_vruntime tracking fix
John reported that stress-ng-yield could make his machine unhappy and
managed to bisect it to commit b3d99f43c72b ("sched/fair: Fix
zero_vruntime tracking").
The combination of yield and that commit was specific enough to
hypothesize the following scenario:
Suppose we have 2 runnable tasks, both doing yield. Then one will be
eligible and one will not be, because the average position must be in
between these two entities.
Therefore, the runnable task will be eligible, and be promoted a full
slice (all the tasks do is yield after all). This causes it to jump over
the other task and now the other task is eligible and current is no
longer. So we schedule.
Since we are runnable, there is no {de,en}queue. All we have is the
__{en,de}queue_entity() from {put_prev,set_next}_task(). But per the
fingered commit, those two no longer move zero_vruntime.
All that moves zero_vruntime are tick and full {de,en}queue.
This means, that if the two tasks playing leapfrog can reach the
critical speed to reach the overflow point inside one tick's worth of
time, we're up a creek.
Additionally, when multiple cgroups are involved, there is no guarantee
the tick will in fact hit every cgroup in a timely manner. Statistically
speaking it will, but that same statistics does not rule out the
possibility of one cgroup not getting a tick for a significant amount of
time -- however unlikely.
Therefore, just like with the yield() case, force an update at the end
of every slice. This ensures the update is never more than a single
slice behind and the whole thing is within 2 lag bounds as per the
comment on entity_key().
In the Linux kernel, the following vulnerability has been resolved:
soc/tegra: pmc: Fix unsafe generic_handle_irq() call
Currently, when resuming from system suspend on Tegra platforms,
the following warning is observed:
WARNING: CPU: 0 PID: 14459 at kernel/irq/irqdesc.c:666
Call trace:
handle_irq_desc+0x20/0x58 (P)
tegra186_pmc_wake_syscore_resume+0xe4/0x15c
syscore_resume+0x3c/0xb8
suspend_devices_and_enter+0x510/0x540
pm_suspend+0x16c/0x1d8
The warning occurs because generic_handle_irq() is being called from
a non-interrupt context which is considered as unsafe.
Fix this warning by deferring generic_handle_irq() call to an IRQ work
which gets executed in hard IRQ context where generic_handle_irq()
can be called safely.
When PREEMPT_RT kernels are used, regular IRQ work (initialized with
init_irq_work) is deferred to run in per-CPU kthreads in preemptible
context rather than hard IRQ context. Hence, use the IRQ_WORK_INIT_HARD
variant so that with PREEMPT_RT kernels, the IRQ work is processed in
hardirq context instead of being deferred to a thread which is required
for calling generic_handle_irq().
On non-PREEMPT_RT kernels, both init_irq_work() and IRQ_WORK_INIT_HARD()
execute in IRQ context, so this change has no functional impact for
standard kernel configurations.
[treding@nvidia.com: miscellaneous cleanups]
In the Linux kernel, the following vulnerability has been resolved:
media: i2c: ov5647: Initialize subdev before controls
In ov5647_init_controls() we call v4l2_get_subdevdata, but it is
initialized by v4l2_i2c_subdev_init() in the probe, which currently
happens after init_controls(). This can result in a segfault if the
error condition is hit, and we try to access i2c_client, so fix the
order.
In the Linux kernel, the following vulnerability has been resolved:
ACPI: processor: Fix NULL-pointer dereference in acpi_processor_errata_piix4()
In acpi_processor_errata_piix4(), the pointer dev is first assigned an IDE
device and then reassigned an ISA device:
dev = pci_get_subsys(..., PCI_DEVICE_ID_INTEL_82371AB, ...);
dev = pci_get_subsys(..., PCI_DEVICE_ID_INTEL_82371AB_0, ...);
If the first lookup succeeds but the second fails, dev becomes NULL. This
leads to a potential null-pointer dereference when dev_dbg() is called:
if (errata.piix4.bmisx)
dev_dbg(&dev->dev, ...);
To prevent this, use two temporary pointers and retrieve each device
independently, avoiding overwriting dev with a possible NULL value.
[ rjw: Subject adjustment, added an empty code line ]
In the Linux kernel, the following vulnerability has been resolved:
dm: remove fake timeout to avoid leak request
Since commit 15f73f5b3e59 ("blk-mq: move failure injection out of
blk_mq_complete_request"), drivers are responsible for calling
blk_should_fake_timeout() at appropriate code paths and opportunities.
However, the dm driver does not implement its own timeout handler and
relies on the timeout handling of its slave devices.
If an io-timeout-fail error is injected to a dm device, the request
will be leaked and never completed, causing tasks to hang indefinitely.
Reproduce:
1. prepare dm which has iscsi slave device
2. inject io-timeout-fail to dm
echo 1 >/sys/class/block/dm-0/io-timeout-fail
echo 100 >/sys/kernel/debug/fail_io_timeout/probability
echo 10 >/sys/kernel/debug/fail_io_timeout/times
3. read/write dm
4. iscsiadm -m node -u
Result: hang task like below
[ 862.243768] INFO: task kworker/u514:2:151 blocked for more than 122 seconds.
[ 862.244133] Tainted: G E 6.19.0-rc1+ #51
[ 862.244337] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 862.244718] task:kworker/u514:2 state:D stack:0 pid:151 tgid:151 ppid:2 task_flags:0x4288060 flags:0x00080000
[ 862.245024] Workqueue: iscsi_ctrl_3:1 __iscsi_unbind_session [scsi_transport_iscsi]
[ 862.245264] Call Trace:
[ 862.245587] <TASK>
[ 862.245814] __schedule+0x810/0x15c0
[ 862.246557] schedule+0x69/0x180
[ 862.246760] blk_mq_freeze_queue_wait+0xde/0x120
[ 862.247688] elevator_change+0x16d/0x460
[ 862.247893] elevator_set_none+0x87/0xf0
[ 862.248798] blk_unregister_queue+0x12e/0x2a0
[ 862.248995] __del_gendisk+0x231/0x7e0
[ 862.250143] del_gendisk+0x12f/0x1d0
[ 862.250339] sd_remove+0x85/0x130 [sd_mod]
[ 862.250650] device_release_driver_internal+0x36d/0x530
[ 862.250849] bus_remove_device+0x1dd/0x3f0
[ 862.251042] device_del+0x38a/0x930
[ 862.252095] __scsi_remove_device+0x293/0x360
[ 862.252291] scsi_remove_target+0x486/0x760
[ 862.252654] __iscsi_unbind_session+0x18a/0x3e0 [scsi_transport_iscsi]
[ 862.252886] process_one_work+0x633/0xe50
[ 862.253101] worker_thread+0x6df/0xf10
[ 862.253647] kthread+0x36d/0x720
[ 862.254533] ret_from_fork+0x2a6/0x470
[ 862.255852] ret_from_fork_asm+0x1a/0x30
[ 862.256037] </TASK>
Remove the blk_should_fake_timeout() check from dm, as dm has no
native timeout handling and should not attempt to fake timeouts.
In the Linux kernel, the following vulnerability has been resolved:
KVM: nSVM: Remove a user-triggerable WARN on nested_svm_load_cr3() succeeding
Drop the WARN in svm_set_nested_state() on nested_svm_load_cr3() failing
as it is trivially easy to trigger from userspace by modifying CPUID after
loading CR3. E.g. modifying the state restoration selftest like so:
--- tools/testing/selftests/kvm/x86/state_test.c
+++ tools/testing/selftests/kvm/x86/state_test.c
@@ -280,7 +280,16 @@ int main(int argc, char *argv[])
/* Restore state in a new VM. */
vcpu = vm_recreate_with_one_vcpu(vm);
- vcpu_load_state(vcpu, state);
+
+ if (stage == 4) {
+ state->sregs.cr3 = BIT(44);
+ vcpu_load_state(vcpu, state);
+
+ vcpu_set_cpuid_property(vcpu, X86_PROPERTY_MAX_PHY_ADDR, 36);
+ __vcpu_nested_state_set(vcpu, &state->nested);
+ } else {
+ vcpu_load_state(vcpu, state);
+ }
/*
* Restore XSAVE state in a dummy vCPU, first without doing
generates:
WARNING: CPU: 30 PID: 938 at arch/x86/kvm/svm/nested.c:1877 svm_set_nested_state+0x34a/0x360 [kvm_amd]
Modules linked in: kvm_amd kvm irqbypass [last unloaded: kvm]
CPU: 30 UID: 1000 PID: 938 Comm: state_test Tainted: G W 6.18.0-rc7-58e10b63777d-next-vm
Tainted: [W]=WARN
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
RIP: 0010:svm_set_nested_state+0x34a/0x360 [kvm_amd]
Call Trace:
<TASK>
kvm_arch_vcpu_ioctl+0xf33/0x1700 [kvm]
kvm_vcpu_ioctl+0x4e6/0x8f0 [kvm]
__x64_sys_ioctl+0x8f/0xd0
do_syscall_64+0x61/0xad0
entry_SYSCALL_64_after_hwframe+0x4b/0x53
Simply delete the WARN instead of trying to prevent userspace from shoving
"illegal" state into CR3. For better or worse, KVM's ABI allows userspace
to set CPUID after SREGS, and vice versa, and KVM is very permissive when
it comes to guest CPUID. I.e. attempting to enforce the virtual CPU model
when setting CPUID could break userspace. Given that the WARN doesn't
provide any meaningful protection for KVM or benefit for userspace, simply
drop it even though the odds of breaking userspace are minuscule.
Opportunistically delete a spurious newline.
In the Linux kernel, the following vulnerability has been resolved:
iio: accel: adxl380: Avoid reading more entries than present in FIFO
The interrupt handler reads FIFO entries in batches of N samples, where N
is the number of scan elements that have been enabled. However, the sensor
fills the FIFO one sample at a time, even when more than one channel is
enabled. Therefore,the number of entries reported by the FIFO status
registers may not be a multiple of N; if this number is not a multiple, the
number of entries read from the FIFO may exceed the number of entries
actually present.
To fix the above issue, round down the number of FIFO entries read from the
status registers so that it is always a multiple of N.
In the Linux kernel, the following vulnerability has been resolved:
btrfs: don't BUG() on unexpected delayed ref type in run_one_delayed_ref()
There is no need to BUG(), we can just return an error and log an error
message.
In the Linux kernel, the following vulnerability has been resolved:
md raid: fix hang when stopping arrays with metadata through dm-raid
When using device-mapper's dm-raid target, stopping a RAID array can cause
the system to hang under specific conditions.
This occurs when:
- A dm-raid managed device tree is suspended from top to bottom
(the top-level RAID device is suspended first, followed by its
underlying metadata and data devices)
- The top-level RAID device is then removed
Removing the top-level device triggers a hang in the following sequence:
the dm-raid destructor calls md_stop(), which tries to flush the
write-intent bitmap by writing to the metadata sub-devices. However, these
devices are already suspended, making them unable to complete the write-intent
operations and causing an indefinite block.
Fix:
- Prevent bitmap flushing when md_stop() is called from dm-raid
destructor context
and avoid a quiescing/unquescing cycle which could also cause I/O
- Still allow write-intent bitmap flushing when called from dm-raid
suspend context
This ensures that RAID array teardown can complete successfully even when the
underlying devices are in a suspended state.
This second patch uses md_is_rdwr() to distinguish between suspend and
destructor paths as elaborated on above.