In the Linux kernel, the following vulnerability has been resolved:
x86/sgx: Prevent attempts to reclaim poisoned pages
TL;DR: SGX page reclaim touches the page to copy its contents to
secondary storage. SGX instructions do not gracefully handle machine
checks. Despite this, the existing SGX code will try to reclaim pages
that it _knows_ are poisoned. Avoid even trying to reclaim poisoned pages.
The longer story:
Pages used by an enclave only get epc_page->poison set in
arch_memory_failure() but they currently stay on sgx_active_page_list until
sgx_encl_release(), with the SGX_EPC_PAGE_RECLAIMER_TRACKED flag untouched.
epc_page->poison is not checked in the reclaimer logic meaning that, if other
conditions are met, an attempt will be made to reclaim an EPC page that was
poisoned. This is bad because 1. we don't want that page to end up added
to another enclave and 2. it is likely to cause one core to shut down
and the kernel to panic.
Specifically, reclaiming uses microcode operations including "EWB" which
accesses the EPC page contents to encrypt and write them out to non-SGX
memory. Those operations cannot handle MCEs in their accesses other than
by putting the executing core into a special shutdown state (affecting
both threads with HT.) The kernel will subsequently panic on the
remaining cores seeing the core didn't enter MCE handler(s) in time.
Call sgx_unmark_page_reclaimable() to remove the affected EPC page from
sgx_active_page_list on memory error to stop it being considered for
reclaiming.
Testing epc_page->poison in sgx_reclaim_pages() would also work but I assume
it's better to add code in the less likely paths.
The affected EPC page is not added to &node->sgx_poison_page_list until
later in sgx_encl_release()->sgx_free_epc_page() when it is EREMOVEd.
Membership on other lists doesn't change to avoid changing any of the
lists' semantics except for sgx_active_page_list. There's a "TBD" comment
in arch_memory_failure() about pre-emptive actions, the goal here is not
to address everything that it may imply.
This also doesn't completely close the time window when a memory error
notification will be fatal (for a not previously poisoned EPC page) --
the MCE can happen after sgx_reclaim_pages() has selected its candidates
or even *inside* a microcode operation (actually easy to trigger due to
the amount of time spent in them.)
The spinlock in sgx_unmark_page_reclaimable() is safe because
memory_failure() runs in process context and no spinlocks are held,
explicitly noted in a mm/memory-failure.c comment.
In the Linux kernel, the following vulnerability has been resolved:
smb: Log an error when close_all_cached_dirs fails
Under low-memory conditions, close_all_cached_dirs() can't move the
dentries to a separate list to dput() them once the locks are dropped.
This will result in a "Dentry still in use" error, so add an error
message that makes it clear this is what happened:
[ 495.281119] CIFS: VFS: \\otters.example.com\share Out of memory while dropping dentries
[ 495.281595] ------------[ cut here ]------------
[ 495.281887] BUG: Dentry ffff888115531138{i=78,n=/} still in use (2) [unmount of cifs cifs]
[ 495.282391] WARNING: CPU: 1 PID: 2329 at fs/dcache.c:1536 umount_check+0xc8/0xf0
Also, bail out of looping through all tcons as soon as a single
allocation fails, since we're already in trouble, and kmalloc() attempts
for subseqeuent tcons are likely to fail just like the first one did.
In the Linux kernel, the following vulnerability has been resolved:
aoe: clean device rq_list in aoedev_downdev()
An aoe device's rq_list contains accepted block requests that are
waiting to be transmitted to the aoe target. This queue was added as
part of the conversion to blk_mq. However, the queue was not cleaned out
when an aoe device is downed which caused blk_mq_freeze_queue() to sleep
indefinitely waiting for those requests to complete, causing a hang. This
fix cleans out the queue before calling blk_mq_freeze_queue().
In the Linux kernel, the following vulnerability has been resolved:
fbdev: core: fbcvt: avoid division by 0 in fb_cvt_hperiod()
In fb_find_mode_cvt(), iff mode->refresh somehow happens to be 0x80000000,
cvt.f_refresh will become 0 when multiplying it by 2 due to overflow. It's
then passed to fb_cvt_hperiod(), where it's used as a divider -- division
by 0 will result in kernel oops. Add a sanity check for cvt.f_refresh to
avoid such overflow...
Found by Linux Verification Center (linuxtesting.org) with the Svace static
analysis tool.
In the Linux kernel, the following vulnerability has been resolved:
bus: fsl-mc: fix double-free on mc_dev
The blamed commit tried to simplify how the deallocations are done but,
in the process, introduced a double-free on the mc_dev variable.
In case the MC device is a DPRC, a new mc_bus is allocated and the
mc_dev variable is just a reference to one of its fields. In this
circumstance, on the error path only the mc_bus should be freed.
This commit introduces back the following checkpatch warning which is a
false-positive.
WARNING: kfree(NULL) is safe and this check is probably not required
+ if (mc_bus)
+ kfree(mc_bus);
In the Linux kernel, the following vulnerability has been resolved:
drm/amd/pp: Fix potential NULL pointer dereference in atomctrl_initialize_mc_reg_table
The function atomctrl_initialize_mc_reg_table() and
atomctrl_initialize_mc_reg_table_v2_2() does not check the return
value of smu_atom_get_data_table(). If smu_atom_get_data_table()
fails to retrieve vram_info, it returns NULL which is later
dereferenced.
In the Linux kernel, the following vulnerability has been resolved:
ptp: remove ptp->n_vclocks check logic in ptp_vclock_in_use()
There is no disagreement that we should check both ptp->is_virtual_clock
and ptp->n_vclocks to check if the ptp virtual clock is in use.
However, when we acquire ptp->n_vclocks_mux to read ptp->n_vclocks in
ptp_vclock_in_use(), we observe a recursive lock in the call trace
starting from n_vclocks_store().
============================================
WARNING: possible recursive locking detected
6.15.0-rc6 #1 Not tainted
--------------------------------------------
syz.0.1540/13807 is trying to acquire lock:
ffff888035a24868 (&ptp->n_vclocks_mux){+.+.}-{4:4}, at:
ptp_vclock_in_use drivers/ptp/ptp_private.h:103 [inline]
ffff888035a24868 (&ptp->n_vclocks_mux){+.+.}-{4:4}, at:
ptp_clock_unregister+0x21/0x250 drivers/ptp/ptp_clock.c:415
but task is already holding lock:
ffff888030704868 (&ptp->n_vclocks_mux){+.+.}-{4:4}, at:
n_vclocks_store+0xf1/0x6d0 drivers/ptp/ptp_sysfs.c:215
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0
----
lock(&ptp->n_vclocks_mux);
lock(&ptp->n_vclocks_mux);
*** DEADLOCK ***
....
============================================
The best way to solve this is to remove the logic that checks
ptp->n_vclocks in ptp_vclock_in_use().
The reason why this is appropriate is that any path that uses
ptp->n_vclocks must unconditionally check if ptp->n_vclocks is greater
than 0 before unregistering vclocks, and all functions are already
written this way. And in the function that uses ptp->n_vclocks, we
already get ptp->n_vclocks_mux before unregistering vclocks.
Therefore, we need to remove the redundant check for ptp->n_vclocks in
ptp_vclock_in_use() to prevent recursive locking.