In the Linux kernel, the following vulnerability has been resolved:
cipso: Fix data-races around sysctl.
While reading cipso sysctl variables, they can be changed concurrently.
So, we need to add READ_ONCE() to avoid data-races.
In the Linux kernel, the following vulnerability has been resolved:
cgroup: Use separate src/dst nodes when preloading css_sets for migration
Each cset (css_set) is pinned by its tasks. When we're moving tasks around
across csets for a migration, we need to hold the source and destination
csets to ensure that they don't go away while we're moving tasks about. This
is done by linking cset->mg_preload_node on either the
mgctx->preloaded_src_csets or mgctx->preloaded_dst_csets list. Using the
same cset->mg_preload_node for both the src and dst lists was deemed okay as
a cset can't be both the source and destination at the same time.
Unfortunately, this overloading becomes problematic when multiple tasks are
involved in a migration and some of them are identity noop migrations while
others are actually moving across cgroups. For example, this can happen with
the following sequence on cgroup1:
#1> mkdir -p /sys/fs/cgroup/misc/a/b
#2> echo $$ > /sys/fs/cgroup/misc/a/cgroup.procs
#3> RUN_A_COMMAND_WHICH_CREATES_MULTIPLE_THREADS &
#4> PID=$!
#5> echo $PID > /sys/fs/cgroup/misc/a/b/tasks
#6> echo $PID > /sys/fs/cgroup/misc/a/cgroup.procs
the process including the group leader back into a. In this final migration,
non-leader threads would be doing identity migration while the group leader
is doing an actual one.
After #3, let's say the whole process was in cset A, and that after #4, the
leader moves to cset B. Then, during #6, the following happens:
1. cgroup_migrate_add_src() is called on B for the leader.
2. cgroup_migrate_add_src() is called on A for the other threads.
3. cgroup_migrate_prepare_dst() is called. It scans the src list.
4. It notices that B wants to migrate to A, so it tries to A to the dst
list but realizes that its ->mg_preload_node is already busy.
5. and then it notices A wants to migrate to A as it's an identity
migration, it culls it by list_del_init()'ing its ->mg_preload_node and
putting references accordingly.
6. The rest of migration takes place with B on the src list but nothing on
the dst list.
This means that A isn't held while migration is in progress. If all tasks
leave A before the migration finishes and the incoming task pins it, the
cset will be destroyed leading to use-after-free.
This is caused by overloading cset->mg_preload_node for both src and dst
preload lists. We wanted to exclude the cset from the src list but ended up
inadvertently excluding it from the dst list too.
This patch fixes the issue by separating out cset->mg_preload_node into
->mg_src_preload_node and ->mg_dst_preload_node, so that the src and dst
preloadings don't interfere with each other.
In the Linux kernel, the following vulnerability has been resolved:
icmp: Fix a data-race around sysctl_icmp_errors_use_inbound_ifaddr.
While reading sysctl_icmp_errors_use_inbound_ifaddr, it can be changed
concurrently. Thus, we need to add READ_ONCE() to its reader.
In the Linux kernel, the following vulnerability has been resolved:
pinctrl: aspeed: Fix potential NULL dereference in aspeed_pinmux_set_mux()
pdesc could be null but still dereference pdesc->name and it will lead to
a null pointer access. So we move a null check before dereference.
In the Linux kernel, the following vulnerability has been resolved:
net: sfp: fix memory leak in sfp_probe()
sfp_probe() allocates a memory chunk from sfp with sfp_alloc(). When
devm_add_action() fails, sfp is not freed, which leads to a memory leak.
We should use devm_add_action_or_reset() instead of devm_add_action().
In the Linux kernel, the following vulnerability has been resolved:
net: tipc: fix possible refcount leak in tipc_sk_create()
Free sk in case tipc_sk_insert() fails.
In the Linux kernel, the following vulnerability has been resolved:
cpufreq: pmac32-cpufreq: Fix refcount leak bug
In pmac_cpufreq_init_MacRISC3(), we need to add corresponding
of_node_put() for the three node pointers whose refcount have
been incremented by of_find_node_by_name().
In the Linux kernel, the following vulnerability has been resolved:
netfilter: nf_tables: avoid skb access on nf_stolen
When verdict is NF_STOLEN, the skb might have been freed.
When tracing is enabled, this can result in a use-after-free:
1. access to skb->nf_trace
2. access to skb->mark
3. computation of trace id
4. dump of packet payload
To avoid 1, keep a cached copy of skb->nf_trace in the
trace state struct.
Refresh this copy whenever verdict is != STOLEN.
Avoid 2 by skipping skb->mark access if verdict is STOLEN.
3 is avoided by precomputing the trace id.
Only dump the packet when verdict is not "STOLEN".
In the Linux kernel, the following vulnerability has been resolved:
powerpc/xive/spapr: correct bitmap allocation size
kasan detects access beyond the end of the xibm->bitmap allocation:
BUG: KASAN: slab-out-of-bounds in _find_first_zero_bit+0x40/0x140
Read of size 8 at addr c00000001d1d0118 by task swapper/0/1
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.19.0-rc2-00001-g90df023b36dd #28
Call Trace:
[c00000001d98f770] [c0000000012baab8] dump_stack_lvl+0xac/0x108 (unreliable)
[c00000001d98f7b0] [c00000000068faac] print_report+0x37c/0x710
[c00000001d98f880] [c0000000006902c0] kasan_report+0x110/0x354
[c00000001d98f950] [c000000000692324] __asan_load8+0xa4/0xe0
[c00000001d98f970] [c0000000011c6ed0] _find_first_zero_bit+0x40/0x140
[c00000001d98f9b0] [c0000000000dbfbc] xive_spapr_get_ipi+0xcc/0x260
[c00000001d98fa70] [c0000000000d6d28] xive_setup_cpu_ipi+0x1e8/0x450
[c00000001d98fb30] [c000000004032a20] pSeries_smp_probe+0x5c/0x118
[c00000001d98fb60] [c000000004018b44] smp_prepare_cpus+0x944/0x9ac
[c00000001d98fc90] [c000000004009f9c] kernel_init_freeable+0x2d4/0x640
[c00000001d98fd90] [c0000000000131e8] kernel_init+0x28/0x1d0
[c00000001d98fe10] [c00000000000cd54] ret_from_kernel_thread+0x5c/0x64
Allocated by task 0:
kasan_save_stack+0x34/0x70
__kasan_kmalloc+0xb4/0xf0
__kmalloc+0x268/0x540
xive_spapr_init+0x4d0/0x77c
pseries_init_irq+0x40/0x27c
init_IRQ+0x44/0x84
start_kernel+0x2a4/0x538
start_here_common+0x1c/0x20
The buggy address belongs to the object at c00000001d1d0118
which belongs to the cache kmalloc-8 of size 8
The buggy address is located 0 bytes inside of
8-byte region [c00000001d1d0118, c00000001d1d0120)
The buggy address belongs to the physical page:
page:c00c000000074740 refcount:1 mapcount:0 mapping:0000000000000000 index:0xc00000001d1d0558 pfn:0x1d1d
flags: 0x7ffff000000200(slab|node=0|zone=0|lastcpupid=0x7ffff)
raw: 007ffff000000200 c00000001d0003c8 c00000001d0003c8 c00000001d010480
raw: c00000001d1d0558 0000000001e1000a 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected
Memory state around the buggy address:
c00000001d1d0000: fc 00 fc fc fc fc fc fc fc fc fc fc fc fc fc fc
c00000001d1d0080: fc fc 00 fc fc fc fc fc fc fc fc fc fc fc fc fc
>c00000001d1d0100: fc fc fc 02 fc fc fc fc fc fc fc fc fc fc fc fc
^
c00000001d1d0180: fc fc fc fc 04 fc fc fc fc fc fc fc fc fc fc fc
c00000001d1d0200: fc fc fc fc fc 04 fc fc fc fc fc fc fc fc fc fc
This happens because the allocation uses the wrong unit (bits) when it
should pass (BITS_TO_LONGS(count) * sizeof(long)) or equivalent. With small
numbers of bits, the allocated object can be smaller than sizeof(long),
which results in invalid accesses.
Use bitmap_zalloc() to allocate and initialize the irq bitmap, paired with
bitmap_free() for consistency.
In the Linux kernel, the following vulnerability has been resolved:
perf/core: Fix data race between perf_event_set_output() and perf_mmap_close()
Yang Jihing reported a race between perf_event_set_output() and
perf_mmap_close():
CPU1 CPU2
perf_mmap_close(e2)
if (atomic_dec_and_test(&e2->rb->mmap_count)) // 1 - > 0
detach_rest = true
ioctl(e1, IOC_SET_OUTPUT, e2)
perf_event_set_output(e1, e2)
...
list_for_each_entry_rcu(e, &e2->rb->event_list, rb_entry)
ring_buffer_attach(e, NULL);
// e1 isn't yet added and
// therefore not detached
ring_buffer_attach(e1, e2->rb)
list_add_rcu(&e1->rb_entry,
&e2->rb->event_list)
After this; e1 is attached to an unmapped rb and a subsequent
perf_mmap() will loop forever more:
again:
mutex_lock(&e->mmap_mutex);
if (event->rb) {
...
if (!atomic_inc_not_zero(&e->rb->mmap_count)) {
...
mutex_unlock(&e->mmap_mutex);
goto again;
}
}
The loop in perf_mmap_close() holds e2->mmap_mutex, while the attach
in perf_event_set_output() holds e1->mmap_mutex. As such there is no
serialization to avoid this race.
Change perf_event_set_output() to take both e1->mmap_mutex and
e2->mmap_mutex to alleviate that problem. Additionally, have the loop
in perf_mmap() detach the rb directly, this avoids having to wait for
the concurrent perf_mmap_close() to get around to doing it to make
progress.