The current setup of the quarantine page tables assumes that the
quarantine domain (dom_io) has been initialized with an address width
of DEFAULT_DOMAIN_ADDRESS_WIDTH (48) and hence 4 page table levels.
However dom_io being a PV domain gets the AMD-Vi IOMMU page tables
levels based on the maximum (hot pluggable) RAM address, and hence on
systems with no RAM above the 512GB mark only 3 page-table levels are
configured in the IOMMU.
On systems without RAM above the 512GB boundary
amd_iommu_quarantine_init() will setup page tables for the scratch
page with 4 levels, while the IOMMU will be configured to use 3 levels
only, resulting in the last page table directory (PDE) effectively
becoming a page table entry (PTE), and hence a device in quarantine
mode gaining write access to the page destined to be a PDE.
Due to this page table level mismatch, the sink page the device gets
read/write access to is no longer cleared between device assignment,
possibly leading to data leaks.
The fixes for XSA-422 (Branch Type Confusion) and XSA-434 (Speculative
Return Stack Overflow) are not IRQ-safe. It was believed that the
mitigations always operated in contexts with IRQs disabled.
However, the original XSA-254 fix for Meltdown (XPTI) deliberately left
interrupts enabled on two entry paths; one unconditionally, and one
conditionally on whether XPTI was active.
As BTC/SRSO and Meltdown affect different CPU vendors, the mitigations
are not active together by default. Therefore, there is a race
condition whereby a malicious PV guest can bypass BTC/SRSO protections
and launch a BTC/SRSO attack against Xen.
When a transaction is committed, C Xenstored will first check
the quota is correct before attempting to commit any nodes. It would
be possible that accounting is temporarily negative if a node has
been removed outside of the transaction.
Unfortunately, some versions of C Xenstored are assuming that the
quota cannot be negative and are using assert() to confirm it. This
will lead to C Xenstored crash when tools are built without -DNDEBUG
(this is the default).
[This CNA information record relates to multiple CVEs; the
text explains which aspects/vulnerabilities correspond to which CVE.]
libfsimage contains parsing code for several filesystems, most of them based on
grub-legacy code. libfsimage is used by pygrub to inspect guest disks.
Pygrub runs as the same user as the toolstack (root in a priviledged domain).
At least one issue has been reported to the Xen Security Team that allows an
attacker to trigger a stack buffer overflow in libfsimage. After further
analisys the Xen Security Team is no longer confident in the suitability of
libfsimage when run against guest controlled input with super user priviledges.
In order to not affect current deployments that rely on pygrub patches are
provided in the resolution section of the advisory that allow running pygrub in
deprivileged mode.
CVE-2023-4949 refers to the original issue in the upstream grub
project ("An attacker with local access to a system (either through a
disk or external drive) can present a modified XFS partition to
grub-legacy in such a way to exploit a memory corruption in grub’s XFS
file system implementation.") CVE-2023-34325 refers specifically to
the vulnerabilities in Xen's copy of libfsimage, which is decended
from a very old version of grub.
The caching invalidation guidelines from the AMD-Vi specification (48882—Rev
3.07-PUB—Oct 2022) is incorrect on some hardware, as devices will malfunction
(see stale DMA mappings) if some fields of the DTE are updated but the IOMMU
TLB is not flushed.
Such stale DMA mappings can point to memory ranges not owned by the guest, thus
allowing access to unindented memory regions.
Cortex-A77 cores (r0p0 and r1p0) are affected by erratum 1508412
where software, under certain circumstances, could deadlock a core
due to the execution of either a load to device or non-cacheable memory,
and either a store exclusive or register read of the Physical
Address Register (PAR_EL1) in close proximity.
Oxenstored 32->31 bit integer truncation issues Integers in Ocaml are 63 or 31 bits of signed precision. The Ocaml Xenbus library takes a C uint32_t out of the ring and casts it directly to an Ocaml integer. In 64-bit Ocaml builds this is fine, but in 32-bit builds, it truncates off the most significant bit, and then creates unsigned/signed confusion in the remainder. This in turn can feed a negative value into logic not expecting a negative value, resulting in unexpected exceptions being thrown. The unexpected exception is not handled suitably, creating a busy-loop trying (and failing) to take the bad packet out of the xenstore ring.
IOMMU page mapping issues on x86 T[his CNA information record relates to multiple CVEs; the text explains which aspects/vulnerabilities correspond to which CVE.] Both AMD and Intel allow ACPI tables to specify regions of memory which should be left untranslated, which typically means these addresses should pass the translation phase unaltered. While these are typically device specific ACPI properties, they can also be specified to apply to a range of devices, or even all devices. On all systems with such regions Xen failed to prevent guests from undoing/replacing such mappings (CVE-2021-28694). On AMD systems, where a discontinuous range is specified by firmware, the supposedly-excluded middle range will also be identity-mapped (CVE-2021-28695). Further, on AMD systems, upon de-assigment of a physical device from a guest, the identity mappings would be left in place, allowing a guest continued access to ranges of memory which it shouldn't have access to anymore (CVE-2021-28696).
IOMMU page mapping issues on x86 T[his CNA information record relates to multiple CVEs; the text explains which aspects/vulnerabilities correspond to which CVE.] Both AMD and Intel allow ACPI tables to specify regions of memory which should be left untranslated, which typically means these addresses should pass the translation phase unaltered. While these are typically device specific ACPI properties, they can also be specified to apply to a range of devices, or even all devices. On all systems with such regions Xen failed to prevent guests from undoing/replacing such mappings (CVE-2021-28694). On AMD systems, where a discontinuous range is specified by firmware, the supposedly-excluded middle range will also be identity-mapped (CVE-2021-28695). Further, on AMD systems, upon de-assigment of a physical device from a guest, the identity mappings would be left in place, allowing a guest continued access to ranges of memory which it shouldn't have access to anymore (CVE-2021-28696).
IOMMU page mapping issues on x86 T[his CNA information record relates to multiple CVEs; the text explains which aspects/vulnerabilities correspond to which CVE.] Both AMD and Intel allow ACPI tables to specify regions of memory which should be left untranslated, which typically means these addresses should pass the translation phase unaltered. While these are typically device specific ACPI properties, they can also be specified to apply to a range of devices, or even all devices. On all systems with such regions Xen failed to prevent guests from undoing/replacing such mappings (CVE-2021-28694). On AMD systems, where a discontinuous range is specified by firmware, the supposedly-excluded middle range will also be identity-mapped (CVE-2021-28695). Further, on AMD systems, upon de-assigment of a physical device from a guest, the identity mappings would be left in place, allowing a guest continued access to ranges of memory which it shouldn't have access to anymore (CVE-2021-28696).