IOMMU Usage
OS Usages of DMA Remapping
- OS Protection: An OS may define a domain containing its critical code and data structures, and restrict access to this domain from all I/O devices in the system. This allows the OS to limit erroneous or unintended corruption of its data and code through incorrect programming of devices by device drivers, thereby improving OS robustness and reliability.
- Feature Support: An OS may use domains to better manage DMA from legacy devices to high memory (for example, 32-bit PCI devices accessing memory above 4 GB). This is achieved by programming the I/O page-tables to remap DMA from these devices to high memory. Without such support, software must resort to copying data through OS “bounce buffers”.
- DMA Isolation: An OS may manage I/O by creating multiple domains and assigning one or more I/O devices to each domain. Each device-driver explicitly registers its I/O buffers with the OS, and the OS assigns these I/O buffers to specific domains, using hardware to enforce DMA domain protection.
- Shared Virtual Memory: For devices supporting the appropriate PCI-Express[1] capabilities, the OS may use the DMA remapping hardware to share the virtual address space of application processes with I/O devices.
[1] Refer to the Process Address Space ID (PASID) capability in the PCI-Express base specification.
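
The Feature Support and DMA Isolation usages above can be pictured with a toy model of a domain as a single I/O page table. This is only an illustrative sketch: the `Domain` class, its method names, and the addresses are hypothetical and do not correspond to any real OS or hardware interface.

```python
PAGE_SHIFT = 12                      # 4-KByte pages
PAGE_MASK = (1 << PAGE_SHIFT) - 1


class Domain:
    """Hypothetical model of a DMA-remapping domain: one I/O page table."""

    def __init__(self, name):
        self.name = name
        self.io_page_table = {}      # DMA page number -> physical page number

    def map_page(self, dma_addr, phys_addr):
        """Register one I/O buffer page with this domain."""
        self.io_page_table[dma_addr >> PAGE_SHIFT] = phys_addr >> PAGE_SHIFT

    def translate(self, dma_addr):
        """Return the physical address, or None if the access is blocked."""
        ppn = self.io_page_table.get(dma_addr >> PAGE_SHIFT)
        if ppn is None:
            return None              # no mapping in this domain: DMA is blocked
        return (ppn << PAGE_SHIFT) | (dma_addr & PAGE_MASK)


# A legacy 32-bit device is handed a DMA address below 4 GB that is remapped
# to a buffer above 4 GB, so no bounce-buffer copy is needed.
nic = Domain("legacy-nic")
nic.map_page(0x0010_0000, 0x1_2345_6000)

high = nic.translate(0x0010_0ABC)      # lands in the high-memory buffer
blocked = nic.translate(0x0020_0000)   # never registered, so it is blocked
```

Assigning each device to its own `Domain` instance gives the isolation property in this model: an address mapped in one domain has no meaning in another.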
VMM Usages of DMA Remapping
- The limitations of software-only methods for I/O virtualization can be mitigated through direct assignment of I/O devices to partitions. With this approach, the driver for an assigned I/O device runs only in the partition to which the device is assigned and is allowed to interact directly with the device hardware with minimal or no VMM involvement. The hardware support for DMA remapping enables this direct device assignment without device-specific knowledge in the VMM.
DMA Remapping Usages by Guests
- A guest OS running in a VM may benefit from the availability of remapping hardware to support the usages described in OS Usages of DMA Remapping. To support such usages, the VMM may virtualize the remapping hardware to its guests. For example, the VMM may intercept guest accesses to the virtual remapping hardware registers, and manage a shadow copy of the guest remapping structures that is provided to the physical remapping hardware. On updates to the guest I/O page tables, the guest software performs appropriate virtual invalidation operations. The virtual invalidation requests may be intercepted by the VMM, to update the respective shadow page tables and perform invalidations of remapping hardware. Due to the non-restartability of faulting DMA transactions (unlike CPU memory management virtualization), a VMM cannot perform lazy updates to its shadow remapping structures. To keep the shadow structures consistent with the guest structures, the VMM may expose virtual remapping hardware with eager pre-fetching behavior (including caching of not-present entries) or use processor memory management mechanisms to write-protect the guest remapping structures.
- On hardware implementations supporting two levels of address translation (first-level translation to remap a virtual address to an intermediate (guest) physical address, and second-level translation to remap an intermediate physical address to a machine (host) physical address), a VMM may virtualize guest OS use of first-level translation (such as for Shared Virtual Memory usages) without shadowing page-tables, by instead configuring hardware to perform nested translation of the first and second levels.
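
The shadowing scheme described in the first bullet can be sketched as follows. This is a simplified model under stated assumptions, not how any particular VMM is implemented: the `ShadowIommu` class and its methods are hypothetical, page tables are flattened to dictionaries keyed by page number, and register interception is reduced to direct method calls.

```python
class ShadowIommu:
    """Hypothetical VMM-side model of shadowed remapping structures."""

    def __init__(self, gpa_to_hpa):
        self.gpa_to_hpa = gpa_to_hpa   # VMM-owned: guest page -> host page
        self.guest_table = {}          # guest-owned: DMA page -> guest page
        self.shadow_table = {}         # given to hardware: DMA page -> host page

    def guest_write(self, dma_page, guest_page):
        """Guest edits its own I/O page table; hardware does not see it yet."""
        if guest_page is None:
            self.guest_table.pop(dma_page, None)
        else:
            self.guest_table[dma_page] = guest_page

    def guest_invalidate(self, dma_page):
        """Intercepted virtual invalidation: resync the shadow entry eagerly."""
        guest_page = self.guest_table.get(dma_page)
        if guest_page is None or guest_page not in self.gpa_to_hpa:
            self.shadow_table.pop(dma_page, None)
        else:
            self.shadow_table[dma_page] = self.gpa_to_hpa[guest_page]

    def device_dma(self, dma_page):
        """Hardware lookup; None models a DMA fault, which cannot be retried."""
        return self.shadow_table.get(dma_page)


vmm = ShadowIommu(gpa_to_hpa={0x10: 0x9A, 0x11: 0x9B})
vmm.guest_write(0x200, 0x10)
before = vmm.device_dma(0x200)   # shadow not yet updated: the DMA would fault
vmm.guest_invalidate(0x200)
after = vmm.device_dma(0x200)    # shadow now maps to the host page
```

In this model a new mapping only reaches the hardware once the guest issues an invalidation. That is why the virtual hardware has to present itself as caching not-present entries: the guest then invalidates even after turning a not-present entry into a present one, and the VMM never needs to repair the shadow structures lazily on a fault.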
Nested Translation
- Extended-context-entries can be configured to translate requests-with-PASID through first-level translation. Extended-context-entries contain the PASID-table pointer and size fields used to reference the PASID-table. The PASID-number in a request-with-PASID is used to offset into the PASID-table. Each present PASID-entry contains a pointer to the base of the first-level translation structure for the respective process address space.
- When the Nesting Enable (NESTE) field is 1 in extended-context-entries, requests-with-PASID translated through first-level translation are also subjected to nested second-level translation. Such extended-context-entries contain both the pointer to the PASID-table (which contains the pointer to the first-level translation structures), and the pointer to the second-level translation structures. [VT-d spec page-44] illustrates the nested translation for a request-with-PASID mapped to a 4-KByte page through first-level translation, and interleaved through 4-KByte mappings in second-level paging structures.
- With nesting, all memory accesses generated when processing a request-with-PASID through first-level translation are subjected to second-level translation. This includes access to the PASID-table entry, access to first-level paging structure entries (PML4E, PDPE, PDE, PTE), and access to the output address from first-level translation. With nested translation, a guest operating system running within a virtual machine may utilize first-level translation as described in DMA Remapping Usages by Guests, while the virtual machine monitor may virtualize memory by enabling nested second-level translations.
- PASIDE: PASID Enable in Extended-Context-Entry: This field is treated as Reserved(0) for implementations not supporting PASID (PASID=0 in Extended Capability Register).
  - 0: Requests with PASID are blocked.
  - 1: Requests with PASID are processed per programming of the Translation Type (T) and Nested Translation Enable (NESTE) fields.
- NESTE: Nested Translation Enable in Extended-Context-Entry: This field is treated as Reserved(0) for implementations not supporting Nested Translations (NEST=0 in Extended Capability Register). This field is ignored when PASID Enable (PASIDE) is Clear.
  - 0: Requests remapped through the PASID table (referenced through the PASIDPTPTR field) are subject to first-level translation only. First-level page-tables are referenced through the PML4PTR field in the PASID-table.
  - 1: Requests remapped through the PASID table (referenced through the PASIDPTPTR field) are subject to nested first-level and second-level translation. First-level page-tables are referenced through the PML4PTR field in the PASID-table, and second-level page-tables are referenced through the SLPTPTR field in the extended-context-entry.
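
The field semantics and the nested walk described in this section can be sketched with a toy model. Everything here is an assumption made for illustration: the function names are hypothetical, entries hold only a next-level address (real entries also carry present and permission bits), every entry is assumed to be 8 bytes, the Translation Type (T) field is ignored, and second-level translation is flattened to a dictionary of page numbers.

```python
PAGE_SHIFT = 12
PAGE_MASK = (1 << PAGE_SHIFT) - 1
ENTRY_SIZE = 8   # assumed entry size for this toy layout


def pasid_request_mode(paside, neste):
    """Classify handling of a request-with-PASID (T field not modeled)."""
    if not paside:
        return "blocked"             # NESTE is ignored when PASIDE is clear
    return "nested" if neste else "first-level only"


def second_level(sl_table, gpa, log):
    """Translate a guest-physical address to host-physical; log each use."""
    log.append(gpa)
    return (sl_table[gpa >> PAGE_SHIFT] << PAGE_SHIFT) | (gpa & PAGE_MASK)


def nested_translate(host_mem, sl_table, pasid_table_gpa, pasid, va):
    """Walk the PASID-table and 4-level first-level tables under nesting."""
    log = []
    # PASID-table entry: the PASID value offsets into the table.
    entry_gpa = pasid_table_gpa + pasid * ENTRY_SIZE
    next_gpa = host_mem[second_level(sl_table, entry_gpa, log)]
    # PML4E, PDPE, PDE, PTE: 9 index bits per level above the 12-bit offset.
    for shift in (39, 30, 21, 12):
        index = (va >> shift) & 0x1FF
        entry_gpa = next_gpa + index * ENTRY_SIZE
        next_gpa = host_mem[second_level(sl_table, entry_gpa, log)]
    # next_gpa is now the output page of first-level translation.
    hpa = second_level(sl_table, next_gpa | (va & PAGE_MASK), log)
    return hpa, len(log)


# Toy guest layout: guest pages 1..6 live at host pages 0x101..0x106.
sl_table = {page: page + 0x100 for page in range(1, 7)}
host_mem = {}


def guest_store(gpa, value):
    """Place a table entry at the host location backing a guest address."""
    hpa = (sl_table[gpa >> PAGE_SHIFT] << PAGE_SHIFT) | (gpa & PAGE_MASK)
    host_mem[hpa] = value


pasid, va = 5, (2 << 21) | (3 << 12) | 0xABC
guest_store(0x1000 + pasid * ENTRY_SIZE, 0x2000)   # PASID entry -> PML4 table
guest_store(0x2000 + 0 * ENTRY_SIZE, 0x3000)       # PML4E -> PDP table
guest_store(0x3000 + 0 * ENTRY_SIZE, 0x4000)       # PDPE -> page directory
guest_store(0x4000 + 2 * ENTRY_SIZE, 0x5000)       # PDE -> page table
guest_store(0x5000 + 3 * ENTRY_SIZE, 0x6000)       # PTE -> 4-KByte data page

hpa, sl_uses = nested_translate(host_mem, sl_table, 0x1000, pasid, va)
```

In this sketch a single request mapped to a 4-KByte page goes through second-level translation six times: once for the PASID-table entry, once for each of the four first-level paging entries, and once for the final output address.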