    vfio: Inhibit ballooning based on group attachment to a container · c65ee433
    Alex Williamson authored
    
    
    We use a VFIOContainer to associate an AddressSpace to one or more
    VFIOGroups.  The VFIOContainer represents the DMA context for that
    AddressSpace for those VFIOGroups and is synchronized to changes in
    that AddressSpace via a MemoryListener.  For IOMMU backed devices,
    maintaining the DMA context for a VFIOGroup generally involves
    pinning a host virtual address in order to create a stable host
    physical address and then mapping a translation from the associated
    guest physical address to that host physical address into the IOMMU.
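
    As a rough illustration of the pin-and-map step described above, the
    sketch below issues the Linux VFIO type1 VFIO_IOMMU_MAP_DMA ioctl.
    The helper name sketch_dma_map and its error convention are
    hypothetical and are not taken from QEMU; the container file
    descriptor is assumed to be already open and configured with an
    IOMMU model.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/vfio.h>

/* Hypothetical helper, not QEMU code: ask the kernel to pin the host
 * memory at vaddr and map it at iova in the container's IOMMU context.
 * Returns 0 on success or a negative errno value. */
int sketch_dma_map(int container_fd, void *vaddr,
                   uint64_t iova, uint64_t size)
{
    struct vfio_iommu_type1_dma_map map;

    memset(&map, 0, sizeof(map));
    map.argsz = sizeof(map);
    map.flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE;
    map.vaddr = (uint64_t)(uintptr_t)vaddr; /* host virtual address */
    map.iova  = iova;  /* guest physical address when no vIOMMU is used */
    map.size  = size;

    /* The kernel pins the backing pages and installs the translation. */
    if (ioctl(container_fd, VFIO_IOMMU_MAP_DMA, &map) < 0) {
        return -errno;
    }
    return 0;
}
```

    Because the pages are pinned at mapping time, the translation
    installed here does not follow later changes to the host virtual
    to host physical mapping.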
    
    While the above keeps the VFIOContainer synchronized with the QEMU
    memory API of the VM, memory ballooning occurs outside of that API.
    Inflating the memory balloon (i.e. cooperatively capturing pages from
    the guest for use by the host) simply uses MADV_DONTNEED to "zap"
    pages from QEMU's host virtual address space.  The page pinning and
    IOMMU mapping above remain in place, negating the host's ability to
    reuse the page, but the host virtual to host physical mapping of the
    page is invalidated outside of QEMU's memory API.
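
    The effect of MADV_DONTNEED on a private anonymous mapping can be
    seen in isolation with a small Linux-only sketch.  The function
    name zap_and_reread is hypothetical; the example only demonstrates
    that the old backing page is dropped and a fresh zero-filled page is
    faulted in on the next access, and it involves no balloon or VFIO
    code.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Returns the first byte of a private anonymous page after it has been
 * written and then zapped with MADV_DONTNEED, or -1 on error. */
int zap_and_reread(void)
{
    size_t len = (size_t)sysconf(_SC_PAGESIZE);
    unsigned char *page = mmap(NULL, len, PROT_READ | PROT_WRITE,
                               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    int value;

    if (page == MAP_FAILED) {
        return -1;
    }

    memset(page, 0xAB, len);            /* fault in a backing page */

    if (madvise(page, len, MADV_DONTNEED) != 0) {
        munmap(page, len);
        return -1;
    }

    value = page[0];                    /* faults in a new zero page */
    munmap(page, len);
    return value;
}
```

    On Linux the value read back is 0 rather than 0xAB: the virtual
    address is unchanged, but the physical page behind it is not the
    one that was written.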
    
    When the balloon is later deflated, attempting to cooperatively
    return pages to the guest, the page is simply freed by the guest
    balloon driver, allowing it to be used in the guest and incurring a
    page fault when that occurs.  The page fault maps a new host physical
    page backing the existing host virtual address, while the
    VFIOContainer still maintains the translation to the original host
    physical address.  At this point the guest vCPU and any assigned
    devices will map different host physical addresses to the same guest
    physical address.  Badness.
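
    The sequence above can be condensed into a toy model that tracks a
    single guest page through two independent translations.  Everything
    here (struct toy_page, the toy_* functions, the integer page
    numbers) is invented for illustration and does not correspond to
    any QEMU or kernel data structure.

```c
#include <stdbool.h>

/* Toy model, not QEMU code: one guest page seen through two mappings. */
struct toy_page {
    int cpu_hpa;    /* host physical page behind the host virtual address */
    int iommu_hpa;  /* host physical page programmed into the IOMMU */
};

/* Pin and map: both views start on the same host physical page. */
void toy_map(struct toy_page *p, int hpa)
{
    p->cpu_hpa = hpa;
    p->iommu_hpa = hpa;
}

/* Inflate: the CPU-side mapping is zapped; the IOMMU entry is untouched. */
void toy_inflate(struct toy_page *p)
{
    p->cpu_hpa = -1;
}

/* Deflate, then touch: the page fault installs a newly allocated page. */
void toy_deflate_and_touch(struct toy_page *p, int new_hpa)
{
    p->cpu_hpa = new_hpa;
}

/* True while the vCPU and the device resolve to the same host page. */
bool toy_coherent(const struct toy_page *p)
{
    return p->cpu_hpa == p->iommu_hpa;
}
```

    After toy_map(), toy_inflate() and toy_deflate_and_touch() with a
    different page number, toy_coherent() returns false: the device
    still targets the original pinned page.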
    
    The IOMMU typically does not have page level granularity with which
    it can track this mapping without also incurring inefficiencies in
    using page size mappings throughout.  MMU notifiers in the host
    kernel likewise only provide indicators for invalidating the mapping
    on balloon inflation, not for updating the mapping when the balloon
    is deflated.  For these reasons we assume a default behavior that the
    mapping of each VFIOGroup into the VFIOContainer is incompatible
    with memory ballooning and increment the balloon inhibitor to match
    the attached VFIOGroups.
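
    A counting inhibitor of the kind described can be sketched as
    follows.  This is a minimal illustration under assumed names
    (sketch_balloon_inhibit, sketch_balloon_is_inhibited and the
    counter are hypothetical); the actual interface and call sites
    changed by this commit are not shown in this message.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical counter: number of attachments currently blocking
 * ballooning. */
static int sketch_inhibit_count;

/* Call with true when a group is attached to a container and with
 * false when it is detached, so the count matches attached groups. */
void sketch_balloon_inhibit(bool state)
{
    if (state) {
        sketch_inhibit_count++;
    } else {
        assert(sketch_inhibit_count > 0);  /* unbalanced release */
        sketch_inhibit_count--;
    }
}

/* Ballooning stays disabled while at least one inhibitor is held. */
bool sketch_balloon_is_inhibited(void)
{
    return sketch_inhibit_count > 0;
}
```

    Using a count rather than a flag means that detaching one group
    does not re-enable ballooning while another group is still
    attached.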
    
    Reviewed-by: Peter Xu <peterx@redhat.com>
    Signed-off-by: Alex Williamson <alex.williamson@redhat.com>