Proxmox PCIe passthrough on HP gen8 - failed to set iommu for container

Problem

Setting up PCIe passthrough from the host to a VM was supposed to be easy. However, this being an HP server, there was a bit more to it than usual. The VM simply refused to start when configured to use the Nvidia GPU from the host:

vfio error: 0000:04:00.0: failed to setup container for group 21: failed to set iommu for container: Operation not permitted

In dmesg there was a bit more background on what was wrong:

vfio-pci 0000:04:00.1: Device is ineligible for IOMMU domain attach due to platform RMRR requirement.  Contact your platform vendor.
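A quick way to see every device hit by this check is to filter the kernel log for the message. A minimal sketch, assuming the message wording shown above (rmrr_devices is a hypothetical helper name, not part of any tool):

```shell
# Print the PCI addresses of devices the kernel refused to attach to an
# IOMMU domain because of an RMRR. Reads kernel log text on stdin.
rmrr_devices() {
    grep 'ineligible for IOMMU domain attach due to platform RMRR' \
        | grep -oE '[0-9a-fA-F]{4}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7]' \
        | sort -u
}

# Usage on the host (reading the kernel log usually needs root):
#   dmesg | rmrr_devices
```

Each address printed belongs to a device that will fail in the same way until its slot is excluded.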

Luckily, HP has issued a customer advisory on this. It describes a convoluted method for disabling RMRR on a per-slot basis. It seems to work for me, so I thought I'd write down some notes in case I ever run into this again.

Basic setup

Proxmox has decent instructions for preparing the host for passthrough in general. In summary:

- add intel_iommu=on to GRUB_CMDLINE_LINUX_DEFAULT in the file /etc/default/grub

- add vfio modules to /etc/modules

vfio
vfio_iommu_type1
vfio_pci
vfio_virqfd

- update-initramfs -u -k all

- update-grub
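After the reboot it is worth confirming that the IOMMU is actually active before touching the VM. One hedged way to do that is to list the IOMMU groups the kernel exposes under /sys/kernel/iommu_groups; an empty result suggests the IOMMU is not enabled. list_iommu_groups is a hypothetical helper name:

```shell
# List every device together with its IOMMU group.
# The optional argument overrides the sysfs directory (useful for testing).
list_iommu_groups() {
    base=${1:-/sys/kernel/iommu_groups}
    for dev in "$base"/*/devices/*; do
        [ -e "$dev" ] || continue        # glob did not match: no groups
        group=${dev#"$base"/}            # e.g. 21/devices/0000:04:00.0
        group=${group%%/*}               # keep only the group number
        printf 'group %s: %s\n' "$group" "${dev##*/}"
    done
}

# Usage on the host:
#   list_iommu_groups | grep 04:00
```

The group number shown for the card should match the one in the vfio error message.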

To enable passthrough for the VM, I used the command-line tool:

qm set VMID -hostpci0 04:00

I couldn't figure out how to do this in the GUI. Leaving out the function number passes through both functions of the device in this slot: one is the VGA device, the other the audio device.
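The value for hostpci0 is the device address from the vfio error (0000:04:00.0) with the leading PCI domain and the trailing function number removed. A small sketch of that conversion, assuming the usual 0000 domain (hostpci_addr is a made-up helper name):

```shell
# Turn a full PCI address (domain:bus:device.function) into the
# bus:device form that covers every function of the device.
hostpci_addr() {
    addr=${1#0000:}              # drop the PCI domain, assumed to be 0000
    printf '%s\n' "${addr%.*}"   # drop the function number
}

# Usage on the Proxmox host, with VMID replaced by the real VM id:
#   qm set VMID -hostpci0 "$(hostpci_addr 0000:04:00.0)"
```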

Resolving the problem

The links to the tools in the HP advisory were already outdated by 2020, but the packages you need and their repositories are:

Scripting toolkit - version 11.40 worked for me

hp-health from management component pack - version 10.80 worked for me


I guess you could follow the apt setup instructions there. I simply ran wget on the specific .deb files above, then ran dpkg -i on them and apt-get install -f afterwards to pull in the dependencies.

You need to identify the PCIe slot of the card you wish to pass through using

lspci -vmm | grep -B5 PhySlot
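The grep context can be noisy on a machine with many cards. As an alternative sketch, the following pairs each device address with its physical slot. It assumes the usual lspci -vmm layout of one "Tag: value" pair per line, and phy_slots is a hypothetical helper name:

```shell
# Read `lspci -vmm` output on stdin and print "<address> <physical slot>"
# for every device that reports a PhySlot field.
phy_slots() {
    awk '
        $1 == "Slot:"    { slot = $2 }        # remember the device address
        $1 == "PhySlot:" { print slot, $2 }   # emit it with its slot number
    '
}

# Usage on the host:
#   lspci -vmm | phy_slots
```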


Then, as described in the advisory, download the conrep_rmrds.xml template.

Prepare the file describing the change. My card was in slot 1:

cat exclude.dat 

<Conrep> <Section name="RMRDS_Slot1" helptext=".">Endpoints_Excluded</Section> </Conrep>

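For a card in a different slot, presumably only the number in the section name changes. A tiny generator sketch, assuming the section names for other slots follow the same RMRDS_SlotN pattern as the slot 1 example (check the template to confirm; rmrr_exclude_xml is a hypothetical helper name):

```shell
# Print a conrep data file that excludes the given PCIe slot from RMRR.
# The layout is copied from the exclude.dat shown above.
rmrr_exclude_xml() {
    slot=$1
    printf '<Conrep> <Section name="RMRDS_Slot%s" helptext=".">Endpoints_Excluded</Section> </Conrep>\n' "$slot"
}

# Usage:
#   rmrr_exclude_xml 1 > exclude.dat
```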

Be sure to understand the warnings in the advisory and then apply the modification:


conrep -l -x conrep_rmrds.xml -f exclude.dat 



You may run into locale errors such as:

ERROR: locale::facet::_S_create_c_locale name not valid


Resolve them with  

export LC_CTYPE=en_US.UTF-8



Enjoy the message of great success and reboot 

conrep 5.5.0.0 - HPE Scripting Toolkit Configuration Replication Program
(c) Copyright 2013,2017 Hewlett Packard Enterprise Development LP

System Type: ProLiant ML350p Gen8
ROM Date   : 07/01/2015
ROM Family : P72
Processor Manufacturer : Intel

XML System Configuration: conrep_rmrds.xml
Hardware   Configuration: exclude.dat
Global Restriction: [3.40                            ]                  OK

Loading configuration data from exclude.dat

Conrep Return Code: 0


Comments

  1. Thanks for the blog post! Helped fixing my DL380e Gen8 HP Server while trying to passthrough a LSI SAS2008 PCI Controller

  2. Will this work on a dl380 G7 as well? I'm unable to install hp-health due to the following error.

    The following packages have unmet dependencies:
    hp-health : Depends: libc6-i686 but it is not installable or
    lib32gcc1 but it is not installable

    Went through the rest of the process and got Conrep Return Code: 0 but gpu passthrough still does not work.

