Proxmox PCIe passthrough on HP gen8 - failed to set iommu for container
Problem
Setting up PCIe passthrough from host to a VM was supposed to be easy. However, this being an HP server, there was a bit more to it than usual. The VM simply refused to start when configured to use the Nvidia GPU from the host:
vfio error: 0000:04:00.0: failed to setup container for group 21: failed to set iommu for container: Operation not permitted
In dmesg there was a bit more background on what was wrong:
vfio-pci 0000:04:00.1: Device is ineligible for IOMMU domain attach due to platform RMRR requirement. Contact your platform vendor.
Luckily, HP had issued a customer advisory on this. It describes a convoluted method to disable RMRR on a per-slot basis. It seems to work for me, so I thought I'd write down some notes in case I ever run into this again.
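To see which devices are hit by the RMRR restriction, the kernel log can be filtered for the message shown above. This is a minimal sketch: the function name rmrr_blocked_devices is made up for this note, and it assumes the message wording matches the one in my dmesg output.

```shell
#!/bin/sh
# Hypothetical helper: read kernel log text on stdin and print the PCI
# addresses of devices rejected because of the RMRR requirement.
# Assumes the message wording quoted in this post.
rmrr_blocked_devices() {
    grep 'ineligible for IOMMU domain attach' |
        grep -oE '[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]' |
        sort -u
}
```

On the host it would be used as: dmesg | rmrr_blocked_devices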
Basic setup
Proxmox has decent instructions for preparing the host for passthrough setup in general, in summary:
- add intel_iommu=on to GRUB_CMDLINE_LINUX_DEFAULT in the file /etc/default/grub
- add the vfio modules to /etc/modules, one per line:
vfio
vfio_iommu_type1
vfio_pci
vfio_virqfd
- update-initramfs -u -k all
- update-grub
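Before rebooting, the steps above can be sanity-checked with something like the following. It is only a sketch: the function names are hypothetical, and it assumes each module sits alone on its own line in the modules file, with no trailing comments or whitespace.

```shell
#!/bin/sh
# Hypothetical helper: print every vfio module from the list above that is
# missing from the given modules file (defaults to /etc/modules).
missing_vfio_modules() {
    modules_file=${1:-/etc/modules}
    for m in vfio vfio_iommu_type1 vfio_pci vfio_virqfd; do
        # -x matches whole lines only, -F treats the name as a plain string
        grep -qxF "$m" "$modules_file" || echo "$m"
    done
}

# Hypothetical helper: succeed if the given kernel command line string
# contains intel_iommu=on as a separate word.
has_intel_iommu() {
    case " $1 " in
        *" intel_iommu=on "*) return 0 ;;
        *) return 1 ;;
    esac
}
```

After the reboot, has_intel_iommu "$(cat /proc/cmdline)" should succeed and missing_vfio_modules should print nothing.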
To enable the passthrough for the VM, I ended up using the command line tool:
qm set VMID -hostpci0 04:00
I couldn't figure out how to do this in the GUI. Leaving out the function number passes through both functions of the device in this slot: one is the VGA device, the other the audio device.
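To see which functions get passed through when the function number is left out, the PCI devices directory in sysfs can be listed. A rough sketch, assuming PCI domain 0000 and the address 04:00 from the error messages above; the function name list_pci_functions is made up for this note.

```shell
#!/bin/sh
# Hypothetical helper: list all functions of a PCI device given as bus:device
# (for example 04:00). The second argument overrides the sysfs directory and
# exists mainly to make the function easy to try out.
list_pci_functions() {
    slot=$1
    root=${2:-/sys/bus/pci/devices}
    for dev in "$root"/0000:"$slot".*; do
        # Skip the unexpanded pattern when nothing matches
        [ -e "$dev" ] || continue
        basename "$dev"
    done
}
```

On my host, list_pci_functions 04:00 should print 0000:04:00.0 and 0000:04:00.1, the two devices named in the error messages.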
Resolving the problem
The links to the tools in the HP advisory were already outdated by 2020, but these are the packages you need and where they come from:
Scripting toolkit - version 11.40 worked for me
hp-health from management component pack - version 10.80 worked for me
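Find the physical slot number of the device:
lspci -vmm | grep -B5 PhySlot
Then write an exclude.dat for that slot (slot 1 in my case):
cat exclude.dat
<Conrep>
  <Section name="RMRDS_Slot1" helptext=".">Endpoints_Excluded</Section>
</Conrep>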
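The two steps above could be scripted roughly like this. Treat it as a sketch under assumptions: both function names are made up, and the RMRDS_SlotN section name for slots other than 1 is my guess based on the single example above, so check it against the conrep_rmrds.xml from the advisory.

```shell
#!/bin/sh
# Hypothetical helper: read `lspci -vmm` output on stdin and print
# "<pci address> <physical slot>" for every device that reports a PhySlot.
physlot_map() {
    awk '
        /^Slot:/    { sub(/^Slot:[ \t]*/, "");    addr = $0; next }
        /^PhySlot:/ { sub(/^PhySlot:[ \t]*/, ""); print addr, $0 }
    '
}

# Hypothetical helper: print an exclude.dat for the given slot number.
# ASSUMPTION: the section is named RMRDS_Slot<N> for every slot, as it is
# for slot 1 in this post.
make_exclude_dat() {
    slot=$1
    case $slot in
        ''|*[!0-9]*) echo "slot must be a number" >&2; return 1 ;;
    esac
    printf '<Conrep>\n  <Section name="RMRDS_Slot%s" helptext=".">Endpoints_Excluded</Section>\n</Conrep>\n' "$slot"
}
```

Usage would be along the lines of lspci -vmm | physlot_map to find the slot, then make_exclude_dat 1 > exclude.dat.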
Be sure to understand the warnings in the advisory and then apply the modification:
conrep -l -x conrep_rmrds.xml -f exclude.dat
Optionally run into a locale error like
ERROR: locale::facet::_S_create_c_locale name not valid
Resolve them with
export LC_CTYPE=en_US.UTF-8
Enjoy the message of great success and reboot
conrep 5.5.0.0 - HPE Scripting Toolkit Configuration Replication Program
(c) Copyright 2013,2017 Hewlett Packard Enterprise Development LP
System Type: ProLiant ML350p Gen8
ROM Date : 07/01/2015
ROM Family : P72
Processor Manufacturer : Intel
XML System Configuration: conrep_rmrds.xml
Hardware Configuration: exclude.dat
Global Restriction: [3.40 ] OK
Loading configuration data from exclude.dat
Conrep Return Code: 0
Thanks for the blog post! Helped fixing my DL380e Gen8 HP Server while trying to passthrough a LSI SAS2008 PCI Controller
Will this work on a dl380 G7 as well? I'm unable to install hp-health due to the following error.
The following packages have unmet dependencies:
hp-health : Depends: libc6-i686 but it is not installable or
lib32gcc1 but it is not installable
Went through the rest of the process and got Conrep Return Code: 0 but gpu passthrough still does not work.