June 9, 2018
ESXi/ESX 4.1 and later versions include interrupt remapping code that is enabled by default. Intel introduced this technology to make IRQ routing more efficient and thereby improve the performance and security of VMs. The interrupt remapping feature enables the VMM to isolate interrupts to the CPUs assigned to a given VM and to remap or reroute physical I/O device interrupts. When enabled, it helps ensure efficient migration of interrupts across CPUs.
The interrupt remapper is controlled by the VMkernel setting iovDisableIR. Note that the setting is inverted: a value of TRUE disables the remapper, and FALSE enables it.
To show the current setting, use the following ESXCLI command:
esxcli system settings kernel list -o iovDisableIR
To enable interrupt remapping:
esxcli system settings kernel set --setting=iovDisableIR -v FALSE
To disable interrupt remapping:
esxcli system settings kernel set --setting=iovDisableIR -v TRUE
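Because iovDisableIR is a "disable" flag, it is easy to misread its value when checking many hosts. The following is a minimal Python sketch, not an official tool, that interprets the output of the list command above. It assumes the command was run with esxcli's CSV formatter (esxcli --formatter=csv system settings kernel list -o iovDisableIR) and that the output contains Name, Configured and Runtime columns. The sample text, the remapper_state helper and its result keys are hypothetical and only illustrate the idea; verify the actual output format on your own hosts.

```python
import csv
import io

# Illustrative sample only: an assumed shape of the CSV output.
# Column order and the description text may differ on a real host.
SAMPLE_CSV = (
    "Configured,Default,Description,Name,Runtime,Type,\n"
    "TRUE,FALSE,placeholder description,iovDisableIR,FALSE,Bool,\n"
)


def remapper_state(csv_text, setting="iovDisableIR"):
    """Translate the 'disable' flag into the state of the interrupt remapper."""
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row.get("Name") == setting:
            configured = row["Configured"].strip().upper() == "TRUE"
            runtime = row["Runtime"].strip().upper() == "TRUE"
            return {
                # iovDisableIR=TRUE means the remapper is disabled
                "remapping_enabled_now": not runtime,
                "remapping_enabled_configured": not configured,
                # Configured and Runtime values differ
                "change_pending": configured != runtime,
            }
    raise ValueError(f"setting {setting!r} not found in output")


if __name__ == "__main__":
    print(remapper_state(SAMPLE_CSV))
```

In this sketch, a difference between the Configured and Runtime values is reported as a pending change. This reflects the assumption that a changed kernel setting only becomes active after the host is rebooted.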
Over the years, this setting has been linked to several problems. The related VMware Knowledge Base articles are listed below:
vHBAs and other PCI devices may stop responding in ESXi 6.0.x, ESXi 5.x and ESXi/ESX 4.1 when using Interrupt Remapping (1030265)
VMware advised disabling the remapper (setting the iovDisableIR parameter to TRUE).
A couple of years ago, when I worked as a system engineer, we always disabled it on ESXi servers before putting them into production.
On to the next article:
This article reconfirmed that the VT-d interrupt remapper should be disabled on multiple Intel processors:
Intel Xeon Processor 55xx Series
Intel Xeon Processor 56xx Series
Intel Xeon Processor 65xx Series
Intel Xeon Processor 75xx Series
Intel Xeon Processor E5-1400 v2 Product Family
Intel Xeon Processor E5-1600 v2 Product Family
Intel Xeon Processor E5-1600 v3 Product Family
Intel Xeon Processor E5-2400 Product Family
Intel Xeon Processor E5-2400 v2 Product Family
Intel Xeon Processor E5-2600 Product Family
Intel Xeon Processor E5-2600 v2 Product Family
Intel Xeon Processor E5-2600 v3 Product Family
Intel Xeon Processor E5-2600 v4 Product Family
Intel Xeon Processor E5-4600 Product Family
Intel Xeon Processor E5-4600 v2 Product Family
Intel Xeon Processor E5-4600 v3 Product Family
Intel Xeon Processor E5-4600 v4 Product Family
Intel Xeon Processor E7-2800 Product Family
Intel Xeon Processor E7-4800 Product Family
Intel Xeon Processor E7-8800 Product Family
Intel Xeon Processor E7-8800/4800/2800 v2 Product Families
Intel Xeon Processor E7-8800/4800 v3 Product Families
Intel Xeon Processor E7-8800/4800 v4 Product Families
VMware even made the disabled remapper the default setting in the following versions:
But this, in turn, caused a new PSOD issue, affecting HP ProLiant Gen8 servers:
ESXi host fails with intermittent NMI PSOD on HP ProLiant Gen8 servers (2149043)
HPE also covered this in Customer Advisory c05392947.
ESXi IO connectivity issues or PSOD with VT-d interrupt remapper disabled (2149592)
In the following ESXi versions, VMware reinstated interrupt remapping:
Interrupt remapping is enabled by default on:
As you can see, even a single parameter with different values can lead to deeper problems.
Use Runecast Analyzer to check whether your ESXi hosts are affected by this parameter. It shows whether your servers are vulnerable to a specific problem, and why:
At Runecast, we constantly update the automatic checks based on Knowledge Base articles, Best Practices and Security Hardening Guides. These updates, combined with automatic monitoring, ensure that your vSphere environment can be continuously protected using the latest industry knowledge.
Head of R&D