May 12, 2014
Participants: Bjorn Helgaas, Daniel Vetter, James Bottomley, Joerg Roedel, Josh Triplett, Laurent Pinchart, Matthew Wilcox, Rafael J. Wysocki, Roland Dreier, Tony Luck, and Will Deacon.
People tagged: Joerg Roedel, someone with intimate knowledge of EEH as used on Power systems, and KVM folks.
David Woodhouse raised the topic of device-driver errors, particularly in the presence of IOMMUs, and even more particularly in cases where a device is emitting an endless stream of errors that prevents the kernel from getting anything else done. It turns out that there are a number of ways to shut up such a device, including PCI function-level reset, power cycling the device, or perhaps configuring the IOMMU to ignore further errors from that device. Of course, the possibility of ignoring errors raises the question of when error reporting should be re-enabled. David would like these decisions to be made in generic device code, not inconsistently in each IOMMU driver, whether the device is PCI or non-PCI. Bjorn Helgaas, Rafael J. Wysocki, Joerg Roedel, and James Bottomley all expressed interest, with James indicating a further interest in avoiding an Intel-IOMMU-centric solution. James also asked if the errors in question were due to the device sending addresses that don't have IOTLB entries. David confirmed this case and added a second one: attempts to write through read-only mappings. However, David believes that other errors will come to light as well, and that Intel IOMMUs have properties similar to those of other vendors.
Laurent Pinchart
suggests that one of the other classes of errors will prove to be
attempting to perform secure accesses on non-secure IOTLB entries.
Laurent also suggested partitioning the problem into (1) identifying the
offending device and (2) identifying a mechanism to handle the errors.
Laurent doubts that #1 can be completely generic, and believes that #2
will require both generic and driver-specific code.
Will Deacon
pointed out that some non-PCI devices lack a specified way of making sure
that a newly-killed device does not still have transactions in flight.
In addition, ignoring fault reports can result in queue overflows in
some implementations.
Josh Triplett
is interested in using IOMMUs to protect against buggy and even against
malicious devices.
This is particularly important for systems (like some laptops) that
allow external PCIe devices to be plugged in.
David Woodhouse
notes that this use case is what prompts the current implementation
to give devices with no driver zero privileges and to give devices
with a driver carefully whitelisted privileges.
Roland Dreier
argues that no special action should be required if there is no device
driver, because no bits get turned on in the PCI command register until
pci_enable_device()
time.
Roland also notes that his wifi adapter can already sniff and modify
all his network traffic, so there is a limit to what IOMMU-level protection
can accomplish, and wishes that VT-d were in better shape so that distros
might enable it by default.
Tony Luck
likes the idea of defending against buggy hardware from a RAS perspective.
James Bottomley
wonders what exactly needs to be done for RAS beyond having the IOMMU
corral the device.
Joerg Roedel
wants proper fault handling, even on laptops and desktops, arguing that this
will be needed for newer GPUs.
Laurent Pinchart
wants a mechanism to correctly report and handle the IOMMU faults in
order to prevent interrupt storms from causing DoS.
Daniel Vetter
has considerable experience with these sorts of interrupt storms. In fact,
they cause so much trouble that Daniel disables IOMMUs on his development
systems, which in turn causes regressions, reinforcing distros' decisions
not to enable IOMMUs by default.
Daniel therefore would like to see IOMMU interrupt-storm handling as a
first step towards making IOMMU enablement safe on both development
and production systems.
Joerg Roedel
agrees that the developer use case must be taken into account, but
believes that there needs to be some way of re-enabling a device that
was previously ignored due to interrupt storms.
Daniel
suggests that a disable/enable cycle of the PCI bus master should be
a sufficient signal, but also suggests that simply re-enabling the
IOMMU whenever any child device is re-enabled would suffice.
In the latter case, if the interrupt storm resumed, the storm handling
would simply kick in once again.
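As a rough illustration of the storm handling discussed above, the following
userspace C sketch mutes fault reporting for a device once its fault rate
exceeds a threshold, and un-mutes it when the device is re-enabled.
The structure, the function names, and the threshold values are all
hypothetical, chosen only for this example. Nothing here corresponds to an
existing kernel or IOMMU-driver interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical policy knobs, not taken from any real IOMMU driver. */
#define STORM_WINDOW_MS 1000u /* length of one accounting window */
#define STORM_THRESHOLD 8u    /* faults tolerated per window */

/* Hypothetical per-device fault accounting state. */
struct fault_throttle {
	uint64_t window_start_ms;      /* start of the current window */
	unsigned int faults_in_window; /* faults seen in this window */
	bool muted;                    /* true while faults are ignored */
};

/*
 * Account for one fault seen at time now_ms.
 * Returns true if the fault should be reported, false if it is
 * suppressed because the device is (or has just become) muted.
 */
bool fault_throttle_report(struct fault_throttle *t, uint64_t now_ms)
{
	assert(t != NULL);

	if (t->muted)
		return false;

	/* Start a fresh window once the old one has expired. */
	if (now_ms - t->window_start_ms >= STORM_WINDOW_MS) {
		t->window_start_ms = now_ms;
		t->faults_in_window = 0;
	}

	/* Too many faults in one window: treat it as a storm. */
	if (++t->faults_in_window > STORM_THRESHOLD) {
		t->muted = true;
		return false;
	}
	return true;
}

/*
 * Called when the device is re-enabled, for example after a
 * disable/enable cycle of PCI bus mastering. Reporting resumes with
 * a clean window; a renewed storm will mute the device again.
 */
void fault_throttle_reenable(struct fault_throttle *t, uint64_t now_ms)
{
	assert(t != NULL);

	t->muted = false;
	t->window_start_ms = now_ms;
	t->faults_in_window = 0;
}
```

Because re-enabling only resets the counters, a device that resumes its
storm is muted again by the same logic, so no separate policy is needed
for deciding whether re-enablement was safe.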
Roland Dreier notes that there are many other PCI errors besides IOMMU faults, and wonders if this other error handling can also be consolidated. Roland is concerned about NVMe devices, which are PCIe-connected devices that might be put into hot-pluggable JBODs, at which point the fact that the kernel reacts less well to PCIe hotplug than to (say) SAS hotplug becomes apparent. Matthew Wilcox has been hearing rumors about NVMe hotplug, but hasn't seen bug reports. Matthew therefore requested that people put up or shut up on this topic. Roland replied that he was not trying to spread FUD, and that in any case the issues he is seeing are PCIe configuration problems rather than bugs in the NVMe driver itself.