July 29, 2015
Additional Participants: Chris Mason, Christoph Lameter, Lai Jiangshan, and Peter Zijlstra.
People tagged: Jens Axboe, Jon Corbet, Mathieu Desnoyers, Paul E. McKenney, and Shaohua Li.
suggests a discussion of a light-weight mechanism permitting user-mode
code to implement per-CPU operations, calling out Paul Turner's patch,
Mathieu Desnoyers's patch, and his own approach of using
said that his group has started experimenting with these patches and hopes
to have performance data from production workloads soonish, which
applauded, and suggested might also be applied in-kernel.
Peter replied that in-kernel experimentation need not wait on an API,
and argued that in-kernel use could rely on interrupt hooks instead
of scheduler hooks.
However, Peter suspects that forcing function calls for these operations
will eat up much of the potential performance gains.
Finally, Peter believes that
%gs prefixes will have
substantial performance advantages.
responded that one could avoid function-call overhead by moving the
calling function into the special code region and that some of the
%gs approaches might avoid the implicit memory barriers
that degrade performance of read-modify-write instructions on x86.
Andy agreed that read-modify-write instructions can be slow, but noted that
cmpxchg is pretty fast.
Andy also suggested per-CPU memory mappings as a self-described crazy idea.
liked the per-CPU memory mappings, noting that this had been done on
Itanium, but that x86 would require a separate page table for each
CPU for each task.
Lai Jiangshan called out another disadvantage of a special code region, namely that all functions in that region must avoid invoking functions outside that region. However, he agrees that doing this simplifies scheduler hooks. Lai also notes that in-kernel application of these techniques could simplify NMI handlers.
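
For readers unfamiliar with the problem being discussed, the following is a minimal sketch of how user-mode code can approximate a per-CPU counter today. It is not taken from any of the patches mentioned above. It uses glibc's sched_getcpu() to pick a slot and a C11 compare-and-swap loop to update it, because the thread may be migrated between reading the CPU number and performing the update. The names percpu_inc, percpu_sum, and MAX_CPUS, and the 64-byte cache-line size, are assumptions made for illustration.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <stdatomic.h>

#define MAX_CPUS 1024   /* hypothetical upper bound on CPU numbers */
#define CACHE_LINE 64   /* assumed cache-line size */

/* One counter per CPU, padded to its own cache line to avoid false sharing. */
struct percpu_counter {
    _Alignas(CACHE_LINE) atomic_long count;
};

static_assert(sizeof(struct percpu_counter) == CACHE_LINE,
              "each counter should occupy exactly one cache line");

static struct percpu_counter counters[MAX_CPUS];

/* Increment the counter for the CPU this thread appears to be running on. */
static void percpu_inc(void)
{
    int cpu = sched_getcpu();

    if (cpu < 0)
        cpu = 0;            /* fall back to slot 0 if the call fails */
    cpu %= MAX_CPUS;

    /*
     * The thread can migrate right after sched_getcpu() returns, so another
     * thread may be updating this slot concurrently. An atomic
     * compare-and-swap loop keeps the update correct in that case.
     */
    long old = atomic_load_explicit(&counters[cpu].count,
                                    memory_order_relaxed);
    while (!atomic_compare_exchange_weak_explicit(&counters[cpu].count,
                                                  &old, old + 1,
                                                  memory_order_relaxed,
                                                  memory_order_relaxed))
        ;                   /* 'old' is refreshed on failure; retry */
}

/* Sum all slots; the result is approximate while updates are in flight. */
static long percpu_sum(void)
{
    long sum = 0;

    for (int i = 0; i < MAX_CPUS; i++)
        sum += atomic_load_explicit(&counters[i].count,
                                    memory_order_relaxed);
    return sum;
}
```

On x86 the compare-and-swap in this sketch typically compiles to a lock-prefixed cmpxchg, which is the kind of read-modify-write cost the discussion is concerned with. The proposals under discussion aim to let the common case run without an atomic instruction by having the kernel detect preemption or migration and force a restart of the operation.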