Michel Lespinasse: insane cgroups, unfair rwlocks (scalability)

August 3, 2013

Participants: Michel Lespinasse, Lai Jiangshan, Li Zefan, Stephen Hemminger, Herbert Xu, Ben Hutchings, Matthew Wilcox.

People tagged: Srivatsa S. Bhat

Michel Lespinasse suggested a discussion of unfair rwlocks, especially with respect to tasklist_lock. Lai Jiangshan noted the tradeoff between performance and fairness, and said he would like to see a solution that combines acceptable performance with acceptable fairness. Li Zefan put forward ticket locks as an example of a lock that offers both. Michel pointed out that the main issue with reader-writer locks arises when read-side re-entrancy is required, and that although rwsem allows writers to steal the lock from pending writers that are not running, it nevertheless maintains fairness between readers and writers. Lai Jiangshan disputed this, arguing that writer stealing could starve readers. Stephen Hemminger indicated interest in improving all types of locks, called for prototypes of alternative implementations, and further argued for replacing reader-writer locking with RCU. [ Ed.: And how could I argue with that? ;-) ]
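For readers unfamiliar with the ticket locks mentioned in this discussion, the idea can be sketched as follows. This is a minimal userspace illustration using C11 atomics, not the kernel's implementation; the names struct ticket_lock, ticket_lock_init(), ticket_lock_acquire(), and ticket_lock_release() are hypothetical and chosen only for this example. Each acquirer takes the next ticket number and spins until that number is being served, so the lock is granted in strict arrival order.

```c
#include <stdatomic.h>

/* Hypothetical ticket lock, for illustration only. */
struct ticket_lock {
	atomic_uint next;	/* next ticket to hand out */
	atomic_uint owner;	/* ticket currently being served */
};

static void ticket_lock_init(struct ticket_lock *l)
{
	atomic_init(&l->next, 0);
	atomic_init(&l->owner, 0);
}

static void ticket_lock_acquire(struct ticket_lock *l)
{
	/* Take a ticket; arrival order determines service order. */
	unsigned int me = atomic_fetch_add_explicit(&l->next, 1,
						    memory_order_relaxed);

	/* Spin until our ticket is the one being served. */
	while (atomic_load_explicit(&l->owner, memory_order_acquire) != me)
		;
}

static void ticket_lock_release(struct ticket_lock *l)
{
	/* Only the holder writes owner, so a plain load-then-store suffices. */
	unsigned int cur = atomic_load_explicit(&l->owner,
						memory_order_relaxed);

	atomic_store_explicit(&l->owner, cur + 1, memory_order_release);
}
```

The fairness comes from the FIFO ticket order: no later arrival can overtake an earlier one. The cost is that every waiter spins on the same owner field, which is one reason the performance side of the tradeoff remains a matter for discussion.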

Lai Jiangshan followed up on Stephen's advice by starting a new thread on RCU and concurrent data structures in the kernel, asking how RCU-protected concurrent data structures from userspace RCU might be pulled into the kernel. He also suggested that an RCU-protected RBtree might be helpful for scalable address spaces. (Similar suggestions have been made in academia, with more work ongoing.) Mathieu Desnoyers expressed interest in what other scalability issues exist in the kernel, in how to keep code shared between the kernel and userspace RCU in sync given licensing issues [ Ed.: Dual licensing seems appropriate here ], and in validation issues.

Herbert Xu has been viewing the increasing kernel latency for typical networking applications with some alarm, and would like to talk about ways of fixing this. Ben Hutchings indicated interest, but asked whether the problem was specific to networking. Herbert replied that the scheduler, power management, and tracing were contributing to latency, not just networking. There was some discussion of whether this was something that sysadmins could deal with via configuration or whether kernel code changes were required. Matthew Wilcox asked whether applications desiring low latency were willing to spend significant CPU time to obtain it, for example, by substituting busy-wait polling for wakeups. Matthew also noted that this fits in well with the low-latency-devices topic.
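The tradeoff Matthew raised, spending CPU on busy-wait polling instead of sleeping and waiting for a wakeup, can be sketched in userspace as follows. This is an assumption-laden illustration rather than anything proposed in the thread: the helper name busy_poll() and its spin_budget parameter are hypothetical, and a real caller would fall back to a blocking wait when the budget runs out.

```c
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Hypothetical helper: spin for at most spin_budget iterations waiting
 * for *ready to become true.  Returns true if the flag was observed,
 * false if the budget was exhausted and the caller should block instead.
 */
static bool busy_poll(atomic_bool *ready, unsigned long spin_budget)
{
	for (unsigned long i = 0; i < spin_budget; i++) {
		if (atomic_load_explicit(ready, memory_order_acquire))
			return true;
	}
	return false;
}
```

A larger spin_budget avoids wakeup latency more often but burns more CPU while waiting, which is exactly the cost the question was probing.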