Jiri Slaby: static checking; COMPILE_TEST

August 4, 2013

Participants: Alasdair G Kergon, Ben Hutchings, Dan Carpenter, H. Peter Anvin, James Bottomley, Jiri Slaby, Julia Lawall, Kees Cook, Tony Luck, Mark Brown, Paul Gortmaker, Steven Rostedt, Theodore Ts'o, Wolfram Sang, Alexey Khoroshilov, Fengguang Wu, Trond Myklebust, Jiri Kosina, Shuah Khan, Guenter Roeck, Lai Jiangshan, Masami Hiramatsu, Nicholas Bellinger, Linus Walleij.

People tagged: Fengguang Wu

Jiri Slaby asked why kernel developers do not make heavier use of static analyzers and how to better support builds for code intended for the more esoteric architectures. Steven Rostedt noted that tools tend not to be used heavily unless maintainers enforce their use (by running them themselves and rejecting patches that result in warnings). Dan Carpenter reported that Fengguang Wu automatically runs sparse and Coccinelle on each commit of a number of maintainers' trees. Dan also noted the difficulties with false positives, as did several others. Wolfram Sang runs these tools on his own code, and said that after some time he learns what shows up and why. He also uses these tools to assist with his review duties.

There was some discussion of having a rebasing for-checker branch, with most people arguing that Fengguang's checking happens pretty quickly and that slowness could be addressed by adding hardware. Julia Lawall said that Fengguang Wu keeps his false-positive rate low by taking only those Coccinelle rules labeled as high confidence. Alasdair G Kergon wondered how long it took Fengguang's 0day checker to run, and asked about opting in to “found nothing” emails or publication of logs of commits tested. Several people stated that Fengguang's testing was pretty quick. [ Ed.: I know! We should benchmark it by adding broken commits! As if we don't do that already... ]

Jiri Slaby noted that while Fengguang's checks might well be fast, he would also like to support more comprehensive checkers with longer run times. However, his experience is that error reports that do not arrive promptly are often ignored. Julia Lawall suggested sending reminders when later commits touched the same file. Dan Carpenter suggested that the reports might be ignored because they were sent to the wrong people, arguing that the commit author is a more important recipient than the top-level maintainer. Steven Rostedt suggested including both. Jiri replied that he believed he was sending his error reports to the right set of people. Dan countered that the reports were handled correctly, enumerating the response to one of the emails. Mark Brown asked what other checkers should be used. Jiri Slaby listed LDV-tools, Coverity, and Stanse.

Fengguang Wu summarized his 0day kernel testing framework, which is currently run on each commit for more than 300 kernel git trees. This framework currently runs the following:

  1. Build tests covering more than 200 specific configurations from 30 architectures. [ Ed.: Imagine my surprise to learn that Linux really supports more than 30 architectures... ]
  2. Static validation including sparse, smatch, Coccinelle, and checkpatch.pl.
  3. KVM boot tests based on randconfig builds.
  4. Runtime tests including trinity, CPU hotplug, xfstests, and many others.

The system does automatic bisection when it encounters an error. Errors are handled differently based on historical experience. High-confidence errors are emailed directly to authors, committers, and mailing lists, also CCing kbuild-all at 01.org for archival purposes. Low-confidence errors are sent to kbuild at 01.org for manual disposition.
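
The routing rule just described can be sketched roughly as follows. This is only an illustration of the high-confidence versus low-confidence split, not Fengguang's actual implementation: the `Report` type, its field names, and the `route` function are all hypothetical, and the list addresses are kept in the obfuscated form used in this summary.

```python
from dataclasses import dataclass, field

# Addresses as written in the summary (obfuscated form kept on purpose).
ARCHIVE_LIST = "kbuild-all at 01.org"  # CCed for archival purposes
TRIAGE_LIST = "kbuild at 01.org"       # manual disposition of low-confidence errors


@dataclass
class Report:
    """Hypothetical record for one bisected error (names are assumptions)."""
    author: str
    committer: str
    mailing_lists: list = field(default_factory=list)
    high_confidence: bool = False


def route(report):
    """Return (to, cc) recipient lists for a report."""
    if report.high_confidence:
        to = [report.author]
        # Avoid mailing the same person twice when author == committer.
        if report.committer != report.author:
            to.append(report.committer)
        to.extend(report.mailing_lists)
        return to, [ARCHIVE_LIST]
    # Low-confidence errors go only to the triage list for a human to review.
    return [TRIAGE_LIST], []
```

Keeping low-confidence reports away from developers is one way to hold the false-positive rate down, which ties in with the earlier point that noisy reports tend to be ignored.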

Fengguang expects to add performance tests as well, based on results from vmstat, iostat, nfsstat, lock_stat, perf, and so on, gathered while running a variety of test cases across different configurations and hardware.

Fengguang's post was met with general acclamation. Lai Jiangshan asked if future testing would include merging lockdep results from multiple boots, thus detecting lower-probability deadlock cycles that might otherwise escape notice, given that test times are short and the aggregate production run times are quite long. Nicholas A. Bellinger asked that driver- and target-specific tests also be included. [ Ed.: Fengguang does run rcutorture on -rcu commits, so there is at least some capability towards this goal already in 0day. ] Nicholas also suggested that the Kernel Summit t-shirts include something like “All your kernel trees are belong to Fengguang bot.”
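
Lai Jiangshan's suggestion about merging lockdep results can be illustrated with a toy model. The sketch below assumes each boot yields a set of lock-ordering edges of the form (lock held, lock acquired); that representation, along with the `merge_boots` and `has_cycle` names, is hypothetical and far simpler than what the kernel's lockdep actually records. The point is only that a cycle may be invisible in any single short run yet appear once the edges from several runs are combined.

```python
def merge_boots(boots):
    """Union the lock-order edges observed across several boots."""
    merged = set()
    for edges in boots:
        merged |= set(edges)
    return merged


def has_cycle(edges):
    """Depth-first search for a cycle in the directed lock-order graph."""
    graph = {}
    for held, acquired in edges:
        graph.setdefault(held, set()).add(acquired)
        graph.setdefault(acquired, set())

    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on current path, finished
    state = {node: WHITE for node in graph}

    def visit(node):
        state[node] = GREY
        for nxt in graph[node]:
            if state[nxt] == GREY:  # back edge: an ordering cycle
                return True
            if state[nxt] == WHITE and visit(nxt):
                return True
        state[node] = BLACK
        return False

    return any(state[n] == WHITE and visit(n) for n in graph)


# Three boots, each observing one ordering that is harmless on its own.
boots = [{("A", "B")}, {("B", "C")}, {("C", "A")}]
per_boot = [has_cycle(b) for b in boots]
combined = has_cycle(merge_boots(boots))
```

Here no individual boot reports a cycle, while the merged graph contains A -> B -> C -> A, the kind of low-probability deadlock the suggestion aims to catch.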