Laura Abbott: Stable trees and release time

September 18, 2018

Related Material:

  1. 2016 LKS “Stable Workflow” Core Topic
  2. 2016 LKS “Issues with stable process” Core Topic
  3. stable-kernel-rules.rst

Additional Participants: Benjamin Gilbert, Daniel Vetter, Dmitry Torokhov, Eduardo Valentin, Geert Uytterhoeven, Greg KH, Guenter Roeck, James Bottomley, Jan Kara, Jiri Kosina, Justin Forbes, Mark Brown, Sasha Levin, Steven Rostedt, Takashi Iwai, and Thomas Gleixner.

Introduction

Laura would like to see stable updates released on a regular schedule, with a longer -rc period, so that a predictable amount of testing can be carried out and deliveries to Fedora customers can be predictable. Justin Forbes agreed that cadence has recently become a problem, suspecting that certain embargoed security fixes might have been part of the cause. Greg KH confirmed this, mentioning “patch Tuesday” and also pointing out that infrequent releases imply larger releases. Greg also called out the security benefits of an irregular release schedule. Mark Brown noted a resource issue with arm backporting due to people wanting very old kernels, but said “Watch this space”. Guenter Roeck noted that larger releases mean more conflicts, and said that although the current process works fine for him, he could also live with a weekly release schedule. Laura acknowledged Greg's point about irregular release cycles, but added that most people interested in finding exploits are already looking closely. Justin agreed that life has been hard this year, and said that he would be happy with a once-per-week rc schedule, but would also be OK with an irregular schedule. Justin suggested multiple stable release candidates per week, with a weekly release, similar to mainline.

Sasha Levin suggested that distros follow whatever schedule suits them, simply taking the most recent -stable release whenever they are ready to move forward. Guenter Roeck agreed with Sasha's delay-equivalence logic, further noting that fewer, larger releases mean more regressions per release. Laura countered by reiterating that longer release cycles allow more testing, and thus a better chance of actually finding the regressions. Sasha agreed, but asked how long is long enough, pointing out that regardless of how much testing is done, more testing would likely find more regressions. Sasha asked for numbers on escapes to Fedora's releases, which Laura agreed to try to gather. Guenter sees a regression rate of about 0.15% for stable releases, where a regression is a bug found post-release that had to be fixed later. Guenter noted that it would be interesting to know how many were found by testing and how many by users. Benjamin Gilbert said that five of eleven 4.14-based CoreOS kernels had user-visible regressions, conveniently listing them all.

Jiri Kosina agreed with Laura, adding that SUSE has many internal discussions on how to adjust its processes to changes in the -stable tree and in its patch-acceptance criteria, and that it is becoming increasingly apparent that the -stable tree is not really intended for distros. In fact, Jiri believes that most distros are running their own variation of the -stable tree, and wonders whether this is inevitable and perhaps even desirable. Greg did not recall ever saying that -stable was not for distros, and named a number of distros that use it. Greg suggested that he had instead said “not for an ‘enterprise’ distro”. Greg stated that he takes distro feedback seriously, at least aside from “you are taking too many patches”. Jiri Kosina agreed, saying that he believes that the bar for -stable acceptance is too low. Jan Kara agreed, but noted that embedded devices instead prefer a greater number of patches going to -stable, given that it is harder to update an embedded device's kernel than that of an enterprise server. Greg argued that at least part of the reason for the increase in the number of patches going into -stable is:

  1. The increase in the number of patches going into mainline.
  2. Increasing numbers of maintainers properly tagging patches for -stable.
  3. Greg has more time for -stable now that he no longer works for a distro.
  4. The “Fixes:” tag helps find patches that need to be tagged for -stable.
  5. More people using and caring about -stable, thus submitting more patches to -stable.
  6. Sasha is tagging patches on behalf of maintainers.
  7. Fuzzers are finding more bugs, thus resulting in more fixes.

Greg urged people not to fear the -stable patch rate. Geert suggested that added testing would enable more distros to simply follow mainline, so that -stable is only needed for the current mainline release's -rc releases. Greg responded by challenging Geert to put him out of a job.

Mark Brown noted that he reviews only those -stable patches for his subsystem that he did not himself mark for -stable. Steven Rostedt does the same, and also avoids tagging obscure or minor fixes for -stable. James Bottomley agreed, arguing that backporting is for expediency, not perfection, and further arguing that maintainers cannot be expected to detect things like missing prerequisites to -stable patches, especially for the -stable branches of older kernels. Mark demurred, noting that some -stable-backport regressions were serious enough to be worth worrying about, and pointing out that regressions make people cautious about taking -stable updates. James agreed, but noted that this is why people should avoid sending minor fixes to -stable, further arguing that if a patch doesn't fix a user-visible bug, it should not go into -stable.

Jiri argued that two of the 22 commits in stable 4.18.5 should never have been added. Greg replied that he has to trust the maintainers. He also noted that there is a script that shows which stable patches affect a given configuration, and that it would have identified the fact that those two patches affect only parisc users. Jiri wondered whom Greg trusts in the trigger-happy automatic-selection case, and both Greg and Sasha replied that Sasha filters the results of the automatic selection, so Sasha is the one Greg trusts in that case. Thus Greg trusts the maintainer and/or Sasha. Greg also asked for help in pushing back on inappropriate patches. Jiri noted that trusting a maintainer for inclusion into mainline and trusting them for inclusion into -stable are two different things, which might have prompted James to point out that the aforementioned pair of parisc patches were in fact pushed to -stable to fix a persistent segmentation-fault problem that was blocking forward progress on the Debian parisc port. Dmitry Torokhov suggested adding a -stable justification to the -stable tags, with Steven Rostedt noting that the parisc patches contained no such justification.

Takashi noted that automatically selected patches are in a gray zone in the patch-value/regression-probability tradeoff spectrum, and would like the ability to opt out of the automatic selection. Daniel liked this opt-out idea, noting that automatic selection sometimes backports cleanup patches that happen to resemble stealthy security fixes, sometimes resulting in needless regressions for no apparent reason. Greg believes that the reason for the backport can be deduced from the commit log.

Thus, an email to everyone mentioned on the patch should include whoever chose it for -stable. Daniel believes that a Cc-stable-requested-by tag would be preferable. Daniel also gave an example of a cleanup patch that resulted in a regression, and for which he received no notification. Sasha countered with a patch for an ugly-looking splat that was not sent to -stable. Daniel pointed out that this bug happened only at driver-unload time, which affects only developers, not users. Sasha took this as his cue to unload this driver on an Ubuntu 4.18 kernel, showing the resulting warning and hang. Daniel replied “don't do that, we know about it”, adding that a proper fix would require a few years of effort. Sasha agreed that Daniel could opt his tree out of the automatic-selection process, but noted that this would mean Daniel manually handling -stable backports. Sasha also said that Daniel's tree appears to be in very good shape, and requires fewer -stable submissions than most.

Sasha gave an example sound-subsystem commit, asking if it should go to -stable. Takashi would prefer to subject this one to more testing before sending it to -stable. Sasha asked if a one-month delay would suffice. Takashi suggested waiting until a few weeks after the final release. Sasha noted that waiting until a few weeks after the final release would mean that the relevant stable tree was already end-of-life (EOL). Takashi suggested skipping such non-urgent fixes for the normal stable trees, and sending them only to LTS once they have been confirmed to be sufficiently stable. Thomas Gleixner agreed with Takashi that sending non-urgent fixes to near-EOL stable trees is pointless. Sasha noted that kernels are released every couple of months, and that a normal stable tree goes EOL a week or two after the next kernel is released, which means that Takashi's suggested wait time amounts to ignoring most fixes.

Sasha Levin suggested a stable-next that followed mainline more closely than does -stable, thus pretesting the backports to real -stable. In response to Greg's “patch Tuesday” comment, Sasha suggested changing the current version's name from “Merciless Moray” to “Microsoft Linux”.

Further discussion covered the merits of old KABI-stable kernels, -stable release-cycle length, the trend toward improving stability and the costs of further improvements, and the relationship to the embargo process and CVE management.