Jiri Kosina: stable workflow

July 20, 2016

Related Material:

  1. 2015 LKS “The stable kernel process” LWN article (2015 LKS “Issues with stable process”)
  2. 2015 LKS “Stable tree maintenance” (2014 LKS “stable workflow”, 2014 LKS “stable issues”)

Additional Participants: Alexandre Belloni, Andrew Lunn, Timothy Bird, Chris Mason, Christian Borntraeger, Daniel Vetter, Dan Williams, David Woodhouse, Dmitry Torokhov, Geert Uytterhoeven, Greg KH, Guenter Roeck, James Bottomley, Jani Nikula, Jan Kara, Jason Cooper, Jiri Kosina, Johannes Berg, Jonathan Cameron, Jonathan Corbet, Josh Boyer, Justin Forbes, Laurent Pinchart, Tony Luck, Luis de Bethencourt, Luis R. Rodriguez, Mark Brown, Olof Johansson, Rafael J. Wysocki, Shuah Khan, Stephen Hemminger, Steven Rostedt, Sudip Mukherjee, Takashi Iwai, Theodore Ts'o, Trond Myklebust, Vinod Koul, Vlastimil Babka, and Zefan Li.

People tagged: Fengguang Wu and Greg Kroah-Hartman.

Jiri Kosina raised this perennial topic, suggesting a move from “random people pointing to patches that should go to stable” to “maintainers sending pull requests”, noting increasing numbers of regressions in the stable trees. Guenter Roeck seconded this sentiment, noting that his new employer is sufficiently unhappy with stable releases to stop using them. Guenter would therefore like to see increased quality of the -stable trees. Takashi Iwai agreed that this is a worthy topic, calling out the large number of stable branches, amount of testing, and details for workflow as good subtopics. Jani Nikula also indicated interest.

Speaking of subtopics:

  1. Who Sends What Where?
  2. How Many for How Long?
  3. How Stable Should -stable Be?
  4. Testing and Review, Or Lack Thereof
  5. Test Suites and Frameworks
  6. Flavors of Stability
  7. Hardware-Free Hacking
  8. Who Uses Stable Kernel Trees?
  9. LTSI?
  10. Quilt? git? Both? How?

Who Sends What Where?

Tony Luck believes that the maintainers should send a list of commit IDs to be cherry-picked instead of a pull request, noting that it is rare to need a fixup patch or a full-up backport. Jiri agreed that a cherry-pick list might well be better than a pull request, and is mainly interested in seeing a well-defined set of people who are responsible for sending -stable patches. Jiri believes that if this change puts too much load on maintainers, then that is a sign that more maintainers are needed, or that fewer patches should be sent to -stable. Guenter agreed that the current process is letting too many patches into -stable. Johannes Berg countered with a hybrid model in which anyone can CC stable, but some sort of explicit ack from the maintainer is also required. Vlastimil Babka argues that the responsible person need not be a maintainer, just someone designated for the job. However, Vlastimil also suggests that the responsible person be required to actually check the patch against each applicable stable kernel version.

Jiri pointed out that CCing the maintainer with a CC stable didn't necessarily result in that maintainer putting much thought into whether the patch was in fact suitable for inclusion in -stable. Mark Brown agreed, noting that the -stable review patchbombs were often too noisy to be useful. James pointed out that David Miller does curate -stable patches for networking, and suggested involving similar maintainers in the discussion. Jiri Kosina added that if maintainers are overwhelmed, offloading to Greg makes no sense, but fixing the maintainer workflow for that subsystem could make quite a bit of sense.

Ted wondered if a patch queue checked into git might not work better, given that people seem to be just cherry-picking specific patches. Justin Forbes pointed to stable-queue.git as an example. Mark Brown believes that the upstream commit IDs are helpful, and that a queue of patches can be easily generated from the current -stable trees for those who want that. Olof, Guenter, Geert Uytterhoeven, Dmitry Torokhov, and James Bottomley agree, and Mark added that full git trees are required by many testing frameworks. Ted noted that quilt allows patches to be dropped completely, while git normally keeps commits, which are replaced and/or modified by later commits. The git approach thus can make it hard to figure out which commits you should take. Dmitry replied that only bleeding edge community distros use pure stable. Others have work on top of stable, so “quilt rebases” don't work. Additional discussion ensued.

How Many for How Long?

Andrew Lunn asked if there was numerical evidence supporting the notion that maintainer control of -stable submissions actually improved matters. [ Ed. note: And in an election year! Sacrilege!!! ] Rafael J. Wysocki independently raised this same point, and added that he suspects that it is difficult to judge the “regression potential” of a given patch up front. Rafael also suspects that the policy of avoiding reverting from stable unless/until mainline reverts is problematic, given that mainline might choose to apply a fix patch instead of reverting. Dmitry Torokhov agreed with Rafael, suggesting that broken -stable patches be reverted, then reapplied with the fix when/if mainline gets around to fixing them.

Dmitry Torokhov called out the fact that “Everyone and their dog has a stable release nowadays”, resulting in yet another maintainership scalability problem. Ted Ts'o noted that this was in part due to the ease with which a new stable tree can be set up, and that this is OK as long as people understand what the intent is.

How Stable Should -stable Be?

Rafael reviewed the original motivation of -stable:

“So going back to the origins of -stable, the problem it was invented to address at that time, IIRC, was that people started to perceive switching over to the kernels released by Linus as risky, because it was hard to get fixes for bugs found in them. The idea at that time was to collect the fixes (and fixes only) in a "stable" tree, so that whoever decided to use the latest kernel released by Linus could get them readily, but without burdening maintainers with having their own "stable" branches and similar. And that was going to last until the next kernel release from Linus, at which point a new "stable" tree was to be started.”

Rafael argued that the -stable trees do in fact fulfil this goal, especially the more recent ones, and suggested that the issues were not with -stable as conceived, but rather with attempts to treat -stable as long-term-stable. Jiri Kosina agreed that many of his concerns with -stable were in fact more applicable to long-term stable trees.

Trond Myklebust noted that we don't have a good set of regression tests, and wondered if it was time for another rousing round of “how do we regression-test the kernel” discussion. Dan Williams voted in favor. James pointed out that many of the regressions were in device drivers, which are not well served by generic regression tests. James suggests that both Reviewed-by and Tested-by tags be required for all device-driver changes, thus at least ensuring that the patch has run on the corresponding device. Trond pointed out that this only applied to mainline, because testing and review on mainline does not necessarily imply regression freedom for any given stable branch. Although James agreed that mainline testing is no guarantee of -stable regression freedom, he also pointed out that filtering out the patches that are broken in both environments is a good thing, and that people lacking the hardware are going to have some difficulty testing the corresponding driver. Trond argued that unit testing could nevertheless help ensure that local constraints are being respected.

Rafael wondered if James had hard statistics backing up his statements, but agreed that hardware is needed to really test driver patches. James replied that his data was strictly anecdotal, but said that if he is suspicious of a patch, he marks it as such, for example:

	cc: stable@vger.kernel.org # delay until 4.8-rc1

James also believes that we should discuss stable practices separately from testing. Trond noted that there is overlap between the two topics, but agreed that they could be discussed separately. In turn, James proposed that the stable-workflow topic begin with one of the maintainers who does their own tree, continue on with stable regressions, and end with a debate on the appropriate numbers and types of stable trees. Trond agreed with James's list of topics.

Jani Nikula believes that a cc: stable tag should only be used in cases where the patch clearly fixes a bug that is known to be present in the relevant stable kernels; however, Jani does not see such tags as a guarantee that the patch is appropriate for those kernels. Mark Brown argues that more stable-process Q/A is needed, and wishes to avoid discouraging stable tagging. Rafael J. Wysocki agreed with Mark, arguing that “stable” does not necessarily mean “no regressions”, but rather “here is the stuff to take into consideration”. Jani Nikula felt that Rafael's interpretation matches current reality, but suspects that it does not match Documentation/stable_kernel_rules.txt. Greg KH questioned Jani's perceived disconnect between reality and Documentation/stable_kernel_rules.txt, and asked for hard data on regressions in stable trees. Jani Nikula suggested that the rules could be clarified, but Greg KH believes that people really do understand and asked for specific examples of real problems. Examples with varying levels of detail and realism were put forward here, here, and here, with ensuing discussion.

Jiri Kosina gave this example, which involved a “fix” to a bug that didn't actually exist in the -stable release in question. James Bottomley argued that the problem in this case really was with upstream review rather than -stable review. Jiri Kosina countered that the fix was perfectly valid upstream, but broken in the older stable releases. Jiri also said that he wished to call attention to the lack of stable review, agreeing with James's characterization of “yes, I already reviewed this in upstream”. James Bottomley argued that expecting all submitters to be familiar with all stable versions was unrealistic, suggesting that Jiri apply only those stable patches with a cc stable and a fixes tag, a suggestion that Steven Rostedt agreed with. Jiri Kosina liked that suggestion, and also likes the idea of an explicit version range. However, Greg KH resisted the notion that fixes tags be required, noting that there are whole subsystems that never mark anything for stable, even given the current easy rules. Much discussion on various corner cases ensued, including device-ID additions, exactly what constitutes a fix, best practices vs. hard requirements, paths to enlightenment, and motivational tools, including the obligatory Dilbert comic.
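James Bottomley's filtering suggestion can be illustrated with a small sketch. The code below is only an illustration, not a tool that anyone in the discussion proposed: the function name stable_candidate is hypothetical, and the regular expressions assume the usual "Cc: stable@vger.kernel.org # note" and "Fixes: <commit ID>" commit-log conventions (the archive-obfuscated "stable at vger.kernel.org" spelling is accepted as well).

```python
import re

# "Cc: stable@vger.kernel.org", with or without angle brackets, followed by
# an optional "# ..." annotation such as a version range or a delay request.
STABLE_RE = re.compile(
    r"^cc:[ \t]*<?stable(?:@| at )vger\.kernel\.org>?[ \t]*(?:#[ \t]*(?P<note>.*))?$",
    re.IGNORECASE | re.MULTILINE,
)

# "Fixes: <abbreviated or full commit ID>", optionally followed by a subject.
FIXES_RE = re.compile(
    r"^fixes:[ \t]*(?P<sha>[0-9a-f]{8,40})\b",
    re.IGNORECASE | re.MULTILINE,
)


def stable_candidate(message):
    """Return (note, fixes_sha) if the commit log carries both tags, else None.

    Hypothetical helper: it only inspects the text of the commit log and
    makes no judgment about whether the patch is suitable for -stable.
    """
    stable = STABLE_RE.search(message)
    fixes = FIXES_RE.search(message)
    if stable is None or fixes is None:
        return None
    note = (stable.group("note") or "").strip()
    return note, fixes.group("sha")
```

A commit log carrying only the cc stable tag returns None, which corresponds to the patches that James suggested Jiri leave alone. The annotation after the "#" is returned as free text because the discussion did not settle on a format for version ranges.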

Testing and Review, Or Lack Thereof

Jason Cooper suspects that most of the regressions are build or boot failures, which he believes should be amenable to automated testing. Mark Brown agrees. Guenter Roeck also agrees with the value of automated testing, but isn't convinced that there are enough test machines, with the possible exception of the x86 0day build robot. Guenter also suspects that it is not that the stable trees have gotten worse, but rather that improved test automation gives us a better idea of just how bad things have always been. Guenter provided an example of a simple fix that resulted in no fewer than five follow-up commits. James Bottomley agreed with Guenter, noting that patches that are perceived to be trivial might receive less review than they deserve, thus letting regressions slip through. David Woodhouse called out commit fa731ac7ea0 as a fix that introduces a bug, and is suspicious of patches that claim to fix compiler warnings.

Guenter pointed out that there actually is quite a bit of Q/A in the community, but that this Q/A is not sufficiently robust and that there are way too many -stable trees to spread this Q/A over. Jiri Kosina agreed, noting that in contrast, Linus's tree benefits from seriously crowdsourced testing. Guenter was unwilling to give up so easily, arguing for consistent and thorough testing of all stable kernels. Ted Ts'o does limited testing on 3.10, 3.14, 4.1, and 4.4, with 3.14.73 being a bit of a problem child. Ted also stated that any effort to shrink the number of -stable trees should look at who is using them, given that he has sometimes been discouraged by the fact that getting a fix into -stable doesn't help the end user, particularly in cases where vendors ignore -stable. Mark Brown stated that one reason that SOC vendors' BSPs diverge from mainline is that the BSPs contain under-development code not yet ready for inclusion. Mark also countered the “too many stable kernels” argument with “if people want to run a given kernel version it's nice for them to have a place to collaborate and share fixes.”

Takashi Iwai noted that there have been cases where the fix was in fact correct for mainline, and introduced regressions only in older kernels, which Takashi believes indicates a need for better validation of stable trees. Jiri Kosina agreed, and wondered if Fengguang's 0day test robot might be able to help, but expressed a concern that stable-tree testing might be quite bursty, resulting in 0day overload. Guenter replied that Greg KH keeps a nice even workflow, but isn't sure about other -stable tree maintainers. Zefan Li suggested setting up a stable branch within a tree that Fengguang's 0day test robot already tests. Guenter indicated that if a given -stable tree was being tested within some specific git tree, then he would like to pick up -stable updates from that git tree.

Rafael noted that it is not just a matter of getting the review and testing done -- it must also get done within rather tight timeframes. [ Ed. note: Yay! Real-time review and testing! ] Jason Cooper suggested that examples of regressions induced by stable patches be used to drive the discussion. Jon Corbet pointed out that there was in fact such data here.

Sudip Mukherjee suggested a stable-tree-next approach to increase testing. Jiri Kosina wondered how applicable -next's merge-a-gazillion-trees approach would be to -stable trees, which pick and choose patches.

Vinod Koul argued that, just as patches sent upstream are tested by their submitters, so too should -stable patches be tested by their submitters, not just once, but against each stable tree that the patch applies to. Ted Ts'o said that such a policy would result in submitters never CCing to stable. Guenter agreed with Ted, arguing that testing has to happen in the stable tree. Vinod argued that the submitter was more likely to have the relevant hardware, and thus was in the best position to do testing, at least for device drivers. Ted pointed out that even submitting fixes upstream was difficult in many work environments, and that requiring additional testing was therefore not likely to have an overall positive effect. Luis de Bethencourt agreed with Ted that raising the barrier to entry for patch submission would be counterproductive, and wondered if increased sharing of infrastructure and information among stable-branch maintainers would help. Vinod agreed that these were good counterarguments, but asked what his opponents thought that a good solution might look like. Some suggested solutions regarding hardware availability may be found here.

Test Suites and Frameworks

David Woodhouse agreed that testing is valuable, and suggested that there be an expectation that new code be submitted with test cases, noting that even device drivers can sometimes be tested using tools based on MMIO tracing and playback. Guenter Roeck is concerned that requiring test cases could reduce the number of contributions, noting that upstreaming is already unpopular in many circles, even without the test-case requirement. David Woodhouse argued for an expectation as opposed to a hard requirement, and further argued that having test infrastructure in place would make test cases easier to create. David also suggested that test-case creation might be a good proving ground for newbies. Guenter Roeck agreed with this approach, particularly with test cases as newbie proving grounds.

Laurent Pinchart liked the idea of test infrastructure, calling one out and also calling out the benefits of test cases as customer requirements, along with the joy of catching bugs in your own test framework as opposed to them escaping to the field. Steven Rostedt called out tools/testing/selftests/ as a place for tests and test infrastructure, but Laurent Pinchart pointed out that kselftest lacks driver-related test cases (a concern amplified by Mark Brown) and also lacks standard logging and status reporting (Laurent Pinchart suggests Test Anything Protocol). Steven Rostedt suggested that kselftest extensions be a core topic at kernel summit, a sentiment with which Shuah Khan agreed.
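For readers unfamiliar with the Test Anything Protocol that Laurent suggested, the following sketch shows roughly what its status reporting looks like. It assumes only the basic TAP conventions (a "1..N" plan line, "ok"/"not ok" result lines, and a "# SKIP" directive); the function name tap_report and the test names are hypothetical, and nothing here reflects how kselftest itself reports results.

```python
def tap_report(results):
    """Format (name, status, reason) tuples as basic TAP output.

    status is "pass", "fail", or "skip"; reason is only used for skips.
    """
    lines = ["1..%d" % len(results)]  # the plan: how many tests follow
    for number, (name, status, reason) in enumerate(results, start=1):
        if status == "fail":
            line = "not ok %d - %s" % (number, name)
        else:
            line = "ok %d - %s" % (number, name)
        if status == "skip":
            # TAP reports a skipped test as "ok" plus a SKIP directive.
            line += " # SKIP %s" % reason
        lines.append(line)
    return "\n".join(lines)


print(tap_report([
    ("boot", "pass", ""),
    ("rtc", "skip", "no RTC device"),
    ("dma", "fail", ""),
]))
```

Because each result is a single line in a fixed format, output from different test suites can be collected and summarized by the same tooling, which is the kind of standard logging that Laurent found missing.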

Not to be outdone, Tim Bird called out the Fuego framework, showing some of its workings. For his part, Greg KH expressed appreciation for a large number of testing services that he relies on in his stable-tree work. Further discussion added yet more paint to the cc stable bikeshed, detailed some of the test services that Greg called out, and speculated on how many clones of Dave Miller there were given all the -stable work he gets done.

Steven Rostedt suggested that tests requiring specific hardware should provide some sort of “unsupported” indication when that hardware is not available. Although Mark Brown agreed that this approach can be useful, he is concerned that device-driver bugs would go unnoticed. Steven Rostedt suspects that if no one has a given piece of hardware, then lack of testing isn't so much of a problem. Mark Brown would like a clear distinction between tests that anyone can run and those requiring specific hardware in order to improve test reproducibility. Steven Rostedt suggested a separate directory in kselftests for hardware-dependent tests. Luis R. Rodriguez suggested use of soft Kconfig entries to check for the presence of the required device drivers.
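Steven's “unsupported” indication might look something like the sketch below. It is an illustration under stated assumptions: the exit-code values are meant to mirror the KSFT_PASS, KSFT_FAIL, and KSFT_SKIP constants in the kernel's kselftest.h, the function run_hw_test is hypothetical, and checking for a device node is only one possible way to detect the hardware.

```python
import os

# Assumed to mirror the constants in tools/testing/selftests/kselftest.h.
KSFT_PASS = 0
KSFT_FAIL = 1
KSFT_SKIP = 4


def run_hw_test(device_path, test_body):
    """Run test_body(device_path), or report a skip if the device is absent.

    Returning a distinct skip code keeps "hardware not present" separate
    from both success and failure, so a missing device is never reported
    as a regression (or as a pass).
    """
    if not os.path.exists(device_path):
        return KSFT_SKIP
    return KSFT_PASS if test_body(device_path) else KSFT_FAIL
```

A harness that counts skips separately would also address part of Mark Brown's concern, since a driver test that is skipped everywhere shows up as untested rather than silently passing.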

Alexandre Belloni noted a third class of tests, those requiring hardware, but which can be run against a wide variety of devices, calling out real-time-clock tests as an example, but noting that such tests usually change the system time. Steven Rostedt would prefer to restrict kselftests to non-destructive tests, but speculated on the possibility of saving and restoring the system time.

Laurent Pinchart suggested that use of standard test frameworks could enable out-of-tree tests.

Flavors of Stability

Dmitry Torokhov suggested that there should be multiple flavors of -stable trees, one for security, another for core fixes, and a third for hardware support and device drivers. Ted Ts'o indicated some support for separating out hardware. Rafael is concerned that multiple flavors would multiply confusion, and Takashi Iwai agreed that flavors might not be all that helpful. Rafael also believes that the probability of regression correlates well with the complexity of the patch. Dmitry agreed that general-purpose distros would want to take all fixes, but that embedded devices might well want to be more choosy, especially those working with rare devices that receive very little testing. Dmitry was also a bit skeptical of Rafael's complexity-quality correlation.

Hardware-Free Hacking

Dan Williams pointed out that it is possible to do significant testing without hardware, calling out the ACPI NVDIMM Firmware Interface Table (NFIT) tables, which are tested by tools/testing/nvdimm/. Dan agrees that not all bugs can be found this way, but believes that it is nevertheless a useful approach. Guenter notes that hardware can be supplied (for example, via hardware testbeds), or emulated using things like qemu. Christian Borntraeger seconded qemu, and further suggested that some sort of “make test” (presumably using qemu) be required to work everywhere. Christian would also like “make test” failures to trigger -stable reverts even in the absence of a corresponding revert of the upstream patch.

Who Uses Stable Kernel Trees?

Dmitry Torokhov says that community-based distros use the more recent stable trees, not the older trees.

Jason Cooper wondered what sorts of regressions cause people to give up on -stable. Olof Johansson recounted experiences with a group that maintained their own driver instead of relying on either the mainline or stable due to bugs being introduced into these external trees. Olof believes that the specific problems have since been solved, but points out that once a group of developers have been burnt by -stable, they will be extremely reluctant to even consider using -stable ever again. That said, that group did continue using -stable as a source of fixes, so that their first reaction to finding a bug was to check -stable for a fix. Olof concludes by noting that -stable was of substantial value to this group despite the fact that they did not use it in the conventional sense. He followed up stating that his experiences were with drivers frequently used on x86 laptops.

Ted Ts'o thanked Olof for the “color commentary”, and wondered if other system-on-a-chip (SoC) vendors were using -stable, but noted that such vendors normally lose interest in any given device once they stop shipping it. Ted also pointed out that trawling -stable in response to bugs won't locate important security fixes. Olof does not believe that optimizing -stable workflow for SoC vendors will be useful. Olof agreed that groups working on embedded systems often do miss out on CVEs, but said that larger groups can track CVEs or have representatives on the security lists.

Stephen Hemminger added that Brocade regularly merges stable kernels into their code base without serious issues, but notes that in Brocade's case there are few vendor-specific changes. Stephen suspects that we are only hearing from the unhappy users, but nevertheless believes that it would be good to reduce the number of unhappy users.

LTSI?

As noted earlier, Greg KH put forward the LTSI Test Project. Alex Shi liked the backporting effort, and would like to see more backporting, but feels some upstream focus is required. Olof Johansson suggested that moving to newer kernel versions would provide more eyes and less need for backporting. Olof also noted that LTSI is a different beast than is -stable because LTSI includes feature backports in addition to backported bug fixes. Olof also suspects that the goals of limiting the number of features backported and increasing the size of a given tree's community are in conflict. Alex Shi believes that the number of feature backports can be limited by carefully chosen backporting criteria, and points out that LTSI has relatively few feature backports.

Greg KH wanted to know more about Alex Shi's tradeoffs between LTSI and upstream. Alex replied that industry needs more features backported to LTS, for example, ARM PCIe, opp v2, writebacks, and cgroups, as was done to Linaro's stable kernel (LSK) 4.1. In fact, Alex believes that new features should sometimes be developed on LTS because mainstream maintainers often cannot do adequate testing. Ard Biesheuvel argues that LSK is needed not in general, but rather because arm64 support is still immature, so that LSK is not all that relevant to systems used in production. Alex agreed that LSK is not a good model for stable kernel trees intended for production use, but believes that LSK is nevertheless a good proof point for the need to backport more features, including features not directly related to arm64.

Mark Brown noted that there has been significant pushback against LTSI, for example some embedded vendors were concerned about conflicts between their internal work and work done on LTSI. Greg KH was puzzled by this, given that embedded vendors were the ones pushing for LTSI in the first place. Greg also wondered about the exact nature of the conflicts. Mark Brown said that some Linaro members wanted LSK instead of LTSI, and that the inclusion of board support and vendor-specific drivers was a problem for some of these members, who then had to merge changes in LTSI with changes in their internal trees. Greg KH understands that the specific LTSI tree might be a problem for some people, but would still like more collaboration among people working on long-term support trees like LTSI and LSK. Guenter Roeck would like to know what motivates companies to use and to not use LTSI (and Mark Brown agrees, though Greg KH suspects that it would not be all that relevant to most attendees). Guenter added that his problem with LTSI is that it is a collection of patches rather than a git tree, which leads us into the next section. Steven Rostedt believes that the passage of time is key, so that older stable trees have more pressure for complex fixes and features. Steven also notes that bugs mutate over time, for example, as timing changes.

Quilt? git? Both? How?

Greg KH feels that sets of patches are most appropriate for maintaining stable kernels, and that they are especially helpful in letting people know just how far they are deviating from mainline, something that is hidden when using git trees (Greg also wants to see patches from Mark Brown, who agreed to supply two). Greg also pointed out that there are scripts to pull the quilt series into git, for those who like git. Guenter Roeck countered that git was quite useful to him in a former life where they maintained a few hundred patches on top of mainline, tracking a stable kernel. Guenter has seen serious problems in projects that attempted to do this using sets of patches, and would not like to use quilt for active development. Greg noted that enterprise distros use quilt, which indicates that developing on top of quilt cannot be all that hard. NeilBrown (of SUSE) agrees that enterprise distros use quilt, but also points out that they use git for development using an upstream-first approach (as does Red Hat, although mileage may differ for embedded distros). Neil does not like the idea of using quilt for development, and noted that given that Greg wants quilt for maintaining stable trees and that Guenter wants git for development, perhaps they are actually in violent agreement. Greg agreed with Neil's assessment, and said that Geert was working on producing a git tree for LTSI so that people wanting git and LTSI could have the same commit ID for a given patch. Jiri Kosina said that SUSE already automatically generates git trees from quilt patch series, and gave the relevant URLs. For extra credit, SUSE maintains its quilt series in git. Greg asked how this handled updating a patch in the middle of a quilt series, and Jiri gave an example.

Geert Uytterhoeven noted that git branch and git rebase can be used to update patches in the middle of a series while leaving the old series intact. Then git format-patch can be used to regenerate the quilt series. James Bottomley added that git cherry can be used to identify patches that are present in one series but not another, and that git cherry-pick can be used to pull those patches into the series that lacks them. James admitted that stgit might give a better user experience. NeilBrown noted that if git cherry-pick added the upstream commit ID to the commit log, it would be possible to very nearly emulate quilt commands in git; however, this could be confused by conflict-resolution changes. James Bottomley suggested that the same techniques used by git to detect file moves might be applied to overcome problems introduced by conflict-resolution changes. James also noted that the -x argument to git cherry-pick records the upstream commit ID, as did Dmitry Torokhov. NeilBrown read the manpage and learned that -x records the upstream commit ID only in the absence of conflicts, which does not work for his use cases. Dmitry Torokhov agreed that the manpage in fact said that, but that -x really does work when there are conflicts, and also helpfully documents what the conflicts were in the commit log.
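NeilBrown's point about recorded upstream commit IDs can be made concrete with a short sketch. It assumes that each backported commit log contains the "(cherry picked from commit ...)" line that the -x argument appends, and it compares two series purely by those recorded IDs. The helper names upstream_ids and missing_from are hypothetical, and this is not how git cherry works, since git cherry compares the patches themselves rather than recorded IDs.

```python
import re

# The line appended to the commit log by "git cherry-pick -x".
CHERRY_RE = re.compile(
    r"^\(cherry picked from commit ([0-9a-f]{40})\)$", re.MULTILINE
)


def upstream_ids(message):
    """Return every upstream commit ID recorded in one commit log."""
    return CHERRY_RE.findall(message)


def missing_from(series_a, series_b):
    """Upstream IDs recorded in series_a but not in series_b, in order.

    Each series is a list of commit-log strings. A patch whose content
    was changed during conflict resolution still matches, because only
    the recorded ID is compared.
    """
    have = {sha for msg in series_b for sha in upstream_ids(msg)}
    missing = []
    for msg in series_a:
        for sha in upstream_ids(msg):
            if sha not in have and sha not in missing:
                missing.append(sha)
    return missing
```

Matching on the recorded ID sidesteps the conflict-resolution problem that Neil raised for patch comparison, but only for commits that were actually picked with -x; commits backported by hand without the annotation are invisible to this approach.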

Geert Uytterhoeven pointed out that you can get the effect of git cherry-pick by giving the --onto argument to git rebase. James Bottomley suggested -i instead of (or perhaps in addition to) --onto, but likes the fact that git cherry-pick is scriptable. James points out that git cherry is needed either way. Geert Uytterhoeven avoids git rebase -i exactly because it is not scriptable, but notes that git rebase automates the filtering that is done manually (or by the script) in the case of git cherry and git cherry-pick. Vlastimil Babka suggests that it might be possible to use git rebase's --edit-todo and --continue arguments to more closely emulate quilt commands. Laurent Pinchart called out git rebase --continue as doing the right thing, either continuing if conflicts were handled correctly or complaining if not.

Daniel Vetter called out the drm/i915 maintainership tools, which are discussed at length here.