Rodrigo Vivi: Challenges in Upstream vs. Embargoed Development in Intel Graphics

September 9, 2018

Related Material:

Handling of embargoed security issues (albeit a somewhat different type of embargo).
The hidden costs of embargoes (Red Hat Security Blog).
Addressing Meltdown and Spectre in the kernel.
CVE-2018-5390 and “embargoes”.

Additional Participants: Arnd Bergmann, Daniel Vetter, Greg KH, Jani Nikula, Jon Masters, Leon Romanovsky, Linus Walleij, Mark Brown, and Sean Paul.

Rodrigo Vivi suggested a discussion on embargo vs. upstream development, hoping to promote an upstream-always mentality. Leon Romanovsky asked for clarification, wondering why knowledge of internal Intel development was useful outside of Intel. Rodrigo stated that he was concerned not about government restrictions or security embargoes, but rather embargoes due to internal Intel restrictions. He also suspects that Intel is not the only organization with this sort of challenge. Leon was happy that Rodrigo favored upstream-first development, but based on his own experience with Mellanox believes that the required changes are inside Intel rather than outside (Greg KH agrees). Leon also suspects that Rodrigo is struggling with the business justification for reducing the embargo.

Daniel Vetter believes that this problem will persist until open-source hardware becomes ubiquitous, and states that this is not just an Intel problem. Greg KH wondered why this problem was specific to graphics, but agreed that a discussion on how to handle pre-release hardware and upstream drivers would be a nice proposal. Daniel agreed that it might be good to also look outside of graphics. Mark Brown and Leon agreed that this would be a good topic, and Leon further asked that Rodrigo or Daniel share their pain points.

Rodrigo responded with the following:

1. Rebasing from mainline on top of LTS is problematic because DRM moves quickly, which will likely eventually require a rewrite of LTS's version of DRM. Leon suggests basing code on the latest -rc from Linus.
2. Everything is to be upstreamed as soon as the embargo lifts, which requires tracking not only the required patches, but also their history. Leon suggests that several internal developers be responsible for upstreaming, but that original patch authors be encouraged to respond to external mailing-list discussions.
3. Code-review quality suffers due to the big-bang patch-release to upstream. Leon suggested avoiding staging, citing a “constant nightmare with lustre”.
4. Demanding good internal reviews causes problems due to the patches having “Reviewed-by” when they first appear in public. Leon believes that such “Reviewed-by” clauses are a good thing, and are in fact a way to reward internal developers for their time and effort.

Rodrigo believes that they have a good understanding of potential solutions for 1-3 above, but are especially interested in discussions on 4.

Mark said that in the past, although he used an LTS backport as an integration point for internal testing and development, he simultaneously maintained corresponding patches against -next as the primary development platform, upstreaming anything that could be upstreamed as soon as feasible. When the marketing people gave the go-ahead, Mark would push out all the patches that had been held back. Mark feels that this approach worked pretty well.

Linus Walleij believes that pretty much any company working with SoCs for routers, handsets, and so on will have the same problem. When Linus was responsible for such a situation, he took an ad-hoc approach that included the following points:

Classify components so that anything non-embargoed can go upstream immediately. Mark noted that this is easier to do for SoCs or standalone chips than for more complex devices such as the i915 graphics driver.
Get management to pre-approve a cut-off date for the embargo, so that when that date arrives, developers can immediately start pushing code upstream. Arnd Bergmann suggests also having a deadline for when the patches must be publicly posted. Linus pointed out that such a deadline would need to be imposed further up the supply chain, and that he had had good results bringing vendors around to this requirement over time, with which Sean Paul and Jon Masters agreed, and which Rodrigo applauded. Mark worked this from the supply-chain side, providing such a deadline and suggesting that their customers demand similar deadlines from their other suppliers.
Use internal developers with good upstream skills and experience, so that upstream code-quality problems can be anticipated and fixed as soon as possible.
Rebase internal development frequently so as to minimize the Hamming distance to mainline. Linus states that anything else really isn't upstream first, but that this point was always the most controversial. Rodrigo agreed, but stated that he himself had once believed frequent rebasing to be insane. However, Rodrigo said that the “rebase” to -rc1 was usually more like a forward port than a rebase. Rodrigo called out CI as an important component of a successful constant-rebasing strategy.

Jani Nikula agreed with Linus, but noted that reviewing the first version does not always constitute a review of that same code after it has been rebased a dozen times. Jani would like to see discussion on this point.
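
Jani's concern can be made concrete with a small check that compares what a patch changed before and after a rebase. The following is only a minimal sketch under stated assumptions, not a tool that any participant described: the function names, the sample diffs, and the path drivers/example.c are all hypothetical, and the fingerprint is a simplified stand-in in the spirit of `git patch-id`, not its actual algorithm.

```python
import hashlib


def change_fingerprint(diff_text: str) -> str:
    """Hash only the added and removed lines of a unified diff.

    File headers, hunk headers, and context lines are ignored, so a patch
    that merely landed at different line numbers after a rebase keeps the
    same fingerprint. Simplification: a removed line whose content itself
    starts with "--" would be mistaken for a file header; a real tool
    would parse hunks properly.
    """
    digest = hashlib.sha256()
    for line in diff_text.splitlines():
        # Skip file headers such as "--- a/file.c" and "+++ b/file.c".
        if line.startswith(("+++", "---")):
            continue
        if line.startswith(("+", "-")):
            # Keep the +/- marker; drop whitespace so reindentation alone
            # does not change the fingerprint.
            normalized = line[0] + "".join(line[1:].split())
            digest.update(normalized.encode("utf-8") + b"\n")
    return digest.hexdigest()


def review_still_applies(reviewed_diff: str, rebased_diff: str) -> bool:
    """Return True if the rebased patch makes the same changes as the reviewed one."""
    return change_fingerprint(reviewed_diff) == change_fingerprint(rebased_diff)


# Hypothetical sample data: the version of a patch that was reviewed.
REVIEWED = """\
--- a/drivers/example.c
+++ b/drivers/example.c
@@ -10,3 +10,4 @@
 int probe(void)
 {
+    init_hw();
     return 0;
"""

# Same change after a rebase: only the hunk position moved.
REBASED_SAME = REVIEWED.replace("@@ -10,3 +10,4 @@", "@@ -42,3 +42,4 @@")

# After a rebase that forced a real change to the added line.
REBASED_CHANGED = REVIEWED.replace("init_hw();", "init_hw(dev);")

if __name__ == "__main__":
    print(review_still_applies(REVIEWED, REBASED_SAME))
    print(review_still_applies(REVIEWED, REBASED_CHANGED))
```

A mismatch does not mean the earlier review was wrong, only that the code carrying the “Reviewed-by” is no longer the code that was reviewed, which is the case Jani would like to see discussed.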