Jiri Kosina: live kernel patching

May 2, 2014

Participants: Andy Lutomirski, Chris Mason, Dave Jones, James Bottomley, Kees Cook, Li Zefan, Masami Hiramatsu, Mimi Zohar, and Pavel Emelyanov.

People tagged: Developers of kpatch, kgraft, CRIU-based solutions, and ksplice.

Jiri Kosina noted that there are several conflicting live-kernel patching solutions, and suggested that a face-to-face session would help speed up convergence of these solutions. James Bottomley would be happy to involve the CRIU people, but noted that the CRIU approach is based not on patches but on switching to a whole new kernel. One advantage of this is that it avoids on-the-fly kernel patching, and the corresponding disadvantage is that there is a short hiatus while the CRIU checkpoint, kexec, and CRIU resume complete. Li Zefan noted that while his employer (Huawei) would really like a live-patching solution, the only live-patching solution that they would be willing to use is one that is in mainline. Masami Hiramatsu indicated an interest, especially in kpatch, for use in mission-critical non-stop systems.

Chris Mason indicated strong interest, mainly for in-memory databases. Andy Lutomirski wondered if in-memory databases would benefit from the ability to kexec without losing the in-memory data. Chris replied that they would, and suggested using a reserved chunk of memory that survives the kexec, which Pavel Emelyanov would like to see as well. Pavel asked if the preserved memory would be anonymous memory or part of the page cache. Chris said that memcached uses shm and other applications use the page cache.

Jiri Kosina argued that the CRIU-based approach could in fact swap kernels without losing volatile application data, but also stated that this would increase the downtime due to the need to checkpoint all the volatile data. James Bottomley countered that the new pramfs capsule (not yet upstream) allows zero-copy checkpoint/restore by passing the pages into a capsule that survives the kexec, which can then be accessed during the subsequent restore. James also noted that the pramfs capsule was likely to be superseded by something similar to splice. Pavel Emelyanov added that even without the pramfs capsule, non-dirty mapped memory does not add overhead to the checkpoint operation.

Kees Cook is interested in the security implications, both in terms of patching security bugs and in terms of the ability to make arbitrary changes to the kernel, thus adding arbitrary new security holes. Jiri Kosina noted that root is required to add new security holes via kernel patching. Dave Jones noted that even root is insufficient in secure-boot mode, at least assuming signing is enforced. Jiri Kosina pointed out that secure-boot mode does not prevent module loading, which can also be used to patch the kernel. Kees Cook agreed, and would like to see something that prevents unauthorized module loading. James Bottomley argued that this topic belongs in the secure-boot discussion, and further argued that a gpg-like “web of trust” model involving the various distros is needed for module-load-time signature verification. Kees Cook called for careful avoidance of UEFI-specific thinking when developing a signature-verification web of trust. In particular, Kees would like the option of avoiding signatures when the module comes from a trusted source (as when the sysadm loading it is the same person who built it). James Bottomley said that the security method is orthogonal to the live-patching mechanism.

Mimi Zohar argued that any code added to the system needs to be measured and appraised, including not only the kernel, but userspace packages as well. The sysadm could then choose which public keys to trust. Mimi also noted that the sysadm could disable the measure-appraise process if desired.
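
The load-time policy options raised in this thread (a sysadm-chosen set of trusted keys, an off switch for appraisal, and Kees Cook's exemption for locally built modules) can be pictured with a small decision function. The following is only an illustrative sketch: the names `AppraisalPolicy`, `Module`, and `may_load` are hypothetical, the signature check is assumed to have already happened elsewhere, and none of this corresponds to an actual kernel or IMA interface.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class AppraisalPolicy:
    """Hypothetical sysadm-controlled policy; not a real kernel interface."""
    trusted_keys: Set[str] = field(default_factory=set)  # key IDs the sysadm trusts
    enforce: bool = True               # False models disabling measure-appraise
    trust_local_builds: bool = False   # models the "built it myself" exemption


@dataclass
class Module:
    """Hypothetical description of a module or live patch about to be loaded."""
    name: str
    signer_key_id: Optional[str] = None  # None means the module is unsigned
    signature_valid: bool = False        # assumed result of an external check
    built_locally: bool = False


def may_load(module: Module, policy: AppraisalPolicy) -> bool:
    """Return True if the policy would allow the module to be loaded."""
    if not policy.enforce:
        # Appraisal switched off by the sysadm: everything loads.
        return True
    if policy.trust_local_builds and module.built_locally:
        # Trusted source: no signature required.
        return True
    # Otherwise require a valid signature from a key the sysadm trusts.
    return module.signature_valid and module.signer_key_id in policy.trusted_keys
```

Keeping the decision in a function separate from whatever applies the patch mirrors James Bottomley's point that the security method is orthogonal to the live-patching mechanism.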