{
	"id": "5ae1ad5a-2017-44e5-a9de-2de3ffcb8e50",
	"created_at": "2026-04-06T00:09:57.886534Z",
	"updated_at": "2026-04-10T03:20:52.472581Z",
	"deleted_at": null,
	"sha1_hash": "72d2e24bfb713a0908321d54879b256d17d6d681",
	"title": "Kernel Self-Protection — The Linux Kernel documentation",
	"llm_title": "",
	"authors": "",
	"file_creation_date": "0001-01-01T00:00:00Z",
	"file_modification_date": "0001-01-01T00:00:00Z",
	"file_size": 68745,
	"plain_text": "Kernel Self-Protection — The Linux Kernel documentation\r\nArchived: 2026-04-05 20:37:26 UTC\r\nKernel self-protection is the design and implementation of systems and structures within the Linux kernel to\r\nprotect against security flaws in the kernel itself. This covers a wide range of issues, including removing entire\r\nclasses of bugs, blocking security flaw exploitation methods, and actively detecting attack attempts. Not all topics\r\nare explored in this document, but it should serve as a reasonable starting point and answer any frequently asked\r\nquestions. (Patches welcome, of course!)\r\nIn the worst-case scenario, we assume an unprivileged local attacker has arbitrary read and write access to the\r\nkernel’s memory. In many cases, bugs being exploited will not provide this level of access, but with systems in\r\nplace that defend against the worst case we’ll cover the more limited cases as well. A higher bar, and one that\r\nshould still be kept in mind, is protecting the kernel against a _privileged_ local attacker, since the root user has\r\naccess to a vastly increased attack surface. (Especially when they have the ability to load arbitrary kernel\r\nmodules.)\r\nThe goals for successful self-protection systems would be that they are effective, on by default, require no opt-in\r\nby developers, have no performance impact, do not impede kernel debugging, and have tests. It is uncommon that\r\nall these goals can be met, but it is worth explicitly mentioning them, since these aspects need to be explored,\r\ndealt with, and/or accepted.\r\nAttack Surface Reduction¶\r\nThe most fundamental defense against security exploits is to reduce the areas of the kernel that can be used to\r\nredirect execution. 
This ranges from limiting the exposed APIs available to userspace to making in-kernel APIs hard\r\nto use incorrectly and minimizing the areas of writable kernel memory.\r\nStrict kernel memory permissions¶\r\nWhen all of kernel memory is writable, it becomes trivial for attacks to redirect execution flow. To reduce the\r\navailability of these targets, the kernel needs to protect its memory with a tight set of permissions.\r\nExecutable code and read-only data must not be writable¶\r\nAny areas of the kernel with executable memory must not be writable. While this obviously includes the kernel\r\ntext itself, we must consider all additional places too: kernel modules, JIT memory, etc. (There are temporary\r\nexceptions to this rule to support things like instruction alternatives, breakpoints, kprobes, etc. If these must exist\r\nin a kernel, they are implemented in a way where the memory is temporarily made writable during the update, and\r\nthen returned to the original permissions.)\r\nIn support of this are CONFIG_STRICT_KERNEL_RWX and CONFIG_STRICT_MODULE_RWX, which seek to make sure that\r\ncode is not writable, data is not executable, and read-only data is neither writable nor executable.\r\nhttps://www.kernel.org/doc/html/latest/security/self-protection.html\r\nPage 1 of 6\n\nMost architectures have these options on by default and not user-selectable. For some architectures like arm that\r\nwish to have these be selectable, the architecture Kconfig can select ARCH_OPTIONAL_KERNEL_RWX to\r\nenable a Kconfig prompt. CONFIG_ARCH_OPTIONAL_KERNEL_RWX_DEFAULT determines the default setting when\r\nARCH_OPTIONAL_KERNEL_RWX is enabled.\r\nFunction pointers and sensitive variables must not be writable¶\r\nVast areas of kernel memory contain function pointers that are looked up by the kernel and used to continue\r\nexecution (e.g. descriptor/vector tables, file/network/etc operation structures, etc). 
The number of these variables\r\nmust be reduced to an absolute minimum.\r\nMany such variables can be made read-only by setting them “const” so that they live in the .rodata section instead\r\nof the .data section of the kernel, gaining the protection of the kernel’s strict memory permissions as described\r\nabove.\r\nVariables that are initialized once at __init time can be marked with the __ro_after_init attribute.\r\nWhat remains are variables that are updated rarely (e.g. GDT). These will need another infrastructure (similar to\r\nthe temporary exceptions made to kernel code mentioned above) that allows them to spend the rest of their lifetime\r\nread-only. (For example, when being updated, only the CPU thread performing the update would be given\r\nuninterruptible write access to the memory.)\r\nSegregation of kernel memory from userspace memory¶\r\nThe kernel must never execute userspace memory. The kernel must also never access userspace memory without\r\nexplicit expectation to do so. These rules can be enforced either by support of hardware-based restrictions (x86’s\r\nSMEP/SMAP, ARM’s PXN/PAN) or via emulation (ARM’s Memory Domains). By blocking userspace memory\r\nin this way, execution and data parsing cannot be passed to trivially-controlled userspace memory, forcing attacks\r\nto operate entirely in kernel memory.\r\nReduced access to syscalls¶\r\nOne trivial way to eliminate many syscalls for 64-bit systems is building without CONFIG_COMPAT. However, this\r\nis rarely a feasible scenario.\r\nThe “seccomp” system provides an opt-in feature made available to userspace, which provides a way to reduce the\r\nnumber of kernel entry points available to a running process. 
This limits the breadth of kernel code that can be\r\nreached, possibly reducing the availability of a given bug to an attack.\r\nAn area of improvement would be creating viable ways to keep access to things like compat, user namespaces,\r\nBPF creation, and perf limited only to trusted processes. This would keep the scope of kernel entry points\r\nrestricted to the more regular set of syscalls normally available to unprivileged userspace.\r\nRestricting access to kernel modules¶\r\nThe kernel should never allow an unprivileged user the ability to load specific kernel modules, since that would\r\nprovide a facility to unexpectedly extend the available attack surface. (The on-demand loading of modules via\r\ntheir predefined subsystems, e.g. MODULE_ALIAS_*, is considered “expected” here, though additional\r\nconsideration should be given even to these.) For example, loading a filesystem module via an unprivileged socket\r\nAPI is nonsense: only the root or physically local user should trigger filesystem module loading. (And even this\r\ncan be up for debate in some scenarios.)\r\nTo protect against even privileged users, systems may need to either disable module loading entirely (e.g.\r\nmonolithic kernel builds or modules_disabled sysctl), or provide signed modules (e.g.\r\nCONFIG_MODULE_SIG_FORCE, or dm-crypt with LoadPin), to keep from having root load arbitrary kernel code via\r\nthe module loader interface.\r\nMemory integrity¶\r\nThere are many memory structures in the kernel that are regularly abused to gain execution control during an\r\nattack. By far the most commonly understood is that of the stack buffer overflow in which the return address\r\nstored on the stack is overwritten. 
Many other examples of this kind of attack exist, and protections exist to defend\r\nagainst them.\r\nStack buffer overflow¶\r\nThe classic stack buffer overflow involves writing past the expected end of a variable stored on the stack,\r\nultimately writing a controlled value to the stack frame’s stored return address. The most widely used defense is\r\nthe presence of a stack canary between the stack variables and the return address (CONFIG_STACKPROTECTOR),\r\nwhich is verified just before the function returns. Other defenses include things like shadow stacks.\r\nStack depth overflow¶\r\nA less well-understood attack is using a bug that triggers the kernel to consume stack memory with deep function\r\ncalls or large stack allocations. With this attack it is possible to write beyond the end of the kernel’s preallocated\r\nstack space and into sensitive structures. Two important changes need to be made for better protections: moving\r\nthe sensitive thread_info structure elsewhere, and adding a faulting memory hole at the bottom of the stack to\r\ncatch these overflows.\r\nHeap memory integrity¶\r\nThe structures used to track heap free lists can be sanity-checked during allocation and freeing to make sure they\r\naren’t being used to manipulate other memory areas.\r\nCounter integrity¶\r\nMany places in the kernel use atomic counters to track object references or perform similar lifetime management.\r\nWhen these counters can be made to wrap (over or under), this traditionally exposes a use-after-free flaw. 
By\r\ntrapping atomic wrapping, this class of bug vanishes.\r\nSize calculation overflow detection¶\r\nSimilar to counter overflow, integer overflows (usually size calculations) need to be detected at runtime to kill this\r\nclass of bug, which traditionally leads to being able to write past the end of kernel buffers.\r\nProbabilistic defenses¶\r\nWhile many protections can be considered deterministic (e.g. read-only memory cannot be written to), some\r\nprotections provide only statistical defense, in that an attack must gather enough information about a running\r\nsystem to overcome the defense. While not perfect, these do provide meaningful defenses.\r\nCanaries, blinding, and other secrets¶\r\nIt should be noted that things like the stack canary discussed earlier are technically statistical defenses, since they\r\nrely on a secret value, and such values may become discoverable through an information exposure flaw.\r\nBlinding literal values for things like JITs, where the executable contents may be partially under the control of\r\nuserspace, needs a similar secret value.\r\nIt is critical that the secret values used be separate (e.g. different canary per stack) and high entropy (e.g. is\r\nthe RNG actually working?) in order to maximize their success.\r\nKernel Address Space Layout Randomization (KASLR)¶\r\nSince the location of kernel memory is almost always instrumental in mounting a successful attack, making the\r\nlocation non-deterministic raises the difficulty of an exploit. (Note that this in turn makes the value of information\r\nexposures higher, since they may be used to discover desired memory locations.)\r\nText and module base¶\r\nBy relocating the physical and virtual base address of the kernel at boot-time (CONFIG_RANDOMIZE_BASE), attacks\r\nneeding kernel code will be frustrated. 
Additionally, offsetting the module loading base address means that even\r\nsystems that load the same set of modules in the same order every boot will not share a common base address with\r\nthe rest of the kernel text.\r\nStack base¶\r\nIf the base address of the kernel stack is not the same between processes, or even not the same between syscalls,\r\ntargets on or beyond the stack become more difficult to locate.\r\nDynamic memory base¶\r\nMuch of the kernel’s dynamic memory (e.g. kmalloc, vmalloc, etc) ends up being relatively deterministic in layout\r\ndue to the order of early-boot initializations. If the base address of these areas is not the same between boots,\r\ntargeting them is frustrated, requiring an information exposure specific to the region.\r\nStructure layout¶\r\nBy performing a per-build randomization of the layout of sensitive structures, attacks must either be tuned to\r\nknown kernel builds or expose enough kernel memory to determine structure layouts before manipulating them.\r\nPreventing Information Exposures¶\r\nSince the locations of sensitive structures are the primary target for attacks, it is important to defend against\r\nexposure of both kernel memory addresses and kernel memory contents (since they may contain kernel addresses\r\nor other sensitive things like canary values).\r\nKernel addresses¶\r\nPrinting kernel addresses to userspace leaks sensitive information about the kernel memory layout. Care should be\r\nexercised when using any printk specifier that prints the raw address, currently %px, %p[ad] (and %p[sSb] in\r\ncertain circumstances [*]). Any file written to using one of these specifiers should be readable only by privileged\r\nprocesses.\r\nKernels 4.14 and older printed the raw address using %p. 
As of 4.15-rc1, addresses printed with the specifier %p\r\nare hashed before printing.\r\n[*] If KALLSYMS is enabled and symbol lookup fails, the raw address is printed. If KALLSYMS is not enabled,\r\nthe raw address is printed.\r\nUnique identifiers¶\r\nKernel memory addresses must never be used as identifiers exposed to userspace. Instead, use an atomic counter,\r\nan idr, or a similar unique identifier.\r\nMemory initialization¶\r\nMemory copied to userspace must always be fully initialized. If not explicitly memset(), this will require\r\nchanges to the compiler to make sure structure holes are cleared.\r\nMemory poisoning¶\r\nWhen releasing memory, it is best to poison the contents, to avoid reuse attacks that rely on the old contents of\r\nmemory. E.g., clear the stack on syscall return (CONFIG_KSTACK_ERASE), or wipe heap memory on free. This\r\nfrustrates many uninitialized variable attacks, stack content exposures, heap content exposures, and use-after-free\r\nattacks.\r\nDestination tracking¶\r\nTo help kill classes of bugs that result in kernel addresses being written to userspace, the destination of writes\r\nneeds to be tracked. If the buffer is destined for userspace (e.g. seq_file backed /proc files), it should\r\nautomatically censor sensitive values.\r\nSource: https://www.kernel.org/doc/html/latest/security/self-protection.html",
	"extraction_quality": 1,
	"language": "EN",
	"sources": [
		"MITRE"
	],
	"references": [
		"https://www.kernel.org/doc/html/latest/security/self-protection.html"
	],
	"report_names": [
		"self-protection.html"
	],
	"threat_actors": [],
	"ts_created_at": 1775434197,
	"ts_updated_at": 1775791252,
	"ts_creation_date": 0,
	"ts_modification_date": 0,
	"files": {
		"pdf": "https://archive.orkl.eu/72d2e24bfb713a0908321d54879b256d17d6d681.pdf",
		"text": "https://archive.orkl.eu/72d2e24bfb713a0908321d54879b256d17d6d681.txt",
		"img": "https://archive.orkl.eu/72d2e24bfb713a0908321d54879b256d17d6d681.jpg"
	}
}