My Two Cents: Built my first kernel module

At my last company, Field Support Engineers debugged broken instruments at 3am in hospitals using one tool: system logs. No shell access, no debugger - just dmesg output from whatever the kernel captured.

Debugging that way is akin to searching for a needle in the wrong haystack. Kernel space is not user space with fewer features; it’s a different execution model entirely. There is no libc, no stdout, and no guarantee you’re running in process context. printf() relies on user-space abstractions that don’t exist past the syscall boundary. In the kernel, there is nothing to “print to,” and no application to recover if something goes wrong. A wrong assumption here crashes the entire system. That’s why printf() cannot exist in kernel context, and why printk() exists at all: it provides minimal visibility without pretending the kernel is a safe place to make mistakes.

GPU drivers live here. NIC drivers live here. Bugs here crash entire systems.

+-----------------------------+
|        User Space           |
|                             |
|  Applications / Scripts     |
|  (printf, files, sockets)   |
|                             |
+--------------+--------------+
               |
               |  syscalls (read, write, ioctl)
               v
+--------------+--------------+
|        Kernel Space         |
|                             |
|  VFS / Scheduler / MM       |
|                             |
|  Your Kernel Module         |
|  (tempmon.ko)               |
|      printk()               |
|      hardware access        |
|                             |
|  GPU / NIC Drivers          |
|  (nvidia.ko, mlx5, etc.)    |
|                             |
+--------------+--------------+
               |
               v
+--------------+--------------+
|        Hardware             |
|                             |
|  CPU / GPU / Sensors        |
|  PCIe / Memory / IRQs       |
|                             |
+-----------------------------+

printk() writes into a fixed-size kernel ring buffer, not to a terminal. Each message is tagged with a log level such as KERN_INFO or KERN_ERR. Every message is stored in the buffer; the level only decides whether it is also forwarded to the console, which happens when the message is more severe than the current console log level. The buffer can wrap under load, so older messages are overwritten by newer ones. Drivers also commonly rate-limit repeated messages (for example with printk_ratelimited()) to avoid flooding the system, which means identical logs may be suppressed. As a result, logs can appear out of order, show up late, or seem to disappear entirely, especially during crashes, high interrupt activity, or early boot, because logging is best-effort, not guaranteed delivery.
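To make the wrap-around and the console filter concrete, here is a toy user-space model. It is not kernel code and not how the real ring buffer is implemented: the names (toy_log, toy_log_write, toy_log_find), the slot count, and the message length are all made up for illustration, and the real buffer is sized in bytes rather than slots. It is only a sketch of the two behaviours described above: every message is stored, old ones get overwritten, and only sufficiently severe ones would reach the console.

```c
/* Toy user-space model of an overwriting log buffer. NOT kernel code:
 * every name and size here is hypothetical. */
#include <stdio.h>
#include <string.h>

#define TOY_SLOTS   4    /* tiny on purpose so the wrap is easy to see */
#define TOY_MSG_LEN 64

struct toy_record {
    unsigned long seq;          /* monotonically increasing sequence number */
    int level;                  /* 0 (emergency) .. 7 (debug); lower = more severe */
    char text[TOY_MSG_LEN];
};

struct toy_log {
    struct toy_record slots[TOY_SLOTS];
    unsigned long next_seq;     /* sequence number the next record will get */
    int console_loglevel;       /* only levels below this reach the console */
};

/* Store a record, overwriting the oldest slot once the buffer is full.
 * Returns 1 if the message would also be forwarded to the console. */
static int toy_log_write(struct toy_log *log, int level, const char *text)
{
    struct toy_record *r = &log->slots[log->next_seq % TOY_SLOTS];

    r->seq = log->next_seq++;
    r->level = level;
    snprintf(r->text, sizeof(r->text), "%s", text);
    return level < log->console_loglevel;
}

/* Oldest sequence number that is still stored. */
static unsigned long toy_log_first_seq(const struct toy_log *log)
{
    return log->next_seq > TOY_SLOTS ? log->next_seq - TOY_SLOTS : 0;
}

/* Look a record up by sequence number.
 * Returns NULL if it was overwritten or has not been written yet. */
static const struct toy_record *toy_log_find(const struct toy_log *log,
                                             unsigned long seq)
{
    if (seq >= log->next_seq || seq < toy_log_first_seq(log))
        return NULL;
    return &log->slots[seq % TOY_SLOTS];
}
```

With four slots, writing six messages leaves only the last four retrievable; the first two are gone no matter how important they were. A level 6 (KERN_INFO) message with the console level set to 4 is stored but never forwarded, which is the same reason an informational printk() can sit quietly in the buffer without ever showing up on a console.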

Kernel code can run in process context or interrupt context. printk() can be called from both, but what happens to the message afterwards differs. Interrupt-context code cannot sleep and runs ahead of ordinary tasks, so console output may be deferred until the system gets back to a safer point, while process-context code can be preempted or delayed partway through a sequence of messages. Because log messages are buffered and flushed asynchronously, preemption and concurrent CPUs can cause messages to interleave or appear reordered. During thermal events, GPU hangs, or PCIe link flaps, the system may be saturated with interrupts or stuck in a fault path, delaying or dropping log output entirely. This is why GPU driver logs are sometimes misleading during hard lockups: the failure happened, but the system never reached a safe point to emit or flush the messages that would have explained it.
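One way to tell whether messages went missing is to look at the records exposed through /dev/kmsg, where each line has the form "priority,sequence,timestamp,flags;message". The sketch below is a user-space helper, not part of the module: kmsg_parse and kmsg_gap are hypothetical names, it only handles the four leading fields, and it ignores continuation lines and escaped characters. A jump in the sequence number between two records you read suggests that something in between was overwritten before you got to it.

```c
/* Minimal parser for one /dev/kmsg record line (user-space sketch).
 * kmsg_parse() and kmsg_gap() are illustrative names, not an existing API. */
#include <stdio.h>
#include <string.h>

struct kmsg_record {
    unsigned int level;           /* syslog level 0..7 (6 is KERN_INFO) */
    unsigned int facility;        /* syslog facility (0 is the kernel) */
    unsigned long long seq;       /* record sequence number */
    unsigned long long ts_usec;   /* monotonic timestamp in microseconds */
    char text[256];               /* message text, truncated if too long */
};

/* Parse "prio,seq,timestamp,flags[,...];message".
 * Returns 0 on success, -1 if the line does not match that shape. */
static int kmsg_parse(const char *line, struct kmsg_record *out)
{
    unsigned int prio;
    const char *msg;
    size_t len;

    if (sscanf(line, "%u,%llu,%llu,", &prio, &out->seq, &out->ts_usec) != 3)
        return -1;

    msg = strchr(line, ';');      /* the message starts after the first ';' */
    if (!msg)
        return -1;
    msg++;

    len = strcspn(msg, "\n");     /* drop the trailing newline, if any */
    if (len >= sizeof(out->text))
        len = sizeof(out->text) - 1;
    memcpy(out->text, msg, len);
    out->text[len] = '\0';

    out->level = prio & 7;        /* low three bits carry the log level */
    out->facility = prio >> 3;    /* remaining bits carry the facility */
    return 0;
}

/* Number of records missing between two records read back to back. */
static unsigned long long kmsg_gap(const struct kmsg_record *prev,
                                   const struct kmsg_record *cur)
{
    return cur->seq > prev->seq + 1 ? cur->seq - prev->seq - 1 : 0;
}
```

Fed a line such as "6,1204,5123456,-;Tempmonitor: Module loaded" (the sequence number and timestamp are invented for the example), kmsg_parse fills in level 6 and the message text. Calling kmsg_gap on consecutive records then gives a rough count of how many entries were lost in between.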

That's why I'm building kernel modules now. I'm targeting GPU-adjacent roles and need to understand thermal monitoring where it actually happens: kernel space. Today's module does nothing useful yet - it just loads, prints to the kernel log, and unloads. But getting it working taught me something I somehow missed in 15 years of embedded work: why printk() exists at all.

Before we get started, let's make sure the "target debug system" is ready. I am on Ubuntu 22.04 running on an Intel Core i7-9700F with 16 GB of memory and an NVIDIA GeForce RTX 2060 SUPER, so the instructions for NVIDIA edge compute platforms (the Orins and the Jetsons) might vary slightly. Please check the official NVIDIA documentation for those.

# Install driver
sudo ubuntu-drivers autoinstall
sudo reboot

# Verify GPU
nvidia-smi

# Install CUDA + kernel tools
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.1-1_all.deb
sudo dpkg -i cuda-keyring_1.1-1_all.deb
sudo apt update
sudo apt install cuda-toolkit-12-6 build-essential linux-headers-$(uname -r)

# Set paths
echo 'export PATH=/usr/local/cuda/bin:$PATH' >> ~/.bashrc
source ~/.bashrc

# Test
nvcc --version

And here is the code, in all its guts and glory. Well, this is how it started, at least.

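#include <linux/module.h>   /* MODULE_* macros, module_init/module_exit */
#include <linux/kernel.h>   /* printk() and the KERN_* log levels */
#include <linux/init.h>     /* __init and __exit annotations */

MODULE_LICENSE("GPL");
MODULE_AUTHOR("Ananth");
MODULE_DESCRIPTION("GPU Thermal Monitor");

/* Runs once when the module is loaded; returning 0 means success. */
static int __init temp_init(void)
{
    printk(KERN_INFO "Tempmonitor: Module loaded\n");
    return 0;
}

/* Runs once when the module is unloaded. */
static void __exit temp_exit(void)
{
    printk(KERN_INFO "Tempmonitor: Module unloaded\n");
}

module_init(temp_init);
module_exit(temp_exit);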

In kernel space there is no libc, no stdio, and no printf(). The printk() call writes to a ring buffer that feeds dmesg, which is where GPU drivers actually log. I spent 20 minutes confused about why nothing printed until I realized I was watching terminal output instead of running dmesg | tail. This is where real hardware debugging happens: GPU hangs, thermal throttling, and PCIe issues all surface here first.

This module is just the starting point. Next, I'll show how to expose kernel data to userspace via character devices, build a ring buffer for time-series data, and process it on the GPU using CUDA. Along the way, you'll see why kernel-space logging matters for GPU thermal monitoring and field debugging.

Next: character devices.

Wednesday, January 7, 2026

Built my first kernel module - printk() isn't printf()
