Thursday, January 8, 2026

Exposing Kernel Data to Userspace: Character Devices Explained

 


I am a coffee snob, and my FSE friend always complained about the bad coffee at customer sites. Anyhow, as I was saying yesterday, kernelspace is not developer friendly: it lacks the basic decencies of modern developer tooling. So you need some way of exposing kernel data to userspace, and that's where character devices come in very handy.

When I discovered character devices for the very first time, it was like kingdom come. Now I had a somewhat direct way of getting at kernelspace data without too much drama. And here is what my humble tempmon looked like with the addition of a character device. This still won't win me any Turing Awards, but that's beside the point.

In this quick demo, we will fake some temperature data from the GPU (reminder: I am on Ubuntu 22.04 with an RTX 2060 Super, YMMV) and magically bridge it over to userspace.


#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/init.h>
#include <linux/fs.h>
#include <linux/uaccess.h>
#include <linux/cdev.h>
#include <linux/device.h>
#include <linux/random.h>

#define DEVICE_NAME "tempmon"


So let us take a moment to familiarize ourselves with the headers. 

  • <linux/module.h> Required for any kernel module
  • <linux/fs.h> Provides the file operations (open/read/write) and the device-number allocation calls
  • <linux/uaccess.h> Copies data between kernel and userspace. This is the "💎"
  • <linux/cdev.h> Provides struct cdev, used to register a character device with the kernel
  • <linux/device.h> Provides class_create() and device_create(), which get the /dev/ entry created automatically



static dev_t dev_num;
static struct cdev temp_cdev;
static struct class *temp_class;

Now the globals:

  • dev_num holds the device number (major and minor) that the kernel assigns to us
  • temp_cdev is the character device that we create
  • temp_class is the device class that device_create() uses to make /dev/tempmon appear
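A device number is really two numbers packed into one dev_t: a major (which driver) and a minor (which instance). Here is a small userspace sketch of that packing, assuming glibc's makedev(), major() and minor() macros from <sys/sysmacros.h>; the kernel side uses its own MKDEV/MAJOR/MINOR macros for the same job. format_dev() is a made-up helper name, and the major value 240 in the usage line is only a placeholder, since alloc_chrdev_region() picks the real one at load time.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/sysmacros.h>   /* makedev(), major(), minor() on glibc */

/* Format a device number as "major, minor", the way ls -l /dev shows it.
 * format_dev() is a hypothetical helper for illustration only. */
static int format_dev(dev_t dev, char *out, size_t cap)
{
    return snprintf(out, cap, "%u, %u", major(dev), minor(dev));
}
```

Calling format_dev(makedev(240, 0), buf, sizeof(buf)) fills buf with the same pair that ls -l /dev/tempmon would list once the module is loaded (with whatever major the kernel actually handed out).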

static ssize_t temp_read(struct file *filp, char __user *buf,
                         size_t len, loff_t *off){
    char temp_data[64];
    int temp_celsius = 45 + (get_random_u32() % 30);  /* fake reading, 45-74 */
    int bytes;

    if (*off > 0) return 0;  /* already read once: report EOF */
    bytes = snprintf(temp_data, sizeof(temp_data),
                     "Temperature: %d C\n", temp_celsius);
    if ((size_t)bytes > len) bytes = len;  /* don't overrun a short user buffer */
    if (copy_to_user(buf, temp_data, bytes)){
        return -EFAULT;
    }
    *off += bytes;
    return bytes;
}

Here is a line-by-line explanation of what this function does.

temp_read() function: This runs when userspace does cat /dev/tempmon

  • if (*off > 0) return 0 - Already read once? Return EOF so cat stops
  • Generate a fake temperature, 45-74°C
  • snprintf() - Format the string in a kernel buffer
  • copy_to_user() - The kernel can't write directly to userspace memory. This safely copies across the boundary
  • Update *off so the next read returns EOF
  • Return the number of bytes written

static struct file_operations fops = {
    .owner = THIS_MODULE,
    .read = temp_read,
};


    file_operations: 

    Tells kernel "when someone reads this device, call temp_read()". That's the bridge between userspace read() syscall and your kernel function. 
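To see why cat stops after a single line, the offset handling in temp_read() can be mimicked in plain userspace C. This is only a sketch under a few assumptions: sim_read() and drain() are made-up names, memcpy() stands in for copy_to_user(), and the temperature is passed in as a parameter instead of coming from get_random_u32().

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>   /* ssize_t */

/* Userspace stand-in for temp_read(): same offset logic, no kernel needed.
 * sim_read() is a hypothetical name; memcpy() replaces copy_to_user(). */
static ssize_t sim_read(char *buf, size_t len, long *off, int temp_celsius)
{
    char temp_data[64];
    int bytes;

    if (*off > 0)
        return 0;                       /* second call: report EOF */

    bytes = snprintf(temp_data, sizeof(temp_data),
                     "Temperature: %d C\n", temp_celsius);
    if ((size_t)bytes > len)
        bytes = (int)len;               /* never overrun the caller's buffer */

    memcpy(buf, temp_data, (size_t)bytes);
    *off += bytes;
    return bytes;
}

/* What cat effectively does: keep reading until a read returns 0.
 * Returns how many read calls it took; cap must be at least 1. */
static int drain(char *out, size_t cap, int temp_celsius)
{
    long off = 0;
    size_t used = 0;
    int calls = 0;
    ssize_t n;

    do {
        n = sim_read(out + used, cap - used - 1, &off, temp_celsius);
        used += (size_t)n;
        calls++;
    } while (n > 0);

    out[used] = '\0';
    return calls;
}
```

With a 64-byte buffer, drain() needs exactly two calls: the first returns the formatted line and the second returns 0. Without the *off check the first branch would never fire, and cat would print temperatures forever.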

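    static int __init temp_init(void){
        int ret;

        /* Reserve one device number; bail out if the kernel refuses */
        ret = alloc_chrdev_region(&dev_num, 0, 1, DEVICE_NAME);
        if (ret < 0)
            return ret;

        cdev_init(&temp_cdev, &fops);
        cdev_add(&temp_cdev, dev_num, 1);

        /* Kernels older than 6.4 need class_create(THIS_MODULE, DEVICE_NAME) */
        temp_class = class_create(DEVICE_NAME);
        device_create(temp_class, NULL, dev_num, NULL, DEVICE_NAME);

        printk(KERN_INFO "Temp Monitor: Device created at /dev/%s\n", DEVICE_NAME);
        return 0;
    }
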

    temp_init(): Runs when you insmod. Creates the device in 4 steps: 

    • alloc_chrdev_region() - Get a device number from the kernel
    • cdev_init() + cdev_add() - Register your file operations (here, just read)
    • class_create() - Create a device class, visible under /sys/class
    • device_create() - Actually make /dev/tempmon appear


    static void __exit temp_exit(void){
        device_destroy(temp_class, dev_num);
        class_destroy(temp_class);
        cdev_del(&temp_cdev);
        unregister_chrdev_region(dev_num, 1);
        printk(KERN_INFO "TempMonitor: Device removed\n");
    }
    

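    temp_exit(): Runs when you rmmod. Undoes everything in reverse order (always clean up in reverse). As with yesterday's module, the file still needs MODULE_LICENSE("GPL"), module_init(temp_init) and module_exit(temp_exit) to build and load.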
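The same reverse-order rule matters when setup fails halfway: only the steps that already succeeded should be undone, newest first. Kernel code commonly does this with goto labels. Below is a userspace sketch of that idiom, not the module's actual init path; setup(), note() and the fail_at parameter are all made up for illustration, and the step names only echo the ones in temp_init().

```c
#include <stdio.h>
#include <string.h>

/* Append a step name to a log string so the order of events can be checked. */
static void note(char *log, size_t cap, const char *step)
{
    size_t used = strlen(log);

    snprintf(log + used, cap - used, "%s%s", used ? " " : "", step);
}

/* Hypothetical three-step setup. fail_at picks the step (1..3) that fails;
 * 0 means every step succeeds. On failure, earlier steps are undone in
 * reverse order by falling through the labels. */
static int setup(int fail_at, char *log, size_t cap)
{
    if (fail_at == 1)
        goto err;
    note(log, cap, "region");

    if (fail_at == 2)
        goto err_region;
    note(log, cap, "cdev");

    if (fail_at == 3)
        goto err_cdev;
    note(log, cap, "class");
    return 0;

err_cdev:
    note(log, cap, "~cdev");      /* undo step 2 */
err_region:
    note(log, cap, "~region");    /* undo step 1 */
err:
    return -1;
}
```

If step 3 fails, the log reads "region cdev ~cdev ~region": the two completed steps are unwound newest first, which is the same order temp_exit() follows for a fully loaded module.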



    There we go, the fruits of our labor: a kernelspace event logged over to userspace.
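cat is not the only way to get at the device; any program that can call open() and read() will do. Here is a minimal userspace reader, offered as a sketch: read_all() is a made-up helper (not part of the module), and it works on any path, so it can be tried on a regular file before the module is even loaded.

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Read everything a file or device offers into out (NUL-terminated).
 * Returns the number of bytes read, or -1 on error.
 * read_all() is a hypothetical helper for illustration. */
static long read_all(const char *path, char *out, size_t cap)
{
    size_t used = 0;
    ssize_t n = 0;
    int fd;

    if (cap == 0)
        return -1;
    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    /* For a character device, each read() ends up in the driver's
     * .read handler; a return of 0 means EOF. */
    while (used < cap - 1 &&
           (n = read(fd, out + used, cap - 1 - used)) > 0)
        used += (size_t)n;

    close(fd);
    out[used] = '\0';
    return (n < 0) ? -1 : (long)used;
}
```

After insmod, read_all("/dev/tempmon", buf, sizeof(buf)) should leave one "Temperature: ..." line in buf, assuming the device node is readable by the calling user.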

    In tomorrow's installment, let's embellish this code even more so that it continuously "logs" temperature to a ring buffer that we can cat.

    The code is hosted here:

    Character Device Code




    Wednesday, January 7, 2026

    Built my first kernel module - printk() isn't printf()



    At my last company, Field Support Engineers debugged broken instruments at 3am in hospitals using one tool: system logs. No shell access, no debugger - just dmesg output from whatever the kernel captured.

    This is akin to searching for a needle in the wrong haystack. Kernel space is not user space with fewer features; it’s a different execution model entirely. There is no libc, no stdout, no file descriptors, and no guarantee you’re running in process context. printf() relies on user-space abstractions that don’t exist past the syscall boundary. In the kernel, there is nothing to “print to,” and no application to recover if something goes wrong. A wrong assumption here crashes the entire system. That’s why printf() cannot exist in kernel context, and why printk() exists at all: it provides minimal visibility without pretending the kernel is a safe place to make mistakes. GPU drivers live here. NIC drivers live here. Bugs here crash entire systems.

    +-----------------------------+
    |        User Space           |
    |                             |
    |  Applications / Scripts     |
    |  (printf, files, sockets)   |
    |                             |
    +--------------+--------------+
                   |
                   |  syscalls (read, write, ioctl)
                   v
    +--------------+--------------+
    |        Kernel Space         |
    |                             |
    |  VFS / Scheduler / MM       |
    |                             |
    |  Your Kernel Module         |
    |  (tempmon.ko)               |
    |      printk()               |
    |      hardware access        |
    |                             |
    |  GPU / NIC Drivers          |
    |  (nvidia.ko, mlx5, etc.)    |
    |                             |
    +--------------+--------------+
                   |
                   v
    +--------------+--------------+
    |        Hardware             |
    |                             |
    |  CPU / GPU / Sensors        |
    |  PCIe / Memory / IRQs       |
    |                             |
    +-----------------------------+
    

    printk() writes into a fixed-size kernel ring buffer, not to a terminal. Messages are tagged with log levels like KERN_INFO or KERN_ERR, and the current console log level decides whether a message is also forwarded to the console or only kept in the buffer. The buffer can wrap under load, so older messages are overwritten by newer ones. Code that logs through the rate-limited variants, such as printk_ratelimited(), also has repeated messages suppressed to avoid flooding the system. As a result, logs can appear out of order, show up late, or seem to disappear entirely, especially during crashes, high interrupt activity, or early boot, because logging is best-effort, not guaranteed delivery.
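The "older messages are overwritten" part is easier to picture with a toy model. The sketch below is a userspace illustration only: the real printk buffer stores variable-length records, not fixed slots, and log_ring, ring_log(), ring_oldest() and ring_lost() are all made-up names. It does show the core behaviour, though: once the buffer is full, each new message silently evicts the oldest one.

```c
#include <stdio.h>
#include <string.h>

#define SLOTS 4          /* tiny on purpose; the real buffer is far larger */
#define MSG_LEN 32

/* Toy log ring: a fixed number of slots, newest message overwrites oldest. */
struct log_ring {
    char msg[SLOTS][MSG_LEN];
    unsigned long head;  /* total number of messages ever written */
};

static void ring_log(struct log_ring *r, const char *text)
{
    char *slot = r->msg[r->head % SLOTS];

    snprintf(slot, MSG_LEN, "%s", text);   /* long messages get truncated */
    r->head++;
}

/* Oldest message still held, or NULL if nothing has been logged yet. */
static const char *ring_oldest(const struct log_ring *r)
{
    unsigned long first;

    if (r->head == 0)
        return NULL;
    first = (r->head > SLOTS) ? r->head - SLOTS : 0;
    return r->msg[first % SLOTS];
}

/* How many messages were overwritten before anyone could read them. */
static unsigned long ring_lost(const struct log_ring *r)
{
    return (r->head > SLOTS) ? r->head - SLOTS : 0;
}
```

Log six messages into the four-slot ring and the first two are gone for good: ring_lost() reports 2 and ring_oldest() returns the third message. A burst of driver chatter during a fault can do the same thing to the messages that came just before it.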

     Kernel code can run in process context or interrupt context, and printk() behaves differently in each. In interrupt context there is no sleeping, limited buffering, and higher priority execution, while process context can be preempted or delayed. Because log messages are buffered and flushed asynchronously, preemption and concurrent CPUs can cause messages to interleave or appear reordered. During thermal events, GPU hangs, or PCIe link flaps, the system may be saturated with interrupts or stuck in a fault path, delaying or dropping log output entirely. This is why GPU driver logs are sometimes misleading during hard lockups: the failure happened, but the system never reached a safe point to emit or flush the messages that would have explained it.

     That's why I'm building kernel modules now. I'm targeting GPU-adjacent roles and need to understand thermal monitoring where it actually happens: kernel space. Today's module does nothing useful yet - just loads, prints to kernel log, unloads. But getting it working taught me something I somehow missed in 15 years of embedded work: why printk() exists at all.

     And before we get started, let us ensure that we have our "target debug system" ready. I am on Ubuntu 22.04 running on an Intel Core i7-9700F with 16GB of memory and an NVIDIA GeForce RTX 2060 SUPER. So the instructions for NVIDIA edge compute (Orins and the Jetsons) might vary ever so slightly. Please check the official NVIDIA documentation.

    # Install driver
    sudo ubuntu-drivers autoinstall
    sudo reboot
    
    # Verify GPU
    nvidia-smi
    
    # Install CUDA + kernel tools
    wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.1-1_all.deb
    sudo dpkg -i cuda-keyring_1.1-1_all.deb
    sudo apt update
    sudo apt install cuda-toolkit-12-6 build-essential linux-headers-$(uname -r)
    
    # Set paths
    echo 'export PATH=/usr/local/cuda/bin:$PATH' >> ~/.bashrc
    source ~/.bashrc
    
    # Test
    nvcc --version
    

     

     And here is that code, in all its guts and glory. Well, this is how it started at least.

     

    #include <linux/module.h>
    #include <linux/kernel.h>
    #include <linux/init.h>
    
    MODULE_LICENSE("GPL");
    MODULE_AUTHOR("Ananth");
    MODULE_DESCRIPTION("GPU Thermal Monitor");
    
    static int __init temp_init(void){
        printk(KERN_INFO "Tempmonitor: Module loaded\n");
        return 0;
    }
    
    static void __exit temp_exit(void){
        printk(KERN_INFO "Tempmonitor: Module unloaded\n");
    }
    
    module_init(temp_init);
    module_exit(temp_exit);
    

    In kernel space: no libc, no stdio, no printf(). The printk() call dumps to a ring buffer that feeds dmesg - which is where GPU drivers actually log. Spent 20 minutes confused why nothing printed until I realized I was checking terminal output instead of dmesg | tail. This is where real hardware debugging happens. GPU hangs, thermal throttling, PCIe issues - they all surface here first.

    This module is just the starting point. Next, I'll show how to expose kernel data to userspace via character devices, build a ring buffer for time-series data, and process it on GPU using CUDA. Along the way, you'll see why kernel-space logging matters for GPU thermal monitoring and field debugging.

    Next: character devices.

