Understand Linux Process Scheduling

Created on the 23rd of March 2026

Why Your Server Feels Sluggish Under Load

When a Linux system runs dozens of processes simultaneously but has only a few CPU cores, something has to decide which one runs when. The Completely Fair Scheduler (CFS) is that something. Its design goal is deceptively simple: every runnable task should receive a fair share of CPU time, proportional to its priority weight. How it enforces that guarantee, and what you can do to influence it, explains most of the scheduling-related performance behavior you will encounter on a self-hosted server. (Kernels from 6.6 onward replace CFS's task-selection rule with EEVDF, which keeps the same vruntime and weight accounting.)

How CFS Tracks Fairness with vruntime

Each task accumulates a virtual runtime counter called vruntime: the CPU time the task has consumed, scaled by its priority weight so that higher-priority tasks accumulate it more slowly. The task with the lowest vruntime has received the least CPU time relative to its fair share and is therefore the next task to run.
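To make the weighting concrete, here is a minimal Python sketch under a simplifying assumption: each run adds exec_time × 1024 / weight to a task's vruntime, with weights copied from the kernel's nice-to-weight table (sched_prio_to_weight). The real kernel uses fixed-point inverse weights rather than plain integer division, and the helper names vruntime_delta and simulate are hypothetical. The simulation always runs the lowest-vruntime task for a fixed slice.

```python
# Nice-to-weight table copied from the kernel's sched_prio_to_weight array,
# indexed by nice + 20 (nice -20 -> 88761, nice 0 -> 1024, nice 19 -> 15).
NICE_TO_WEIGHT = [
    88761, 71755, 56483, 46273, 36291,
    29154, 23254, 18705, 14949, 11916,
    9548, 7620, 6100, 4904, 3906,
    3121, 2501, 1991, 1586, 1277,
    1024, 820, 655, 526, 423,
    335, 272, 215, 172, 137,
    110, 87, 70, 56, 45,
    36, 29, 23, 18, 15,
]
NICE_0_WEIGHT = 1024


def vruntime_delta(exec_ns: int, nice: int) -> int:
    """Scale real runtime by NICE_0_WEIGHT / weight (simplified integer version)."""
    return exec_ns * NICE_0_WEIGHT // NICE_TO_WEIGHT[nice + 20]


def simulate(nices: dict[str, int], slice_ns: int, rounds: int) -> dict[str, int]:
    """Always run the lowest-vruntime task for one slice; return real CPU time per task."""
    vruntime = {name: 0 for name in nices}
    cpu_time = {name: 0 for name in nices}
    for _ in range(rounds):
        task = min(vruntime, key=vruntime.get)  # least-served task runs next
        cpu_time[task] += slice_ns
        vruntime[task] += vruntime_delta(slice_ns, nices[task])
    return cpu_time
```

Simulating a nice 0 task against a nice 5 task, for example simulate({"web": 0, "batch": 5}, 1_000_000, 4000), should give web roughly three times the CPU time of batch, matching the weight ratio 1024 / 335. Neither task is starved; the slower-growing vruntime simply lets web be picked more often.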

The Run Queue Is a Red-Black Tree

CFS stores all runnable tasks in a red-black tree keyed by vruntime. The leftmost node, the task with the lowest vruntime, is cached so it can be picked in O(1) time. Inserting a waking task takes O(log n) time. This structure keeps scheduling decisions cheap even when many tasks are runnable.
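Python's standard library has no red-black tree, so the sketch below swaps in a binary heap (heapq) to mimic the two operations the paragraph highlights: reading the lowest-vruntime task in O(1) and inserting a waking task in O(log n). It is a stand-in, not the kernel's data structure, and the RunQueue class and its method names are hypothetical.

```python
import heapq
import itertools


class RunQueue:
    """Toy run queue ordered by vruntime (a binary heap standing in for the kernel's rbtree)."""

    def __init__(self) -> None:
        self._heap: list[tuple[int, int, str]] = []  # (vruntime, seq, task name)
        self._seq = itertools.count()  # tie-breaker so equal vruntimes stay FIFO

    def enqueue(self, name: str, vruntime: int) -> None:
        # O(log n), comparable to inserting a waking task into the tree
        heapq.heappush(self._heap, (vruntime, next(self._seq), name))

    def peek_leftmost(self) -> tuple[str, int]:
        # O(1): the smallest vruntime always sits at index 0, like the cached leftmost node
        vruntime, _, name = self._heap[0]
        return name, vruntime

    def pick_next(self) -> tuple[str, int]:
        # Remove the lowest-vruntime task so it can run
        vruntime, _, name = heapq.heappop(self._heap)
        return name, vruntime

    def __len__(self) -> int:
        return len(self._heap)
```

With tasks enqueued at vruntimes of 5.0 ms (backup), 1.2 ms (nginx), and 3.4 ms (sshd), pick_next() returns nginx first, then sshd. One design difference worth noting: a heap cannot cheaply remove an arbitrary task (for example one that goes to sleep), which a red-black tree handles in O(log n).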

Adjust Priority with the nice Command

Nice values range from -20 (highest priority) to 19 (lowest priority). A lower nice value makes vruntime accumulate more slowly, so the task stays near the leftmost position in the run tree and gets scheduled more often. Run a CPU-intensive background job with a high nice value, for example nice -n 19 your-command, so that it yields the CPU to interactive work.
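The same adjustments can be made programmatically. This is a minimal sketch assuming a Linux host; run_niced, current_nice, and renice are hypothetical helper names built on Python's os.setpriority and os.getpriority, which wrap the setpriority(2) and getpriority(2) system calls.

```python
import os
import subprocess
import sys


def run_niced(cmd: list[str], nice: int = 19, **kwargs) -> subprocess.CompletedProcess:
    """Run cmd as a child process with the given nice value (like `nice -n 19 cmd`).

    Raising the nice value needs no privileges; setting it below the inherited
    value requires CAP_SYS_NICE (typically root).
    """
    # preexec_fn runs in the child after fork() and before exec(),
    # so only the child's priority changes.
    def apply_nice() -> None:
        os.setpriority(os.PRIO_PROCESS, 0, nice)

    return subprocess.run(cmd, preexec_fn=apply_nice, **kwargs)


def current_nice(pid: int = 0) -> int:
    """Return a process's nice value; pid 0 means the calling process."""
    return os.getpriority(os.PRIO_PROCESS, pid)


def renice(pid: int, value: int) -> None:
    """Change the nice value of an already running process (like `renice -n value -p pid`)."""
    os.setpriority(os.PRIO_PROCESS, pid, value)


if __name__ == "__main__":
    # The child prints its own nice value, which should be 19.
    result = run_niced(
        [sys.executable, "-c", "import os; print(os.getpriority(os.PRIO_PROCESS, 0))"],
        capture_output=True,
        text=True,
    )
    print(result.stdout.strip())
```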

Control Container CPU with cgroups v2

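Kubernetes CPU limits and Docker CPU constraints map directly to cgroup v2 controller files. On a systemd host, the files for system services live under /sys/fs/cgroup/system.slice/, and every cgroup directory has its own copy. cpu.max contains two numbers: the quota and the period, both in microseconds. The group may consume at most quota microseconds of CPU time per period, summed across all cores, and a quota of max means no limit.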
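Because the quota is summed across cores, the ratio quota / period is the number of CPUs a group may use; for instance, docker run --cpus=1.5 results in a cpu.max of 150000 100000. The sketch below converts in both directions. The function names are hypothetical, and it assumes the kernel's default 100 ms period when building a value.

```python
from pathlib import Path
from typing import Optional

DEFAULT_PERIOD_US = 100_000  # 100 ms, the kernel's default cpu.max period


def parse_cpu_max(text: str) -> Optional[float]:
    """Convert cpu.max contents ("quota period") to a CPU count; None means unlimited."""
    quota, period = text.split()
    if quota == "max":
        return None
    return int(quota) / int(period)


def format_cpu_max(cpus: float, period_us: int = DEFAULT_PERIOD_US) -> str:
    """Build a cpu.max value granting `cpus` CPUs' worth of time per period."""
    return f"{round(cpus * period_us)} {period_us}"


def read_cpu_limit(cgroup_dir: str) -> Optional[float]:
    """Read the CPU limit of a cgroup v2 directory, e.g. /sys/fs/cgroup/system.slice."""
    return parse_cpu_max(Path(cgroup_dir, "cpu.max").read_text())
```

A container limited to half a CPU would have format_cpu_max(0.5) == "50000 100000", meaning it can run for 50 ms out of every 100 ms before being throttled until the next period.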

Pin Processes to Cores with taskset

taskset sets or retrieves the CPU affinity mask of a process: the set of cores it is allowed to run on. Pin a latency-sensitive process to cores 0 and 1 with taskset -c 0,1 your-command. On a NUMA system, keeping a process on cores that share an L3 cache lets it keep its cached data warm instead of losing it each time the scheduler migrates it.
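taskset is a thin wrapper around the sched_setaffinity(2) system call, which Python exposes as os.sched_setaffinity on Linux. A minimal sketch, with hypothetical helper names; parse_cpu_list handles plain lists and ranges such as "0,1,4-6" but, unlike taskset, not stride syntax.

```python
import os


def parse_cpu_list(spec: str) -> set[int]:
    """Parse a taskset -c style list such as "0,1,4-6" into a set of CPU ids."""
    cpus: set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            start, end = part.split("-")
            cpus.update(range(int(start), int(end) + 1))
        elif part:
            cpus.add(int(part))
    return cpus


def pin(pid: int, spec: str) -> set[int]:
    """Restrict pid (0 = this process) to the listed cores, like `taskset -c -p`; return the new mask."""
    os.sched_setaffinity(pid, parse_cpu_list(spec))
    return os.sched_getaffinity(pid)
```

Calling pin(0, "0,1") restricts the current process to cores 0 and 1; threads and child processes created afterwards inherit that mask.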

Diagnose Scheduling Delays with perf sched

The perf sched latency command shows per-task scheduling statistics, including the average and maximum delay between when a task became runnable and when it actually ran. A task with a high maximum scheduling latency was ready to run but waited behind other tasks. perf sched latency analyzes a trace captured beforehand with perf sched record.
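The metric itself is simple to compute once you have wakeup and switch-in timestamps. The sketch below approximates the kind of per-task average and maximum delay perf sched latency reports; the event tuple format (timestamp_us, task, kind) and the function name are hypothetical, not perf's actual output format.

```python
from collections import defaultdict


def scheduling_latency(events):
    """Compute per-task (average, max) wait between becoming runnable and running.

    events: iterable of (timestamp_us, task, kind), where kind is "wakeup" or "switch_in".
    """
    runnable_since = {}
    delays = defaultdict(list)
    # Sort by timestamp only; the stable sort keeps input order for equal timestamps.
    for ts, task, kind in sorted(events, key=lambda e: e[0]):
        if kind == "wakeup":
            runnable_since[task] = ts
        elif kind == "switch_in" and task in runnable_since:
            delays[task].append(ts - runnable_since.pop(task))
    return {task: (sum(d) / len(d), max(d)) for task, d in delays.items()}
```

For example, a task that waits 30 µs after one wakeup and 90 µs after another gets an average of 60 µs and a maximum of 90 µs. The maximum is usually the more telling number, because a single long wait is what a latency-sensitive service notices.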

Go Deeper: The Red-Black Tree

The CFS run queue is a red-black tree, a self-balancing binary search tree that guarantees O(log n) insertion, deletion, and lookup no matter how many tasks are queued. The same data structure appears in contexts far beyond scheduling, so understanding it pays off: the Linux kernel also uses red-black trees in the virtual memory subsystem, in the epoll event system, and in TCP's out-of-order and retransmission queues. Knowing the invariants that keep the tree balanced reveals why these operations stay fast even under adversarial workloads.
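The invariants are easiest to internalize by checking them. This minimal sketch validates a tree against the standard red-black rules: the root is black, no red node has a red child, every root-to-leaf path has the same number of black nodes, and keys respect binary-search order (duplicates allowed, since tasks can share a vruntime). The Node class and function names are hypothetical and unrelated to the kernel's rb_node API.

```python
from dataclasses import dataclass
from typing import Optional

RED, BLACK = "red", "black"


@dataclass
class Node:
    key: int
    color: str
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def black_height(node: Optional[Node], lo=None, hi=None) -> int:
    """Return the subtree's black height, raising ValueError if an invariant is broken."""
    if node is None:
        return 1  # empty (NIL) leaves count as black
    if (lo is not None and node.key < lo) or (hi is not None and node.key > hi):
        raise ValueError(f"search-tree order violated at key {node.key}")
    if node.color == RED:
        for child in (node.left, node.right):
            if child is not None and child.color == RED:
                raise ValueError(f"red node {node.key} has a red child")
    left_bh = black_height(node.left, lo, node.key)
    right_bh = black_height(node.right, node.key, hi)
    if left_bh != right_bh:
        raise ValueError(f"unequal black heights below key {node.key}")
    return left_bh + (1 if node.color == BLACK else 0)


def is_valid_rb_tree(root: Optional[Node]) -> bool:
    """Check every red-black invariant, starting with a black (or empty) root."""
    if root is not None and root.color != BLACK:
        return False
    try:
        black_height(root)
    except ValueError:
        return False
    return True
```

Together, the no-red-red rule and the equal-black-height rule mean the longest path is at most twice the shortest, which bounds the height at 2·log2(n + 1) and is where the O(log n) guarantee comes from.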

Understand How Containers Really Work, by KaiRenner

CFS Scheduler Design, kernel.org