Understand How a CPU Executes Instructions

Last update 2 mo. ago. Created on the 23rd of March 2026.

How Software Actually Runs

At the bottom of every abstraction layer (containers, virtual machines, operating systems, programming languages) is a CPU executing machine instructions. Understanding what happens during instruction execution at the hardware level explains performance characteristics that are otherwise mysterious: why a branch misprediction can cost on the order of 15 cycles, why cache misses often dominate runtime, and why Spectre-class vulnerabilities exist at all. These are not implementation details; they are the rules of the platform everything else runs on.

The Fetch-Decode-Execute Cycle

The program counter register holds the address of the next instruction to execute. On each cycle, the CPU fetches the bytes at that address from cache or memory, decodes the opcode to determine what operation and operands are involved, executes the operat
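
The loop described above can be sketched as a toy interpreter. This is a minimal illustration, not a model of any real processor: the four-opcode instruction set (`li`, `add`, `jnz`, `halt`), the register names, and the `run` function are all invented for this example.

```python
# Toy machine: the instruction set below is hypothetical, invented for illustration.
def run(program, max_steps=1000):
    """Execute a list of (opcode, operand...) tuples and return the registers."""
    regs = {"r0": 0, "r1": 0, "r2": 0}
    pc = 0  # program counter: index of the next instruction to execute
    for _ in range(max_steps):
        if pc >= len(program):
            break
        instr = program[pc]      # fetch the instruction the program counter points at
        op, *args = instr        # decode it into an opcode and its operands
        pc += 1                  # by default, continue with the following instruction
        if op == "li":           # execute: load an immediate value into a register
            regs[args[0]] = args[1]
        elif op == "add":        # execute: dest = src1 + src2
            regs[args[0]] = regs[args[1]] + regs[args[2]]
        elif op == "jnz":        # execute: jump if the register is non-zero
            if regs[args[0]] != 0:
                pc = args[1]     # a taken jump overwrites the program counter
        elif op == "halt":
            break
        else:
            raise ValueError(f"unknown opcode: {op}")
    return regs


# Sum 3 + 2 + 1 with a countdown loop.
program = [
    ("li", "r0", 0),             # 0: accumulator
    ("li", "r1", 3),             # 1: loop counter
    ("li", "r2", -1),            # 2: constant used to decrement the counter
    ("add", "r0", "r0", "r1"),   # 3: accumulator += counter
    ("add", "r1", "r1", "r2"),   # 4: counter -= 1
    ("jnz", "r1", 3),            # 5: loop back while the counter is non-zero
    ("halt",),                   # 6: stop
]

print(run(program)["r0"])  # prints 6
```

The only thing a jump does here is write a new value into `pc`, which is why control flow and the program counter are the same concept at this level.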

Pipelining Overlaps Instruction Stages

A pipelined CPU splits the fetch-decode-execute cycle into a sequence of smaller stages; modern high-performance designs typically use somewhere between 10 and 20, and a few past designs went to about 30. While one instruction is in the execute stage, the next is in decode, and the one after that is being fetched. This means the through
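
The effect of overlapping stages can be checked with a little arithmetic. The sketch below assumes an idealized pipeline in which every stage takes exactly one cycle and nothing ever stalls unless you pass `stall_cycles`; the function names are made up for this example.

```python
def unpipelined_cycles(n_instructions, n_stages):
    """Each instruction passes through every stage before the next one starts."""
    return n_instructions * n_stages


def pipelined_cycles(n_instructions, n_stages, stall_cycles=0):
    """The first instruction needs n_stages cycles to drain through the pipeline;
    after that, one instruction finishes per cycle (idealized, no hazards)."""
    if n_instructions == 0:
        return 0
    return n_stages + (n_instructions - 1) + stall_cycles


n, stages = 1000, 5
print(unpipelined_cycles(n, stages))  # prints 5000
print(pipelined_cycles(n, stages))    # prints 1004
```

Under these assumptions the speedup approaches the number of stages as the instruction count grows. Each individual instruction still takes the full number of stages to complete, so pipelining improves the rate at which instructions finish, not the latency of any one of them.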

Branch Prediction Keeps the Pipeline Full

A conditional jump (if-statement) creates a choice: the CPU does not know which path to take until the condition is evaluated, several stages into the pipeline. Rather than stall, the CPU guesses which branch will be taken using a branch predictor trained
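
To make the idea of a trained predictor concrete, here is a sketch of the classic textbook scheme: a two-bit saturating counter per branch. This is an assumption chosen for illustration, since predictors in current CPUs are far more elaborate, and the 15-cycle penalty is simply the figure quoted earlier in this article.

```python
def count_mispredictions(outcomes, initial_state=0):
    """Simulate one two-bit saturating counter on a sequence of branch outcomes.

    States 0 and 1 predict "not taken"; states 2 and 3 predict "taken".
    Returns how many predictions were wrong.
    """
    state = initial_state
    mispredictions = 0
    for taken in outcomes:
        predicted_taken = state >= 2
        if predicted_taken != taken:
            mispredictions += 1
        # Train the counter toward the actual outcome, saturating at 0 and 3.
        state = min(state + 1, 3) if taken else max(state - 1, 0)
    return mispredictions


PENALTY_CYCLES = 15  # illustrative figure taken from the article's introduction

# A loop branch: taken nine times, then not taken on exit, repeated ten times.
loop_branch = ([True] * 9 + [False]) * 10
wrong = count_mispredictions(loop_branch)
print(wrong, wrong * PENALTY_CYCLES)  # prints 12 180

# A branch that alternates every time defeats this simple scheme.
alternating = [True, False] * 5
print(count_mispredictions(alternating))  # prints 5
```

After a short warm-up the loop branch is mispredicted only on loop exit, while the alternating branch is wrong half the time. That contrast is why predictable branches are cheap and data-dependent ones can be expensive.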

Out-of-Order Execution Uses All Available Units

A modern CPU has multiple execution units: arithmetic, floating point, memory load, memory store. If two consecutive instructions are independent — neither uses the other's output — they can execute simultaneously on different units. The CPU tracks data d

Superscalar execution means a single CPU core can complete more than one instruction per clock cycle. A modern core might retire four or more instructions per cycle on code with sufficient parallelism. This is one reason compiler optimization flags matter: an optimizing compiler schedules instructions to shorten dependency chains and expose parallelism that the CPU's out-of-order engine can exploit.
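
The interaction between data dependencies and issue width can be sketched with a toy scheduler. It rests on several simplifying assumptions that real hardware does not share: every instruction takes one cycle, any execution unit can run any instruction, and each destination name is written only once, so only true dependencies exist. The `cycles_to_execute` function and the instruction format are hypothetical.

```python
def cycles_to_execute(instructions, issue_width):
    """instructions: list of (dest, [sources]) in program order.

    Each cycle, issue up to issue_width instructions whose sources were all
    produced in an earlier cycle (or are inputs that no instruction produces).
    Returns the number of cycles needed to issue everything.
    """
    if issue_width < 1:
        raise ValueError("issue_width must be at least 1")
    produced = {dest for dest, _ in instructions}
    done = set()              # values computed in earlier cycles
    pending = list(instructions)
    cycle = 0
    while pending:
        cycle += 1
        issued = []
        for instr in pending:
            if len(issued) == issue_width:
                break
            _, sources = instr
            if all(s not in produced or s in done for s in sources):
                issued.append(instr)
        if not issued:
            raise ValueError("circular dependency between instructions")
        for instr in issued:
            pending.remove(instr)
        # Results become visible only from the next cycle onward.
        done.update(dest for dest, _ in issued)
    return cycle


chain = [("a", ["x"]), ("b", ["a"]), ("c", ["b"]), ("d", ["c"])]
independent = [("a", ["x"]), ("b", ["x"]), ("c", ["y"]), ("d", ["y"])]
mixed = [("a", ["x"]), ("b", ["a"]), ("c", ["y"]), ("d", ["c"])]

print(cycles_to_execute(chain, 4))        # prints 4
print(cycles_to_execute(independent, 4))  # prints 1
print(cycles_to_execute(mixed, 2))        # prints 2
```

The dependent chain gains nothing from a wider core, while the `mixed` sequence finishes in two cycles because the scheduler runs `c` ahead of `b`. Exposing that kind of independence is what instruction scheduling in a compiler aims for.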

The Security Consequence of Speculative Execution

The CPU executes code past a branch before knowing whether that branch will be taken. If that speculative code accesses memory, even memory the program is not authorized to read, the access occurs in hardware and loads the data into the CPU cache. Even when the speculatively executed instructions are discarded, the cache state they created remains. An attacker can measure which cache lines were loaded by timing subsequent memory accesses. This is the basis of the Spectre vulnerability: by exploiting the CPU's performance optimization, an attacker can leak memory that should be off limits, such as another process's or the kernel's data.
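
The mechanism can be illustrated with a pure simulation. Nothing below touches real hardware or measures real time: `ToyCache`, its hit and miss latencies, the `victim` function, and the probe address are all hypothetical stand-ins, and the "speculation" is modelled simply by running the memory accesses before the bounds check decides the result.

```python
CACHE_LINE = 64  # bytes per cache line in this toy model


class ToyCache:
    """A simulated cache that only remembers which lines have been touched."""

    def __init__(self):
        self.lines = set()

    def flush(self):
        self.lines.clear()

    def access(self, address):
        """Return a made-up latency in cycles and leave the line cached."""
        line = address // CACHE_LINE
        latency = 4 if line in self.lines else 200  # hypothetical hit/miss costs
        self.lines.add(line)
        return latency


def victim(cache, data, index, probe_base, public_length=4):
    """A bounds-checked read whose body runs "speculatively" in this model."""
    # Modelled speculation: these accesses happen before the check resolves.
    value = data[index]
    cache.access(probe_base + value * CACHE_LINE)
    if index < public_length:
        return value
    return None  # the architectural result is discarded; the cache state is not


def recover_byte(cache, data, index, probe_base=0x10000):
    """Flush, trigger the victim, then time one access per possible byte value."""
    cache.flush()
    victim(cache, data, index, probe_base)
    timings = [cache.access(probe_base + guess * CACHE_LINE) for guess in range(256)]
    return min(range(256), key=timings.__getitem__)  # the fastest line was preloaded


data = bytes([1, 2, 3, 4]) + b"K"   # four public bytes followed by one secret byte
cache = ToyCache()
print(victim(cache, data, 4, 0x10000))     # prints None
print(chr(recover_byte(cache, data, 4)))   # prints K
```

The victim never returns the out-of-bounds byte, yet the attacker recovers it from which probe line comes back fast. That gap between the architectural result and the leftover cache state is the side channel.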

Go Deeper: Side-Channel Attacks

Speculative execution is the root of Spectre-class vulnerabilities — a category of attacks that extract secrets by measuring timing differences in cache state after speculative code ran. These attacks revealed that security and hardware performance are fundamentally in tension. The broader class they belong to — side-channel attacks — shows up in contexts far removed from CPUs, and understanding it changes how you think about implementing security.

The class of attacks that broke hardware security guarantees that looked airtight on paper.

slatesource.com

Agner Fog's CPU microarchitecture optimization guides

agner.org