1. Caches
Because CPU speeds have increased much faster than memory-access speeds, much of a program's latency comes from memory reads and writes. To mitigate these delays, modern multi-core processors are designed with a hierarchy of caches, designated L1 (private per core, small, fastest), L2 (private, larger, fast), and L3 (shared across cores, largest, slower).
Processor affinity
A context switch preserves and restores CPU state for a thread, but not its cache contents. It is preferable that when a thread gets a timeslice, it runs on the CPU core it previously ran on, because that core's local cache might still hold the thread's data. This is processor affinity, or CPU pinning.
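A minimal sketch of CPU pinning, assuming Linux and glibc's non-portable pthread_setaffinity_np extension; the choice of core 0 is arbitrary, purely for illustration.

```cpp
// Pin the calling thread to core 0 (Linux/glibc only).
#define _GNU_SOURCE  // needed in C for CPU_SET macros; harmless under g++
#include <pthread.h>
#include <sched.h>
#include <cstdio>

int main() {
    cpu_set_t set;
    CPU_ZERO(&set);    // start with an empty CPU set
    CPU_SET(0, &set);  // allow only core 0

    // Pin the calling thread: the scheduler will now keep it on core 0,
    // so its L1/L2 contents are more likely to stay warm across timeslices.
    int err = pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
    if (err != 0) {
        std::fprintf(stderr, "pthread_setaffinity_np failed: %d\n", err);
        return 1;
    }
    std::printf("pinned to core 0\n");
    return 0;
}
```

Compile with g++ -pthread. From the shell, taskset -c 0 ./a.out achieves the same pinning without code changes.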
Cache line
A cache line is the smallest fixed-size block of data (typically 64 bytes on modern x86/ARM CPUs) that is transferred between levels of the memory hierarchy: between main memory and the shared L3 cache, and between L3 and each core's private L1/L2 caches.
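Two ways to look up the line size on a given machine, as a sketch assuming Linux/glibc for sysconf and C++17 (with a recent standard library) for std::hardware_destructive_interference_size:

```cpp
#include <new>       // std::hardware_destructive_interference_size (C++17)
#include <unistd.h>  // sysconf (POSIX/glibc)
#include <cstdio>

int main() {
    // Compile-time hint from the standard library (usually 64 on x86-64).
    std::printf("hardware_destructive_interference_size: %zu\n",
                std::hardware_destructive_interference_size);

    // Runtime query of the L1 data cache line size (glibc extension;
    // may report 0 if the value is unknown on this system).
    long line = sysconf(_SC_LEVEL1_DCACHE_LINESIZE);
    std::printf("L1 dcache line size: %ld\n", line);
    return 0;
}
```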
Loading 64 bytes at a time exploits spatial locality: data that is used together tends to sit sequentially in memory, so the neighboring bytes fetched with a line are likely to be needed soon. The cost difference is large: fetching a line from main memory (DRAM) takes roughly 100-200 cycles, whereas reading from L1 takes roughly 3-5 cycles.
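A small benchmark sketch of this effect: both calls below perform the same number of additions over the same array, but the strided pass uses only one int per cache line it fetches, so it incurs roughly 16x the misses. The array size and stride are illustrative assumptions; absolute timings depend on the machine.

```cpp
#include <chrono>
#include <cstdio>
#include <vector>

int main() {
    const std::size_t n = 1 << 24;  // 16M ints (~64 MiB), larger than L3
    std::vector<int> a(n, 1);

    auto time_sum = [&](std::size_t step) {
        long long sum = 0;
        auto t0 = std::chrono::steady_clock::now();
        // Multiple offset passes so every element is summed exactly once,
        // regardless of step: total work is identical for both calls.
        for (std::size_t start = 0; start < step; ++start)
            for (std::size_t i = start; i < n; i += step)
                sum += a[i];
        auto t1 = std::chrono::steady_clock::now();
        auto ms = std::chrono::duration_cast<std::chrono::milliseconds>(t1 - t0);
        std::printf("step=%zu sum=%lld time=%lld ms\n",
                    step, sum, (long long)ms.count());
    };

    time_sum(1);   // sequential: ~1 miss per 16 elements (64-byte lines)
    time_sum(16);  // 64-byte stride: ~1 miss per element
    return 0;
}
```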
Cache coherence
However, a problem with caches is that each processor views memory through its own private caches, so without additional precautions, two processors could end up seeing two different values for the same data. This is the cache coherence problem. It exists because there is global state (main memory and the shared L3) alongside private local state (each core's L1 and L2).
In multi-core systems, cache lines are the unit of coherency under protocols like MESI. When one core updates data in a cache line, the protocol invalidates copies of that line in other cores' caches, forcing those cores to refetch it on their next read.
MESI stands for the four states a cache line can be in: Modified, Exclusive, Shared, Invalid.
To enforce coherence, a common approach is for the writing processor to acquire bus access (this sounds like a lock to me) and broadcast the address being invalidated. All processors continuously snoop the bus; if a broadcast address matches a line in a processor's cache, it invalidates its local copy.
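This invalidation traffic is visible from software as false sharing: two threads writing to different variables that happen to share one cache line keep invalidating each other's copy, so the line ping-pongs between cores. A sketch, assuming a 64-byte line and at least two cores; iteration counts and timings are illustrative:

```cpp
#include <atomic>
#include <chrono>
#include <cstdio>
#include <thread>

// Two counters on the same 64-byte line: every write by one thread
// invalidates the line in the other thread's core (MESI ping-pong).
struct SameLine {
    std::atomic<long> a{0};
    std::atomic<long> b{0};
};

// alignas(64) pads each counter onto its own line: each core keeps its
// line in the Modified state with no cross-core invalidations.
struct SeparateLines {
    alignas(64) std::atomic<long> a{0};
    alignas(64) std::atomic<long> b{0};
};

template <typename T>
long long run() {
    T s;
    const long iters = 50'000'000;
    auto t0 = std::chrono::steady_clock::now();
    std::thread t1([&] { for (long i = 0; i < iters; ++i)
                             s.a.fetch_add(1, std::memory_order_relaxed); });
    std::thread t2([&] { for (long i = 0; i < iters; ++i)
                             s.b.fetch_add(1, std::memory_order_relaxed); });
    t1.join();
    t2.join();
    auto t1e = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::milliseconds>(t1e - t0).count();
}

int main() {
    std::printf("same cache line:      %lld ms\n", run<SameLine>());
    std::printf("separate cache lines: %lld ms\n", run<SeparateLines>());
    return 0;
}
```

On typical hardware the padded version runs several times faster, even though both versions do exactly the same arithmetic.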
Cache coherence ensures that updates made by any one core to a single unit of data become visible to all cores, and that all cores see the writes to that location in the same order.
Memory consistency, by contrast, deals with how reads and writes across multiple units of data may be interleaved and reordered from the point of view of different cores.
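A classic consistency litmus test (store buffering), sketched with C++ atomics: coherence alone does not forbid both loads returning 0, because each core's store can still sit in its store buffer when its load executes. Using memory_order_seq_cst instead of relaxed rules that outcome out. Whether the relaxed version actually exhibits it depends on the hardware and scheduling, so it may take many trials to observe.

```cpp
#include <atomic>
#include <cstdio>
#include <thread>

std::atomic<int> x{0}, y{0};
int r1 = 0, r2 = 0;

int main() {
    int both_zero = 0;
    const int trials = 100000;
    for (int t = 0; t < trials; ++t) {
        x = 0; y = 0;
        std::thread t1([] {
            x.store(1, std::memory_order_relaxed);
            r1 = y.load(std::memory_order_relaxed);  // may run before the
        });                                          // store drains to cache
        std::thread t2([] {
            y.store(1, std::memory_order_relaxed);
            r2 = x.load(std::memory_order_relaxed);
        });
        t1.join();
        t2.join();
        if (r1 == 0 && r2 == 0) ++both_zero;  // forbidden under seq_cst
    }
    std::printf("r1 == r2 == 0 in %d of %d trials\n", both_zero, trials);
    return 0;
}
```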
2025-11