Thursday, September 27, 2007

Herb Sutter's Talk on Machine Architecture

Herb Sutter says this is the first talk video in 25 years that he recommends to programmers as a "must view". The video is titled

Machine Architecture: Things Your Programming Language Never Told You
Unfortunately, the video quality is not the best, so you will want the PDF slides of the presentation, available on Sutter's blog.

Know Your Enemy: Latency

or rather,

KNOW YOUR ENEMY
LATENCY


To summarize, the talk describes how important latency is in computation today - an overwhelming fraction of the effort, money and chip area of computing resources is dedicated to reducing latency. For example, 99% of the transistors in a modern CPU are used to "hide" or reduce latency, and only 1% for computation. Much of today's programming effort likewise has to be devoted to reducing latency. Indeed, latency is the root of nearly all complexity in hardware.


Latency Lags Bandwidth

Reduction in latency has not kept pace with growth in bandwidth. For example, in 20 years "CPU bandwidth" (computation speed) has grown 2,250X, while latency has improved only 21X. This means the relative cost of latency keeps growing. A memory access now costs about 210 cycles, versus 1.4 cycles in 1980 - roughly 150X worse. Registers and the L1 and L2 caches have therefore become much more important.

Our Programming Model Must Graduate

Our programming model must graduate from the obsolete CPU-Memory-Disk view to include:
- CPU: CPUs, Registers, L1 Cache - Instruction (2 cycles) and Data (3 cycles), L2 Cache (14 cycles), Store Buffers
- Memory: DRAMs (200 cycles), Flash (3M cycles) and other nodes
- Disk: Flash, Hard Disk including compressed files (15M cycles), Network
Some numbers indicate the scale of importance of changing the computation model - see the figure here.
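The hierarchy above can be sketched as a small lookup table. The cycle counts below are the approximate figures quoted in the talk; the exact values vary by hardware and are illustrative only.

```cpp
#include <cstdint>
#include <map>
#include <string>

// Approximate access latencies in CPU cycles, from the talk's figures.
// These are illustrative, not measurements of any particular machine.
const std::map<std::string, std::uint64_t> latency_cycles = {
    {"L1 instruction cache", 2},
    {"L1 data cache", 3},
    {"L2 cache", 14},
    {"DRAM", 200},
    {"Flash", 3000000},
    {"Hard disk", 15000000},
};
```

Even this rough table makes the spread visible: a disk access costs about a million times more cycles than an L2 hit.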

Little's Law says Concurrency = Bandwidth x Latency, and it finds application in many areas of science. Therefore, we have to add concurrency everywhere to cope with latency. This can affect even the correctness of your code, not just its performance: compilers optimize aggressively enough that they break the implicit (and incorrect) assumptions programmers often make.

In summary, to preserve the correctness of your code, Sutter advises the following:
* Use locks to protect shared variables.
* Avoid lock-free code: it is too difficult to get right.
* Avoid fences and memory barriers.
There are scenarios where taking a lock conditionally is inadvisable, because the compiler may rearrange the code so that some of it executes outside the condition. So take locks pessimistically, even when updates are conditional and you believe a particular path does not need the lock.
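This advice can be sketched in modern C++ - take the lock unconditionally and do the conditional test under it. The names here are illustrative, not from the talk.

```cpp
#include <mutex>

std::mutex m;
int shared_count = 0;

// Pessimistic pattern: the lock is acquired on every call, and the
// condition is evaluated while holding it, so the compiler cannot
// hoist the update outside the protected region.
void add_if_positive(int delta) {
    std::lock_guard<std::mutex> guard(m);  // lock taken unconditionally
    if (delta > 0) {
        shared_count += delta;  // conditional update happens under the lock
    }
}
```

The tempting alternative - testing the condition first and locking only when it holds - is exactly the pattern the talk warns can be broken by reordering.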

To exploit locality, both the memory-allocation pattern and the memory-access pattern of a program become important.
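A minimal sketch of an access pattern that exploits locality, assuming a matrix flattened into a row-major vector (my example, not the talk's):

```cpp
#include <cstddef>
#include <vector>

// Summing a row-major matrix in storage order: the inner loop touches
// contiguous addresses, so every cache line fetched from DRAM is fully
// used. Swapping the loops would stride cols * sizeof(double) bytes
// per access and waste most of each line.
double sum_in_storage_order(const std::vector<double>& a,
                            std::size_t rows, std::size_t cols) {
    double s = 0.0;
    for (std::size_t r = 0; r < rows; ++r)
        for (std::size_t c = 0; c < cols; ++c)
            s += a[r * cols + c];  // a[r][c] in flattened row-major form
    return s;
}
```

With a 200-cycle DRAM latency, the difference between using a whole cache line and using one element of it dominates the cost of the arithmetic itself.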