Saturday, September 15, 2007

First Encounter with Intel's Threading Building Blocks

Herb Sutter mailed about the upcoming webinar series by Intel. The first in the series is a talk by Herb Sutter on September 25 on The Concurrency Revolution. Here is part of the abstract posted by Sutter:
We are now seeing the initial stages of the next major change in software development, as over the next few years the software industry brings concurrency pervasively into mainstream software development, just as it has done in the past for objects, garbage collection, generics, and other technologies.

The next speaker in the webinar series is James Reinders, on the topic "Steps to Parallelism NOW". He is also the author of "Intel® Threading Building Blocks: Outfitting C++ for Multi-core Processor Parallelism". Chapter 1 of the book is available for download here (excerpt below in this article).

Intel® Threading Building Blocks

The Intel site says "Intel Threading Building Blocks" library has ready to use parallel algorithms, cross-platform support, task based parallelism, highly concurrent containers and generic algorithms. More on this here.

The library is open source and released under GPLv2 with the runtime exception - which means, it is as good as using gcc libstdc++ in your application, and I think (and please correct me if I'm wrong) commercial applications can use the library without restrictions and without releasing code as open source.

Some related notes on Threading Building Blocks:

Why Too Many Threads Hurts Performance, and What to do About It

This article discusses the impact of having too many threads.
Partitioning a fixed amount of work among too many threads gives each thread too little work that the overhead of starting and terminating threads swamps the useful work.
Since the most efficient number of compute threads depends upon the hardware, the division of program into threads is not a good way to do multithreaded programming. It is often better to do formulate the program into logical tasks, which are much more lightweight than logical threads. Task based programming also allows the distribution of tasks to processors unfairly to gain efficiency while giving up fairness. Task based programming is a lot easier since programmer is not forced to think about the low level parallelism. Scalability is easier to achieve using task based approach.

The article gives an example of summing a tree. Under breadth-first execution traversal, parallelism is maximized but so is memory consumption because the number of tasks doubles at each level of execution - there are many more tasks than hardware parallelism supports. Using depth-first traversal reduces both parallelism and memory consumption. The queue for keeping tasks is a shared resource and may become a bottleneck due to communication overheads.

A modern method to solve these problems is to keep a separate queue for each thread, and "work depth-first, steal breadth-first". This means memory consumption is reduced, while there are just enough tasks to keep the hardware busy. This technique was used by MIT's Cilk language project.

Excerpts from Chapter 1 of "Intel Threading Building Blocks" by James Reinders
When considering it for C programs, OpenMP has been referred to as “excellent for Fortran-style code written in C.” That is not an unreasonable description of OpenMP since it focuses on loop structures and C code. OpenMP offers nothing specific for C++. Looking to address weaknesses in OpenMP, Threading Building Blocks is designed for C++, and thus to provide the simplest possible solutions for the types of programs written in C++.