Thursday, November 22, 2007
List of Libraries
Parallel Computing
> Intel Threading Building Blocks
- [download] [tutorial from Intel] [licensed under GPLv2 with the runtime exception]
Labels:
parallelprogramming,
resource
Sunday, October 07, 2007
Links to External Resource Links
These are links to other link pages on related topics.
Parallel Computing
> Parallel Computing Research Groups by Jonathan Hardwick
> Parallel Processing at IEEE Computer Society
AI
> General AI by Alexander Nareyek
> AI on the web by Russell and Norvig
Planning
> AAAI Page on Planning and Scheduling
> Planning Systems and Testbeds by Rob St. Amant
> Planet List of Planners and Schedulers
Algorithms
> Algorithms and Implementations by NIST
Labels:
resource
Saturday, October 06, 2007
Artificial Intelligence Courses
> Artificial Intelligence II - Extracting, Managing & Personalizing Web Information by Dan Weld
> Intelligent Information Agents for the Web by Matthias Klusch
Courses in AI Planning
> Planning and Learning by Subbarao Kambhampati
- covers a lot of material from Automated Planning: Theory and Practice. Yochan research group.
> Automated Planning by Dan Weld
- covers application domains ranging from NASA testbeds to the semantic web; uses the Temporal Graphplan planner.
> AI Planning by Jim Blythe, Jose-Luis Ambite, and Yolanda Gil.
- Interactive Knowledge Capture research group.
> Planning in Stochastic, Dynamic, Metric & Incomplete Worlds by Subbarao Kambhampati
> Agents Architectures and Multi-Agent Systems by Prof. Marie desJardins
Labels:
aiplanning,
resource
Thursday, September 27, 2007
Herb Sutter's Talk on Machine Architecture
Herb Sutter says this is the first talk video for programmers that he has recommended as "Must View" in 25 years. The video is titled Machine Architecture: Things Your Programming Language Never Told You. Unfortunately, the video is not of the best quality, so you will need the PDF slides of the presentation, available on Sutter's blog. The central message of the talk is:
Know Your Enemy: Latency
or rather,
KNOW YOUR ENEMY
LATENCY
LATENCY
To summarize, the talk describes how important latency is in computation today: an overwhelming fraction of the effort, money, and space in computing systems is dedicated to reducing latency. For example, 99% of the transistors in a modern CPU are used to "hide" or reduce latency, and only 1% for computation. Much of today's programming effort also has to be dedicated to reducing latency. Indeed, latency is the root of nearly all complexity in hardware.
Latency Lags Bandwidth
Reduction in latency has not kept pace with increases in bandwidth. For example, in 20 years "CPU bandwidth" (computation speed) has grown about 2,250X, while latency has improved only about 21X, so the relative cost of latency keeps growing. Memory latency is now about 210 cycles, compared to 1.4 cycles in 1980 - roughly 150X worse in relative terms. Registers and the L1 and L2 caches have therefore become much more important.
Our Programming Model Must Graduate
Our programming model must graduate from the obsolete CPU-Memory-Disk view to include:
- CPU: CPUs, Registers, L1 Cache - Instruction (2 cycles) and Data (3 cycles), L2 Cache (14 cycles), Store Buffers
- Memory: DRAMs (200 cycles), Flash (3M cycles) and other nodes
- Disk: Flash, Hard Disk including compressed files (15M cycles), Network
Some numbers are indicative of the scale and importance of changing the computation model - see the figure here.
Little's Law says Concurrency = Bandwidth x Latency, and it applies in many areas of science. Therefore, we have to add concurrency everywhere to cope with latency. This can affect the correctness of your code, not just its performance: compilers optimize code so aggressively that the optimizations can break implicit (and incorrect) assumptions programmers often make.
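As a rough illustrative calculation (the numbers are assumed for illustration, not taken from the talk): to keep a memory system with a 200-cycle latency completing one request per cycle, Little's Law says you need about 200 x 1 = 200 requests in flight at once. Hardware supplies much of that concurrency through caches, prefetching, and out-of-order execution; software increasingly has to supply the rest.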
In summary, to preserve the correctness of your code, Sutter recommends the following:
* Use locks to protect the shared variables.
* Avoid lock-free code since it is too difficult.
* Avoid fences or memory barriers.
There are scenarios where taking a lock only conditionally is not advisable, because the compiler may rearrange the code so that the protected update is executed even when the condition does not hold. So take locks pessimistically, even when updates are conditional and you believe a particular case does not need a lock.
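Here is a minimal C++ sketch of this advice (my own illustration, not Sutter's example from the talk; the names and the use of TBB's spin_mutex are assumptions):

#include <tbb/spin_mutex.h>

// Hypothetical shared state, for illustration only.
tbb::spin_mutex counter_mutex;
bool needs_update = false;   // shared flag
long shared_counter = 0;     // shared data

void risky_update(long delta) {
    // Risky: the shared write is guarded only by a condition, not by a lock.
    // The compiler or CPU may rearrange this so that a store to shared_counter
    // is issued even when needs_update is false, and another thread can
    // observe that store.
    if (needs_update)
        shared_counter += delta;
}

void safe_update(long delta) {
    // Pessimistic: take the lock even though the update is conditional and
    // you believe this particular case does not need it.
    tbb::spin_mutex::scoped_lock guard(counter_mutex);
    if (needs_update)
        shared_counter += delta;
}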
To exploit locality, memory allocation and memory access patterns become important.
Labels:
computerarchitecture,
parallelprogramming
Saturday, September 15, 2007
First Encounter with Intel's Threading Building Blocks
Herb Sutter mailed about the upcoming webinar series by Intel. The first in the series is a talk by Herb Sutter on September 25 on The Concurrency Revolution. Here is part of the abstract posted by Sutter:
We are now seeing the initial stages of the next major change in software development, as over the next few years the software industry brings concurrency pervasively into mainstream software development, just as it has done in the past for objects, garbage collection, generics, and other technologies.
The next speaker in the webinar series is James Reinders, on the topic "Steps to Parallelism NOW". He is also the author of "Intel® Threading Building Blocks: Outfitting C++ for Multi-core Processor Parallelism". Chapter 1 of the book is available for download here (excerpt below in this article).
Intel® Threading Building Blocks
The Intel site says the "Intel Threading Building Blocks" library offers ready-to-use parallel algorithms, cross-platform support, task-based parallelism, highly concurrent containers, and generic algorithms. More on this here.
The library is open source and released under GPLv2 with the runtime exception - which means it is as good as using gcc's libstdc++ in your application, and I think (please correct me if I'm wrong) commercial applications can use the library without restrictions and without releasing their own code as open source.
Some related notes on Threading Building Blocks:
Why Too Many Threads Hurts Performance, and What to do About It
This article discusses the impact of having too many threads.
Partitioning a fixed amount of work among too many threads gives each thread so little work that the overhead of starting and terminating threads swamps the useful work. Since the most efficient number of compute threads depends on the hardware, dividing a program directly into threads is not a good way to do multithreaded programming. It is often better to formulate the program as logical tasks, which are much more lightweight than logical threads. Task-based programming also allows tasks to be distributed to processors unfairly, trading fairness for efficiency. It is a lot easier as well, since the programmer is not forced to think about low-level parallelism, and scalability is easier to achieve with a task-based approach.
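To make the task-based style concrete, here is a minimal sketch in the TBB 2.0-era functor style (my own example, not from the article; parallel_scale and ScaleBody are made-up names): the programmer describes work over a range, and the library decides how to split it into tasks and map them onto threads.

#include <tbb/blocked_range.h>
#include <tbb/parallel_for.h>
#include <tbb/task_scheduler_init.h>

// Body object: describes the work to do on a sub-range of indices.
struct ScaleBody {
    float* a;
    float factor;
    void operator()(const tbb::blocked_range<size_t>& r) const {
        for (size_t i = r.begin(); i != r.end(); ++i)
            a[i] *= factor;
    }
};

void parallel_scale(float* a, size_t n, float factor) {
    ScaleBody body = { a, factor };
    // The grain size (1000 here) is a tuning knob, not a thread count:
    // the programmer expresses tasks, the runtime maps them to threads.
    tbb::parallel_for(tbb::blocked_range<size_t>(0, n, 1000), body);
}

int main() {
    tbb::task_scheduler_init init;  // required once in TBB 2.0-era code
    float data[10000];
    for (size_t i = 0; i < 10000; ++i) data[i] = 1.0f;
    parallel_scale(data, 10000, 2.0f);
    return 0;
}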
The article gives an example of summing a tree. Under a breadth-first traversal, parallelism is maximized but so is memory consumption, because the number of tasks doubles at each level - far more tasks than the hardware parallelism can support. A depth-first traversal reduces both parallelism and memory consumption. The queue holding the tasks is a shared resource and may become a bottleneck due to communication overhead.
A modern method to solve these problems is to keep a separate queue for each thread and to "work depth-first, steal breadth-first". This keeps memory consumption low while providing just enough tasks to keep the hardware busy. The technique was used by MIT's Cilk language project.
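As a hedged sketch of what the tree-summing example might look like with the TBB task interface of that era (my own code, not the article's; Node, SumTask, and parallel_sum are made-up names, and a production version would switch to a sequential sum below some cutoff to avoid creating tiny tasks):

#include <cstddef>
#include <tbb/task.h>

struct Node {
    long value;
    Node* left;
    Node* right;
};

// Each task spawns children for the subtrees. Each worker keeps working
// depth-first on its own queue, while idle threads steal the older,
// "breadth-first" tasks from the other end.
class SumTask : public tbb::task {
public:
    Node* const node;
    long* const result;
    SumTask(Node* n, long* r) : node(n), result(r) {}
    tbb::task* execute() {
        if (node == NULL) {
            *result = 0;
        } else if (node->left == NULL && node->right == NULL) {
            *result = node->value;
        } else {
            long left_sum = 0, right_sum = 0;
            SumTask& l = *new(allocate_child()) SumTask(node->left, &left_sum);
            SumTask& r = *new(allocate_child()) SumTask(node->right, &right_sum);
            set_ref_count(3);           // two children + one for the wait
            spawn(r);                   // the right child may be stolen
            spawn_and_wait_for_all(l);  // run the left child here, depth-first
            *result = node->value + left_sum + right_sum;
        }
        return NULL;
    }
};

// Requires a live tbb::task_scheduler_init, as in the previous sketch.
long parallel_sum(Node* root_node) {
    long total = 0;
    SumTask& root = *new(tbb::task::allocate_root()) SumTask(root_node, &total);
    tbb::task::spawn_root_and_wait(root);
    return total;
}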
Excerpts from Chapter 1 of "Intel Threading Building Blocks" by James Reinders
When considering it for C programs, OpenMP has been referred to as “excellent for Fortran-style code written in C.” That is not an unreasonable description of OpenMP since it focuses on loop structures and C code. OpenMP offers nothing specific for C++. Looking to address weaknesses in OpenMP, Threading Building Blocks is designed for C++, and thus to provide the simplest possible solutions for the types of programs written in C++.
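For contrast with the TBB functor sketch above, here is roughly what the loop-centric OpenMP style looks like (again my own illustration, not from the book; scale_openmp is a made-up name):

void scale_openmp(float* a, int n, float factor) {
    // OpenMP's natural unit is the loop: annotate it with a pragma and the
    // runtime splits the iteration space across threads.
    #pragma omp parallel for
    for (int i = 0; i < n; ++i)
        a[i] *= factor;
}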
Labels:
c++programming,
parallelprogramming,
resource
Sunday, September 02, 2007
Hi!
[Please Note: This blog is no longer updated. Please visit www.cattywumpus.in for updated articles.]
Amit Kumar
PhD Student
Interactive Intelligence Lab
Electrical and Computer Engineering
National University of Singapore
email: amit_kumar at nus dot edu.sg
Welcome to my page. I intend to post most of my research-related work here. All the entries on this site are always under construction.
Labels:
aboutme