
Monday, April 28, 2008

Course on Real Time Systems - EE5903

Real Time Systems (RTS) are systems that must respond to changes in their environment within a specified interval of time. The course on RTS was interesting but rather heavy, with a relatively large project, several topics to cover and research papers to read. It also included synchronization and mutual exclusion, which I find to be among the most difficult concepts in computer science.

The following major topics were covered:
  • Process Management, Synchronization, Concurrency (handout from Operating System Principles by Bic and Shaw)
  • Distributed RTS, Deadlock Management, Distributed Deadlocks (papers on HARTS, Token Passing for mutual exclusion)
  • RTS scheduling (paper on Rate Monotonic Scheduling by Liu and Layland; a small schedulability sketch follows this list), RT Operating System Kernel (paper on the Spring Architecture)
  • RTS system development diagramming - DARTS, UML statechart (paper on Software Design by Gomaa)
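
For reference, here is a minimal sketch (my own, not from the course materials) of the Liu and Layland utilization-bound test for Rate Monotonic Scheduling: a set of n periodic tasks is schedulable under RMS if the total utilization does not exceed n(2^(1/n) - 1). The example task set is made up for illustration.

```cpp
// Liu & Layland sufficient schedulability test for Rate Monotonic Scheduling:
// n periodic tasks are schedulable if sum(C_i / T_i) <= n * (2^(1/n) - 1).
// The example tasks below are illustrative, not from the course project.
#include <cmath>
#include <cstdio>
#include <vector>

struct Task {
    double computation;  // worst-case execution time C_i
    double period;       // period (and implicit deadline) T_i
};

bool rmsSchedulable(const std::vector<Task>& tasks) {
    double utilization = 0.0;
    for (const Task& t : tasks)
        utilization += t.computation / t.period;
    double n = static_cast<double>(tasks.size());
    double bound = n * (std::pow(2.0, 1.0 / n) - 1.0);
    return utilization <= bound;  // sufficient, not necessary
}

int main() {
    // Three tasks: U = 0.2 + 0.25 + 0.2 = 0.65; the bound for n = 3 is
    // about 0.78, so the test passes.
    std::vector<Task> tasks = {{1, 5}, {2, 8}, {4, 20}};
    std::printf("schedulable: %s\n", rmsSchedulable(tasks) ? "yes" : "no");
}
```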
We did a distributed game design and development project. In the design phase, we reviewed several architectures for LAN-based game development (see the report here). We then designed a client-server architecture with flexible load distribution, and implemented a prototype of this architecture in the second part of the project (see the report here).

Some ideas for future work:
  • Development of a flexible-load distributed game, continuing the work done in the project.
  • Multithreading the Box2D library - the open-source physics library we used in our project.

Saturday, January 19, 2008

An Essential Note on Parallel Programming

Newer machines will no longer execute conventional applications faster, which means the programmers' free lunch is over. Programmers will need to change their ways, and the change is revolutionary rather than evolutionary: sequential programming, which has been the most popular way to program, may not scale in the future and has to be replaced by much more complex parallel programming. There is a possibility of an impending crisis in the computing industry if it does not embrace scalable parallel programming in time.

See the essay by Dave Patterson, Parallel or Bust: Computing at a Crossroads, for more. Follow up with the View from Berkeley report. A news article: Intel and Microsoft donate $20 Million to Two Universities for Multicore Research. Here is a presentation by David Patterson on What Future Apps May Be, How Research and Industry Can Work Together on Multicore Research and What is Possible.

My Research

I am interested in artificial intelligence, automated planning, parallel computing, multi-agent systems and optimization.

Right now I am working in a central field of Artificial Intelligence called Automated Planning. Specifically, I am looking for ways to scale Planning on modern computers. In our lab, we also apply our Planning technology to story planning in games. I was previously a software developer, and in the future I want to apply AI as an entrepreneur. Only time will tell what I will do afterwards.

A Simple Overview of Artificial Intelligence

Artificial Intelligence is applied not only to robotics, but also to corporate decision-making, building large-scale systems, operations research, medicine, computer games and many other fields. If mathematics is the queen of sciences, AI must be the queen of the computer sciences. More on this soon ...

My Team's Research on Automated Planning

Automated Planning studies how an intelligent agent can combine actions in order to reach specified goals. An intelligent agent with the ability to plan can "think ahead". Automated Planning systems have been used where human involvement is difficult or only partial (such as space missions and elderly assistance), where problems are quite complex (such as composition of web services), or where little time is available before execution (such as computer games and emergency evacuation). While previously most researchers studied planning for simplified toy problems, nowadays many researchers study how planning can be used in the real world.
In our lab, we are building a real-time, responsive AI Planning system (see: What is Planning?) affectionately called Crackpot. Our lab is headed by my supervisor Alexander Nareyek. Another team in our lab will use Crackpot for automated experience/story generation for games/training.

My Research

Modern computers are different from computers in the 20th century. They implement a large degree of parallelism; the parallelism is increasing every year. Software systems must be designed and implemented differently to run on these parallel computers. We study how real-time AI Planning can be implemented to effectively utilize the resources of these parallel computers.

Some Ideas on Entrepreneurship

AI is found much more in books and labs than in real-world applications. Decision support, case-based reasoning and expert systems have been widely used; planning and scheduling, however, remain largely unexplored in practice.

Some Ideas to Stumble Upon Further in the Future

Agent-based simulation looks quite interesting: large-scale simulations of communities of human or human-like agents can be carried out for hypothesis testing and prediction.

Thursday, September 27, 2007

Herb Sutter's Talk on Machine Architecture

Herb Sutter says this is the first talk video for programmers that he has recommended as "Must View" in 25 years. The video is titled

Machine Architecture: Things Your Programming Language Never Told You
Unfortunately, the video is not of the best quality, and you will need the PDF slides of the presentation, available on Sutter's blog.

Know Your Enemy: Latency

or rather,

KNOW YOUR ENEMY
LATENCY


To summarize, the talk describes how important latency is in computation today: an overwhelming fraction of the effort, money and space of computing resources is dedicated to reducing latency. For example, 99% of the transistors in a modern CPU are used to "hide" or reduce latency, and only 1% for computation. Much of today's programming effort also has to be dedicated to reducing latency. Indeed, latency is the root of nearly all complexity in hardware.


Latency Lags Bandwidth

Reduction in latency has not kept pace with the increase in bandwidth. For example, in 20 years "CPU bandwidth" (computation speed) has grown 2,250X, but latency has improved only 21X, so the relative contribution of latency keeps growing. Memory latency is now 210 cycles, compared to 1.4 cycles in 1980 - about 150X worse. Registers and the L1 and L2 caches have therefore become much more important.

Our Programming Model Must Graduate

Our programming model must graduate from the obsolete CPU-Memory-Disk view to include:
- CPU: CPUs, Registers, L1 Cache - Instruction (2 cycles) and Data (3 cycles), L2 Cache (14 cycles), Store Buffers
- Memory: DRAMs (200 cycles), Flash (3M cycles) and other nodes
- Disk: Flash, Hard Disk including compressed files (15M cycles), Network
These numbers give an indication of how important it is to change the computation model - see the figure here.

Little's Law says Concurrency = Bandwidth x Latency, and it is useful in many areas of science. Therefore, we have to add concurrency everywhere to cope with latency. This may even affect the correctness of our code, not just its performance: compilers optimize code so aggressively that they can break implicit (and incorrect) assumptions programmers often make.
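
As a back-of-the-envelope illustration of the law (the bandwidth and latency figures below are my own assumptions, not numbers from the talk), here is how much concurrency is needed just to keep a memory channel busy:

```cpp
// Back-of-the-envelope use of Little's Law: concurrency = bandwidth x latency.
// All numbers below are illustrative assumptions, not figures from the talk.
#include <cstdio>

int main() {
    const double bandwidth_bytes_per_s = 10e9;   // target: 10 GB/s sustained
    const double latency_s = 100e-9;             // assumed 100 ns memory latency
    const double cache_line_bytes = 64.0;

    // Bytes that must be "in flight" at any instant to keep the pipe full.
    double in_flight_bytes = bandwidth_bytes_per_s * latency_s;        // 1000 bytes
    double outstanding_requests = in_flight_bytes / cache_line_bytes;  // ~16 requests

    std::printf("in-flight bytes: %.0f, outstanding cache-line requests: %.1f\n",
                in_flight_bytes, outstanding_requests);
}
```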

In summary, to preserve the correctness of your code, Sutter recommends the following:
* Use locks to protect the shared variables.
* Avoid lock-free code since it is too difficult.
* Avoid fences or memory barriers.
There is a scenario in which taking locks only conditionally is not advisable, because the compiler may rearrange the code so that some of it is executed without the condition. So take locks pessimistically even when updates are conditional and you believe you do not need a lock in a particular case.
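
A minimal sketch of this advice (the shared counter and the condition are my own illustration, not code from the talk); note the comment on why the lock is taken even though the update is conditional:

```cpp
// A minimal sketch of the "protect shared variables with a lock" advice.
// The shared counter and the condition are assumptions for illustration.
#include <cstdio>
#include <mutex>
#include <thread>
#include <vector>

std::mutex countMutex;
long sharedCount = 0;

void worker(bool shouldUpdate) {
    // Take the lock even though the update is conditional: checking the
    // condition first and locking "only when needed" is the pattern the talk
    // warns about, because the compiler/hardware may reorder the surrounding
    // code so that part of it runs outside the condition.
    std::lock_guard<std::mutex> guard(countMutex);
    if (shouldUpdate)
        ++sharedCount;
}

int main() {
    std::vector<std::thread> threads;
    for (int i = 0; i < 8; ++i)
        threads.emplace_back(worker, i % 2 == 0);
    for (auto& t : threads)
        t.join();
    std::printf("sharedCount = %ld\n", sharedCount);
}
```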

To exploit locality, memory allocation and memory access patterns become important.
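
As a small illustration (the matrix size and contents are arbitrary assumptions) of how much the access pattern alone can matter:

```cpp
// Both functions sum the same N x N matrix, but the row-major loop walks
// memory contiguously while the column-major loop strides by a whole row per
// access, which typically causes far more cache misses. Sizes are arbitrary.
#include <cstdio>
#include <vector>

const int N = 2048;

long sumRowMajor(const std::vector<int>& m) {
    long sum = 0;
    for (int row = 0; row < N; ++row)
        for (int col = 0; col < N; ++col)
            sum += m[row * N + col];   // contiguous, cache-friendly
    return sum;
}

long sumColumnMajor(const std::vector<int>& m) {
    long sum = 0;
    for (int col = 0; col < N; ++col)
        for (int row = 0; row < N; ++row)
            sum += m[row * N + col];   // strided, cache-unfriendly
    return sum;
}

int main() {
    std::vector<int> matrix(N * N, 1);
    std::printf("%ld %ld\n", sumRowMajor(matrix), sumColumnMajor(matrix));
}
```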

Saturday, September 15, 2007

First Encounter with Intel's Threading Building Blocks

Herb Sutter mailed about the upcoming webinar series by Intel. The first in the series is a talk by Herb Sutter on September 25 on The Concurrency Revolution. Here is part of the abstract posted by Sutter:
We are now seeing the initial stages of the next major change in software development, as over the next few years the software industry brings concurrency pervasively into mainstream software development, just as it has done in the past for objects, garbage collection, generics, and other technologies.

The next speaker in the webinar series is James Reinders, on the topic "Steps to Parallelism NOW". He is also the author of "Intel® Threading Building Blocks: Outfitting C++ for Multi-core Processor Parallelism". Chapter 1 of the book is available for download here (excerpted below in this article).

Intel® Threading Building Blocks

The Intel site says the "Intel Threading Building Blocks" library offers ready-to-use parallel algorithms, cross-platform support, task-based parallelism, highly concurrent containers and generic algorithms. More on this here.

The library is open source and released under GPLv2 with the runtime exception - which means it is as good as using gcc's libstdc++ in your application, and I think (and please correct me if I'm wrong) commercial applications can use the library without restrictions and without releasing their code as open source.
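
Here is a minimal sketch of what using the library looks like, scaling a vector with parallel_for over a blocked_range. The data and the ScaleBody functor are my own illustration, so check the TBB documentation for the exact headers and interfaces in your version:

```cpp
// A minimal sketch of TBB's parallel_for scaling a vector. The data and the
// ScaleBody functor are assumptions for illustration; header names may vary
// between TBB releases.
#include "tbb/blocked_range.h"
#include "tbb/parallel_for.h"
#include "tbb/task_scheduler_init.h"
#include <vector>

struct ScaleBody {
    float* data;
    float factor;
    // TBB splits the iteration space into blocked_range chunks and hands each
    // chunk to an available worker thread as a task.
    void operator()(const tbb::blocked_range<size_t>& range) const {
        for (size_t i = range.begin(); i != range.end(); ++i)
            data[i] *= factor;
    }
};

int main() {
    tbb::task_scheduler_init init;              // start the task scheduler
    std::vector<float> values(1000000, 1.0f);
    ScaleBody body = {&values[0], 2.0f};
    tbb::parallel_for(tbb::blocked_range<size_t>(0, values.size()), body);
    return 0;
}
```

Note that the programmer only describes the range and the work per chunk; how many threads actually run it is left to the library.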

Some related notes on Threading Building Blocks:

Why Too Many Threads Hurts Performance, and What to do About It

This article discusses the impact of having too many threads.
Partitioning a fixed amount of work among too many threads gives each thread so little work that the overhead of starting and terminating threads swamps the useful work.
Since the most efficient number of compute threads depends upon the hardware, dividing the program into threads by hand is not a good way to do multithreaded programming. It is often better to formulate the program as logical tasks, which are much more lightweight than logical threads. Task-based programming also allows tasks to be distributed to processors unfairly, gaining efficiency by giving up fairness. It is a lot easier as well, since the programmer is not forced to think about low-level parallelism, and scalability is easier to achieve with a task-based approach.

The article gives an example of summing a tree. Under breadth-first traversal, parallelism is maximized but so is memory consumption, because the number of tasks doubles at each level - there are many more tasks than the hardware parallelism supports. Using depth-first traversal reduces both parallelism and memory consumption. The queue that keeps the tasks is a shared resource and may become a bottleneck due to communication overheads.

A modern method of solving these problems is to keep a separate queue for each thread and to "work depth-first, steal breadth-first". This keeps memory consumption down while providing just enough tasks to keep the hardware busy. This technique was used by MIT's Cilk language project.
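
A hedged sketch of how the tree sum might be expressed as recursive tasks follows. The Node type is an assumption, and tbb::task_group comes from newer TBB releases than the one current at the time of writing (which exposed a lower-level tbb::task interface instead):

```cpp
// Tree sum as recursive tasks. Each call spawns one stealable child task and
// keeps working on the other subtree itself, so a busy worker descends
// depth-first while idle workers steal the older, larger subtrees
// breadth-first - the "work depth-first, steal breadth-first" strategy.
// Node and the tiny example tree are assumptions for illustration.
#include "tbb/task_group.h"
#include <cstdio>

struct Node {
    long value;
    Node* left;
    Node* right;
};

long treeSum(const Node* node) {
    if (node == nullptr)
        return 0;
    // A real implementation would fall back to a plain sequential sum below
    // some subtree size to keep task overhead down.
    long leftSum = 0;
    tbb::task_group group;
    group.run([&] { leftSum = treeSum(node->left); });  // stealable child task
    long rightSum = treeSum(node->right);               // keep working locally
    group.wait();
    return node->value + leftSum + rightSum;
}

int main() {
    Node leaves[2] = {{1, nullptr, nullptr}, {2, nullptr, nullptr}};
    Node root = {3, &leaves[0], &leaves[1]};
    std::printf("sum = %ld\n", treeSum(&root));  // prints 6
}
```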

Excerpts from Chapter 1 of "Intel Threading Building Blocks" by James Reinders
When considering it for C programs, OpenMP has been referred to as “excellent for Fortran-style code written in C.” That is not an unreasonable description of OpenMP since it focuses on loop structures and C code. OpenMP offers nothing specific for C++. Looking to address weaknesses in OpenMP, Threading Building Blocks is designed for C++, and thus to provide the simplest possible solutions for the types of programs written in C++.
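
To make the contrast concrete, here is a hedged sketch of the same kind of loop in OpenMP style (the scale function and its data are my own illustration). OpenMP annotates an index-based loop with a pragma, which is why it is often described as most natural for "Fortran-style code written in C"; the TBB sketch earlier in this post expresses similar work as a C++ function object applied over a range instead.

```cpp
// OpenMP-style parallel loop for comparison with the TBB sketch above.
// The scale function and its data are assumptions for illustration.
#include <cstddef>
#include <vector>

void scale(std::vector<float>& data, float factor) {
    const long n = static_cast<long>(data.size());
    #pragma omp parallel for
    for (long i = 0; i < n; ++i)    // iterations are shared among threads
        data[i] *= factor;
}

int main() {
    std::vector<float> values(1000000, 1.0f);
    scale(values, 2.0f);
    return 0;
}
```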