The Infosys Labs research blog tracks trends in technology with a focus on applied research in Information and Communication Technology (ICT)


February 29, 2012

OpenCL Compiler from PGI for multicore ARM processors

Here is some great news for those looking to accelerate applications on the Android platform using OpenCL. The Portland Group (PGI) has announced an OpenCL framework for multicore ARM-based processors. What this means is that we now have an OpenCL compiler targeting ARM-based CPUs as a compute device, in addition to the existing ones for x86 CPUs and GPUs. With this announcement, PGI OpenCL becomes the first OpenCL compiler for Android targeting multicore ARM processors.

OpenCL being an open standard programming model for heterogeneous processor systems, developers can now use PGI's new framework to build portable multicore applications that run across various mobile platforms. The initial release supports the OpenCL 1.1 Embedded Profile specification and currently targets ST-Ericsson NovaThor ARM-based processors.

As specified by PGI, the following core components comprise the PGI OpenCL framework:
1. PGI OpenCL device compiler--compiles OpenCL kernels for parallel execution on multi-core ARM processors
2. PGCL driver--a command-level driver for processing source files containing C99, C++ or OpenCL program units, including support for static compilation of OpenCL kernels
3. OpenCL host compilers--the PGCL driver uses the Android native development kit versions of gcc and g++ to compile OpenCL host code
4. OpenCL Platform Layer--a library of routines to query platform capabilities and create execution contexts from OpenCL host code
5. OpenCL Runtime Layer--a library of routines and an extensible runtime system used to set up and execute OpenCL kernels on multi-core ARM processors

More details on the framework are available on the PGI site here.

February 28, 2012

Is Parallel Computing a Rocket Science or Esoteric? Part 3

Having said a lot in my previous posts (Part 1 & Part 2) about the hardware evolution and intricacies that have influenced Parallel Computing, the question to ask now would be: is Parallel Computing really rocket science? Is it esoteric? The answer may be both yes and no. Bill Gates' keynote at the Supercomputing 2005 conference was titled "The Future of Computing in the Sciences"; the title seems apt, as Parallel Computing evolved mainly out of the computational requirements of complex, advanced scientific problems that demanded high performance. Solving them involved huge clusters and supercomputers, and this class of computing is thus rightly named High Performance Computing (HPC). Understandably, these applications were aimed at the toughest and most convoluted problems of diverse sciences like astronomy, biology and mathematics. Owing to the complexity and specialization those subjects entail, HPC does seem esoteric here.

But with today's hardware advances we have servers approaching teraflop speeds, so the realization of a "supercomputer on your desktop" may not be far from reality. Desktops today carry multicore processors, along with languages that support porting functionality from legacy serial applications to parallel ones. These parallel languages are powerful yet simple and intelligible to a novice programmer. So, without diminishing the power that Parallel Computing brings, it is fair to say that parallel programming is becoming easier. The hard part lies in migrating the complex logic inherent in a legacy application while porting it from serial to parallel. Thus, owing to ever simpler programming paradigms, Parallel Computing is not rocket science after all, the hardest HPC problems aside.

The future looks to be an extremely adventurous ride given present technology trends. We are in times of shaping new horizons and touching new frontiers. Let's not ostracize Parallel Computing as rocket science or an esoteric craft; let's embrace it with open arms, because there always remains a middle path for us to choose.


February 24, 2012

IDC Top 10 HPC Market Predictions for 2012

International Data Corporation (IDC), the premier provider of market intelligence and advisory services, has come up with its top 10 predictions for the HPC market in 2012. IDC's HPC team includes Earl Joseph, Steve Conway, Chirag DeKate, Lloyd Cohen, Beth Throckmorton, Charlie Hayes, and Mary Rolph. Their predictions offer insight into how trends in HPC markets could drive future changes and developments in this field.

1. The HPC Market Will Continue to Benefit from the Global Economic Recovery
2011 saw HPC server revenue of about $10 billion, a significant rebound to the pre-recession high point. IDC forecasts that this will reach $10.6 billion in 2012, with projected revenue of $13.4 billion by 2015.

2. The Worldwide High End Race Will Accelerate
The geographic breadth and diversity of the HPC vendor market has increased: India, China, France, Italy, Russia and the US are all now in this space. North America still leads in HPC server share with 45%, and US vendors account for 94% of high-end ($500K/system) revenue. With the largest supercomputers now costing $100-$400 million, there will be increasing pressure from political circles to justify the ROI. (Did we hear Japan?)

3. Exascale Decisions Could Shift Future Leadership
IDC predicts that nations that under-invest in exascale software will lose ground in the HPC market. Improvements and advances in the software and tools needed to use HPC platforms effectively will matter more than hardware progress. Hence we are seeing an increasing number of vendors entering the HPC software market, driving commoditization. Maintaining an optimal balance among performance, power consumption and reliability will continue to be a challenge in architecting HPC solutions.

4. Software Leadership Will Become the New Battleground
The US has predominantly led the HPC software sector, but others are catching up fast. The European Commission is making big investment plans for HPC software and hardware, and Japan plans to invest $35-$40 million in exascale software development.

5. The Processor Arena Will Become More Crowded
x86 processors remain dominant in the HPC processor market with about an 82% share, and IBM Power is a prominent player with 11%. But it's accelerators like GPUs and FPGAs that are gaining enormous ground: 28% of HPC sites worldwide are now enabled with GPU acceleration. Low-power processors such as ARM have found favor with many hardware vendors (such as Nvidia) for building heterogeneous processors. The challenge, of course, is providing programmers with the right tools and software to build applications targeting these new hardware platforms.

6. National/Regional Technology Development Will Gain Momentum
The worldwide sentiment continues to be that HPC technology is strategic and not something to outsource. We know India built its own homegrown supercomputer back in the 1980s after being denied access to HPC systems from abroad, and Europe and Russia are both on the path to developing indigenous HPC technologies. A growing view among scientists and engineers is that credible HPC technology development can only happen in an environment that gives users more choices and avoids protectionism.

7. Big Data Methods Will Start to Transform the HPC Market, Including Storage
Established commercial Big Data vendors are recognizing the importance of HPC, and the two fields are converging. Enormous interest is building in Big Data applications built on HPC technologies. Storage revenue will continue to grow 2-3% faster than servers. Data transfer technologies are of utmost importance to the performance of HPC applications, and faster interconnects and memory designs that minimize data movement are on the feature lists of most hardware vendors today.

8. Cloud Computing Will Make Steady Progress
A fair bit of HPC adoption is happening in the private cloud, but the same is not true of the public cloud because of concerns about security, latency and pricing. The workloads suggested for HPC public clouds are those without significant communication overheads, including pre-production R&D projects and workloads from small and medium enterprises that cannot afford large data centers. Early adopters of HPC in the cloud have been government sectors, manufacturing, bio/life sciences, oil and gas, and financial companies.

9. There Will Be Shifting Sands in the Networking Market
InfiniBand has had more momentum in the HPC interconnect market, but Ethernet is nevertheless poised to expand its share. The forecast for the HPC interconnect market is $2 billion by 2014. As per IDC, for the proprietary interconnect market to grow, vendors will have to differentiate on top of emerging and advanced standards to compete with Ethernet and InfiniBand.

10. Petascale Performance on Big Systems Will Create New Business Opportunities
The advances happening in the HPC server, processor, storage and networking markets have opened up opportunities for a wide class of business applications. Application software will benefit from the higher performance that heterogeneous, power-aware systems can deliver, and Big Data methods will see wider application. On the system software side, smarter compilers and runtime systems become possible, and efficient power management is another critical design goal within reach.

February 22, 2012

Is Parallel Computing a Rocket Science or Esoteric? Part 2

The preceding part spoke about the advent and history of the realm of Parallel Computing. This part speaks further about its evolution and attempts to answer: why is Parallel Computing so important all of a sudden today?

In the last half century, Parallel Computing has evolved in a rather covert fashion, without the renown it has managed to amass in the last decade. We see so many advancements on the Parallel Computing front today that it may seem overwhelming at times, but it is not hard to discern why there has been an upsurge on this technology frontier.

So why so much interest in Parallel Computing now, when it existed 50 years ago? As Prof. John Kubiatowicz says, "Parallelism is Everywhere": modern microprocessors carry a billion transistors, even in handheld mobile devices, and clearly one must make them work in parallel. In truth, Parallel Computing is a trend today because it is forced upon us rather than chosen by fancy. We have hit the upper limit on making a single processor faster, given the present properties of chip raw materials such as silicon. Inevitably, adding more than one processor to the chip die seems to be the only answer, expanding the scope for parallel programming exponentially. Another reason for this astronomical boom is the advancement and evolution of computer hardware, which has surfaced a new breed of processor into mainstream computing beyond ancillary graphics processing: the Graphics Processing Unit (GPU).

General-purpose computing on graphics processing units (GPGPU), which in the past involved coaxing the GPU into doing computation, has now evolved into GPU Computing, which caters inherently to complex computation thanks to changes in GPU hardware that facilitate parallel programming. Today we see the giant microprocessor manufacturers racing to make their chips massively parallel, and new parallel programming languages are finding mass appeal every day. A GPU is also no longer considered a subordinate; this outlook has led to a new phase in multiprocessor technology, influencing the creation of the Accelerated Processing Unit (APU).

To be Continued... 

February 20, 2012

What is your "parallel" style?

Well, starting from where I left off last, parallel computing is a complex business, but the computing industry is working to simplify this daunting task. There is a rich variety of solutions already on the shelves today: near-auto parallelizers, accelerators, a wide collection of libraries and other high-level abstractions. There is also traditional low-level support, and programming languages that offer rich customization and optimization. This post gives a neat classification of the various tools and programming abstractions in GPU Computing.

So it's really the need, as with any decision, that drives the style of the parallel code. You could do it all yourself with custom coding, or achieve parallelism by some automatic means. You could be building a completely fresh parallel program, or perhaps converting an existing serial program to a parallel one. The diagram below puts some structure around deciding your program style:

[Diagram: choosing between the Custom Parallelization and Rapid Parallelization paths]

I have listed only the most popular methods available today, catering to two different objectives. It is quite possible that the need is to achieve high performance by fine-tuning the code to work very well on a particular parallel hardware platform; this requires intimate knowledge of the target hardware. This, as shown in the diagram, is the Custom Parallelization path which, done correctly, can yield high performance. This path will be opted for by a mature parallel computing programmer, and is most suitable for core scientific applications, frameworks and libraries, where it is essential to achieve the highest performance possible on given hardware. On the other hand, the intent may be to parallelize rapidly, with reasonable application performance being acceptable. This, as shown in the diagram, is Rapid Parallelization, which yields accelerated parallelization with perhaps reasonable performance; the performance optimization is left to the software aid or accelerator used. I have categorized the available software accelerators, programming languages, libraries and other utilities that can help achieve both the custom parallelization and the rapid parallelization objectives.

HPC is a fertile ground for research and I am sure there are going to be rapid advancements in the next few years. Every vendor is aiming to make parallel programming simpler, and this will spur HPC adoption greatly, especially in the enterprise world. At Infosys research too, we are building tools that will smooth the steep learning curve of parallel computing and accelerate the conversion of sequential applications to their parallel counterparts. We are also working on the software engineering aspects of building new HPC applications.

Hmm, I have already said too much. Just like the TV sitcoms that leave the audience each day with enough anticipation to watch the next episode, I will end with this whiff of our work. In the meantime, you could think about what your parallel programming style is.

February 17, 2012

HPC tools classification

Gone are the days when only a few parallel programming frameworks were available, like OpenMP, MPI and Threading Building Blocks. Now, with the advent of GPU computing and manycore architectures, there are a lot of High Performance Computing (HPC) languages and tools available that help speed up our applications.

The HPC tools can be classified into 4 categories:

1. HPC Migration Language / HPC Language Extensions:

These HPC tools are basically extensions of existing languages such as C/C++. NVIDIA CUDA (Compute Unified Device Architecture) is one of them. To migrate an application using CUDA, the source code needs to be re-engineered; the algorithm must be modified so that a large number of GPU threads can be utilized to achieve the desired speed-up. OpenCL (Open Computing Language) is another such language extension; it is lower-level than CUDA. The latest addition to this category is Microsoft C++ AMP.
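To make the re-engineering step concrete, here is a rough sketch of the transformation CUDA asks for. It is written in Python purely so the example is self-contained and runnable (CUDA itself is a C/C++ extension), and the function names are invented for illustration, not CUDA API: the body of a serial loop becomes a per-element "kernel" identified by a global index, which a runtime can then launch across many threads.

```python
from concurrent.futures import ThreadPoolExecutor

def saxpy_serial(a, x, y):
    """Serial form: one instruction stream walks the whole array."""
    return [a * xi + yi for xi, yi in zip(x, y)]

def saxpy_kernel(i, a, x, y):
    """CUDA-style form: compute ONE element, identified by a global
    index i -- the role blockIdx/threadIdx play in a real kernel."""
    return a * x[i] + y[i]

def launch(kernel, n, *args, workers=4):
    """Stand-in for a kernel launch: map the index range 0..n-1 over a
    thread pool (a real GPU launches thousands of such threads)."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda i: kernel(i, *args), range(n)))

x = [1.0] * 8
y = [2.0] * 8
assert saxpy_serial(3.0, x, y) == launch(saxpy_kernel, len(x), 3.0, x, y)
```

The point of the exercise is that each index is computed independently of the others, which is what frees the runtime to schedule the work across thousands of GPU threads in any order.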

2. Parallel Coding Assistant:

The tools in this category help us while coding in an IDE like Microsoft Visual Studio. Intel Parallel Studio 2011 and Intel Parallel Studio XE 2011 can be integrated with Microsoft Visual Studio. They provide features that help a programmer analyze code to find hotspots for adding parallelism, and compose source code by adding Intel Threading Building Blocks or Intel Cilk Plus constructs to exploit parallelism. They also have features to find memory errors and threading errors. The modified source code can be executed on any multi-core CPU or on Intel's Many Integrated Core (MIC) architecture.

3. Directives based Accelerator Models:

These are programming models that help a programmer exploit parallelism by adding directives, such as C pragmas, to the potentially parallel portions of sequential source code. The PGI Accelerator compiler from the Portland Group (PGI) is based on such a programming model. It also provides compiler feedback on portions that could not be parallelized owing to the dependencies involved. For computations to be performed on GPUs for the desired speed-up, data needs to be copied to GPU device memory and the results copied back to the CPU; this data transfer is taken care of by the PGI Accelerator compiler. The portions of source code marked as parallel regions execute directly on the GPU device, thus accelerating the application. HMPP from CAPS Enterprise is a similar accelerator model, and OpenACC is an upcoming accelerator model backed by Cray, CAPS Enterprise, NVIDIA and PGI.
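The appeal of the directive approach is that sequential code is annotated rather than rewritten. As a loose analogy, sketched in Python (real accelerator directives are C/Fortran pragmas such as `#pragma acc kernels`; the decorator name below is invented for illustration), a decorator can play the role of the directive: it marks a function as parallelizable, and a toolchain that doesn't understand the annotation can simply ignore it and run the code serially with the same result.

```python
from concurrent.futures import ThreadPoolExecutor

def acc_parallel(fn):
    """Directive stand-in: marks an element-wise function so the
    'runtime' may run it in parallel. An unaware toolchain could drop
    this decorator and the program would still be correct -- the same
    portability story as an ignored pragma."""
    def run(data, workers=4):
        with ThreadPoolExecutor(max_workers=workers) as pool:
            return list(pool.map(fn, data))
    # What an unaware compiler effectively does: a plain serial loop.
    run.serial = lambda data: [fn(v) for v in data]
    return run

@acc_parallel
def scale(v):
    return 2 * v + 1

data = list(range(10))
assert scale(data) == scale.serial(data)  # same answer either way
```

The design point the analogy captures is separation of concerns: the annotated code states *what* may run in parallel, while the compiler or runtime decides *how* (including the host-device data movement the paragraph describes).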

4. Library assisting in HPC migration:

There are many libraries available that make GPU programming easier. Common algorithms like reduction, scan, etc., which are frequently required in GPU programming, are available in versions optimized for execution on GPU devices. CUBLAS, CUFFT and CURAND are such libraries for the CUDA platform. Thrust is a library of parallel algorithms with an interface resembling the C++ Standard Template Library (STL) that greatly enhances developer productivity. Libra SDK is a C++ programming API for creating high-performance applications, and ArrayFire is a GPU software acceleration library.
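The value of these libraries is that tuned primitives such as reduction and scan replace hand-written loops. As a sketch of the interface style only (Python standard-library stand-ins here; in Thrust the C++ counterparts would be calls like `thrust::reduce` and `thrust::inclusive_scan`, executing on the GPU), the caller states *what* to compute and the library decides *how*:

```python
from functools import reduce
from itertools import accumulate
import operator

data = [3, 1, 4, 1, 5, 9]

# Reduction: collapse a sequence to a single value.
total = reduce(operator.add, data, 0)

# Inclusive scan: running totals of the sequence.
prefix = list(accumulate(data, operator.add))

assert total == 23
assert prefix == [3, 4, 8, 9, 14, 23]
```

On a GPU these primitives are non-trivial to implement well (a parallel scan is a textbook algorithm in its own right), which is precisely why shipping them as optimized library calls raises developer productivity.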

The HPC tools mentioned above are not an exhaustive list. Through this blog I have tried to classify the HPC tools and write about them very briefly. More details on HPC tools in my coming blogs.

Is Parallel Computing a Rocket Science or Esoteric? Part 1

My association with the field of High Performance Computing has been intriguing, a journey of revelations in which I have tried to understand the intricacies of a subject long kept under the hood. It seems only recently that it has been rewarded with its much-awaited glory.

I make here a humble attempt to bring you my understanding of this so-called "dark science", considered by many an esoteric craft. I bring you a three-part series describing the past, present and future of Parallel Computing through the eyes and experiences of a commoner.

You pour yourself a cup of hot brewed coffee and settle into a chair sipping it while reading your early-morning dose of news. A common daily routine for each of us, so what's so special? We seldom appreciate that the trifles surrounding us influence our broader picture of life immensely. The coffee with the newspaper is a classic example of multi-tasking, of doing things in parallel. And this is exactly what we do in Parallel Computing, aka Parallel Processing: go about doing, or trying to do, computing simultaneously.

From ancient ruins dating to, say, 100 BC, which gave us counting tablets and abacuses capable of doing computation in parallel, to the cutting-edge many-core parallel computer architectures of today, the journey has been intriguing and one of realization. Each milestone in this journey involved imbibing something from simple real life and turning it into a breakthrough in the technology world; take, for example, Prof. Dave Patterson's laundry example, which outlines the principles of pipelining in parallel computer architecture. Which goes to say: we all know Parallel Processing, aka Computing; it's just that we never realized we did.

Though the IBM 704, with Gene Amdahl as its principal architect, has been regarded as the first commercial breakthrough in creating a machine with floating-point hardware in 1955, Wikipedia traces the true origins of Parallel Computing (aka MIMD parallelism) to Federico Luigi, Conte Menabrea, and his "Sketch of the Analytical Engine Invented by Charles Babbage" of 1842. This work by Menabrea can be regarded as the first treatise describing many aspects of computer architecture and programming.

To be continued...


February 15, 2012

Hello Parallel Computing!

Computing applications are traditionally serial in nature, meaning the program is designed to run on a single computer, executing just one programming instruction at any given instant. The instructions in a serial program are thus executed one at a time, one after the other, in the sequence in which they appear in the program. Given today's parallel infrastructure, it is a shame to run such serial programs on this powerful hardware: the hardware is left badly underutilized. This is exactly what parallel computing solves. Simply defined, parallel computing allows more than one instruction to execute concurrently, and hence uses the available computing resources effectively. So, for example, if a parallel program runs on a quad-core processor, it is possible to execute four instructions simultaneously at any given instant, while a serial program will only ever utilize one of the four cores.
Now that I (hopefully) have your attention, let me pop a question: is parallel computing as complex as it sounds? Unfortunately, the answer is yes. It is a hard job to design and develop an error-free parallel program, hard because thinking in parallel and designing parallel programs is not something most of us are trained for, and because the process takes a good amount of time.
In the world of HPC, parallel computing is a must in order to use computing resources like multi-core CPUs, GPUs and other accelerators, and the program design varies from one computing resource to another, because each resource works best for a particular class of parallelism. At a high level, parallelism can be classified as task parallelism or data parallelism. To design a well-optimized parallel program, it is essential to identify whether the problem is task parallel, data parallel, or a mix of both, and then to identify the appropriate computing resource: a data parallel program is well suited to a GPU, while the CPU is best for task parallel programs. With the decision on hardware made comes the next step of actually designing the parallel program, developing it using the appropriate programming model and running it to check for correctness. This is a mammoth step and an important achievement. Then comes the star step: to quench the thirst for speed, it is essential to fine-tune or optimize the program to achieve the much-sought-after speed-up.
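A minimal sketch of the task-versus-data distinction, using only Python's standard library (the thread pool here illustrates program structure, not speed-up; as discussed, a real deployment would put the data-parallel half on a GPU and the task-parallel half on CPU cores):

```python
from concurrent.futures import ThreadPoolExecutor

def data_parallel_square(values, workers=4):
    """Data parallelism: the SAME operation applied to many elements
    at once -- the shape that maps naturally onto a GPU."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda v: v * v, values))

def task_parallel_stats(values):
    """Task parallelism: DIFFERENT operations running concurrently --
    the shape that maps naturally onto CPU cores."""
    with ThreadPoolExecutor(max_workers=3) as pool:
        lo = pool.submit(min, values)      # task 1
        hi = pool.submit(max, values)      # task 2
        total = pool.submit(sum, values)   # task 3
        return lo.result(), hi.result(), total.result()

assert data_parallel_square([1, 2, 3]) == [1, 4, 9]
assert task_parallel_stats([4, 7, 1]) == (1, 7, 12)
```

Classifying a problem this way is the first design decision the paragraph describes; many real applications are a mix, with task-parallel stages each containing data-parallel loops.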
Things get really exciting in my side of the world, the HPC world. Keep watch to read about ways to solve the mysteries of parallel computing.

February 7, 2012

Microsoft C++ AMP is now 'Open'

A significant announcement was made by Microsoft last week regarding C++ AMP: the technology has now been made an open specification. This was announced at the GoingNative 2012 event. It means any C++ compiler developer can now come up with an implementation of the specification and support C++ AMP features on a wide array of heterogeneous hardware.

For those new to C++ Accelerated Massive Parallelism (AMP), it is a native programming model that enables C++ code to be accelerated on data parallel hardware such as GPUs, referred to as accelerators. Just as with CUDA, using C++ AMP one can parallelize the data-intensive portions of a program on the accelerator and explicitly control the data communication between the CPU and the accelerator.

Microsoft C++ AMP is part of the Visual Studio 2011 release. The C++ AMP open specification can be found here.

February 1, 2012

High Performance Computing: An Overview

High Performance Computing (HPC) is today one of the fastest growing fields in the IT industry. In a recent survey, IDC asked companies across many industry verticals, such as aerospace, energy, life sciences and financial services, how they would be impacted without access to HPC. An interesting result: 47% of the companies said they could not exist as a business without HPC, and another 34% said they would not be able to compete and would face cost and time-to-market issues in its absence. This makes it obvious that HPC is a key technology for every company to invest in, in order to innovate and compete.

What is HPC? HPC is a system that brings three elements together: computers, software, and the right expertise to utilize them. HPC technologies are used to solve problems that have traditionally been known to be difficult, such as the complex simulation problems encountered in financial systems for risk assessment of portfolios, in the life sciences for gene sequence matching, or in seismic imaging of the earth's sub-surface by an oil services company. These are very time consuming applications requiring enormous computational power, and the need is to accelerate execution by making the best use of the computer hardware at hand. This is where modern multi-core and many-core processors, such as those from Intel, AMD and Nvidia, come to the aid of compute- and data-intensive applications. With 8-16 cores on an Intel multi-core CPU and somewhere around 400-500 cores on an Nvidia GPU chip, there is enormous compute power available, with processing power on the order of gigaflops to teraflops. To utilize it, applications must be designed to use the hardware resources effectively. But most current mainstream and scientific applications have been developed using sequential algorithms and cannot effectively utilize the computational power at their disposal; the only way to make the best use of this processing power is to port or rewrite these applications using parallel algorithms. By multithreading applications across the multiple cores of a processor, thereby dividing the work, enormous speed-ups can be achieved, with benefits in throughput and performance. This is the essence of HPC: break problems into pieces and work on those pieces at the same time.
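The "break problems into pieces" essence can be sketched in a few lines. Python is used here only for brevity (a production port would use native threads, OpenMP, or a GPU kernel, and for floating-point data the order in which partial results are combined can affect rounding):

```python
from concurrent.futures import ThreadPoolExecutor

def chunked(seq, pieces):
    """Split seq into roughly equal contiguous pieces."""
    size = (len(seq) + pieces - 1) // pieces
    return [seq[i:i + size] for i in range(0, len(seq), size)]

def parallel_sum(values, workers=4):
    """Each worker reduces one chunk; the partial results are then
    combined. This divide-then-combine shape is the core of most
    ports from a sequential algorithm to a parallel one."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = pool.map(sum, chunked(values, workers))
    return sum(partials)

values = list(range(1, 101))
assert parallel_sum(values) == sum(values) == 5050
```

The sequential version is a single loop over all 100 elements; the parallel version does the same total work, but four pieces of it can proceed at the same time, which is where the throughput gain comes from.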

With such large-capacity processors available, we need the right set of technologies to harness them effectively. This is where the 'software' element of HPC that I mentioned earlier comes to the fore. There is a vast set of HPC technologies to choose from; one could categorize them in multiple ways, but based on the architectural differences of the underlying hardware, two categories emerge: multicore CPU technologies and many-core GPU technologies.
1.      Multicore Technologies: This includes all the software tools and libraries that help utilize multicore CPU infrastructure efficiently. This class of technologies suits compute-intensive workloads where the computation can be divided into individual tasks, each performing a specific function. Some of the multicore technologies include the Microsoft .NET 4 parallel framework, Windows HPC Server 2008 and Intel Parallel Studio. Windows HPC Server is a cluster-based solution that helps divide the work into many chunks and distributes the workload to all the compute machines in the cluster; the granularity of processing can go down to an individual core of a processor.
2.      Many-core Technologies: Under this category the most prominent and revolutionary technology is GPU computing, widely used by the scientific computing community and many industry verticals. With hundreds of processing cores on a single chip, GPUs have vast potential for parallelism. GPU computing refers to using GPUs for general-purpose computation instead of the graphics and visualization they have traditionally been known for. This class of technologies suits data-intensive applications where computation must be performed on large datasets and each data element can be processed in parallel. GPUs are also more energy efficient than CPUs of similar compute capacity, which makes them more attractive for HPC applications given the large data processing involved. Some of the important GPU technologies include Nvidia's CUDA, Microsoft's C++ AMP and OpenCL.

HPC today is an indispensable area for companies seeking to gain competitive advantage, innovate, and build products that were once considered solely the province of big science. The wide variety of tools available in the market is helping democratize HPC.
