The Infosys Labs research blog tracks trends in technology with a focus on applied research in Information and Communication Technology (ICT)

May 15, 2012

Monte Carlo Integration On the GPU - VEGAS Algorithm

The subject of this blog is the first of the two research talks that I will be presenting at the GPU Technology Conference in San Jose this week. The talk is titled "Fast Adaptive Sampling Technique For Multi-Dimensional Integral Estimation Using GPUs".

Few numerical methods bring as much delight to an HPC programmer as Monte-Carlo integration does, even more so when the platform is a GPU. Their relative ease of implementation, their inherently parallel nature, and the knack with which they solve problems that are considered tough nuts to crack place these methods at the top of a statistical programmer's toolbox. Be it pricing complex derivative products in finance or areas of modern physics such as Quantum Chromodynamics, Monte-Carlo methods are often the only way to find a reasonable answer to the problem. However, there is no free lunch. Attractive as Monte-Carlo integration is, it is not hunky-dory all the time. The same law of large numbers that underlies the method's success is also sometimes the reason why it becomes computationally demanding and hence impractical. Frequently there arise scenarios in which the simulation just does not converge fast enough. Put another way, the number of samples required for the simulation to converge might be prohibitively large for practical purposes.

It is here that variance reduction techniques come to the rescue. Variance reduction techniques exploit the structure of the problem at hand and impart direction to what is otherwise a blindly random numerical method. VEGAS is one such variance reduction technique. It can be thought of as a hybrid of Importance Sampling and Stratified Sampling. It is adaptive in the sense that the algorithm iteratively identifies the distribution of the function at hand and works towards generating random samples that closely mirror that distribution. VEGAS greatly improves the accuracy and speed with which Monte-Carlo integrals can be evaluated.
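To make the convergence problem concrete, here is a minimal Python sketch of plain Monte-Carlo integration that also reports the usual standard-error estimate. The sharply peaked integrand `peaked`, the sample counts and the seed are assumptions chosen purely for illustration; they are not taken from the talk.

```python
import math
import random

def mc_integrate(f, a, b, n, rng):
    """Plain Monte-Carlo estimate of the integral of f over [a, b].

    Returns (estimate, standard_error)."""
    total = 0.0
    total_sq = 0.0
    for _ in range(n):
        y = f(rng.uniform(a, b))
        total += y
        total_sq += y * y
    mean = total / n
    variance = max(total_sq / n - mean * mean, 0.0)
    width = b - a
    return width * mean, width * math.sqrt(variance / n)

def peaked(x):
    # Hypothetical integrand: a narrow Gaussian bump centred at 0.5.
    return math.exp(-((x - 0.5) ** 2) / (2 * 0.01 ** 2))

rng = random.Random(0)
results = {n: mc_integrate(peaked, 0.0, 1.0, n, rng) for n in (1_000, 100_000)}
for n, (estimate, error) in results.items():
    print(n, estimate, error)
```

The error shrinks only as the inverse square root of the number of samples, so a hundred times more samples buys roughly one extra digit. Most uniform samples land where the function is practically zero, which is the waste that variance reduction tries to avoid.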

As an example consider the following diagram. The graph of the function that needs to be integrated is shown. 

 

[Figure: MCIntegration1.png]

The algorithm proceeds in the following fashion.

1) The region within the limits of integration is subdivided into equal-sized blocks called bins. Such a set of blocks can be set up using a grid as shown in the figure above.

2) A large number of random samples are generated such that there is an equal number of samples in each bin.

3) The integral is now evaluated for each bin of samples. Bins are then weighted by the contribution they make to the integral's value.

4) Using the weights obtained in the previous step, the grid is resized to reflect those weights. The resize ensures that there are more bins in the region that forms the meat of the function.

5) We go back to step 2 and draw fresh samples on the resized grid.

6) Steps 2, 3 and 4 are repeated until the necessary confidence interval is achieved.
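The steps above can be sketched in one dimension as follows. This is a simplified, CPU-only illustration and not the implementation from the talk: the function names, the use of |f| as the bin weight, the damping exponent `alpha`, and the test integrand `peaked` are all assumptions of this sketch, and it leaves out refinements of the published algorithm such as smoothing and combining the results of several iterations.

```python
import math
import random

def resize_grid(edges, contributions, alpha):
    """Move the bin edges so that each new bin holds an equal share of the weight."""
    n_bins = len(contributions)
    total = sum(contributions)
    if total == 0.0:
        return edges
    # Damp the weights (alpha < 1) and keep every weight strictly positive.
    weights = [(c / total) ** alpha + 1e-12 for c in contributions]
    target = sum(weights) / n_bins
    new_edges = [edges[0]]
    consumed = 0.0  # weight of the old bins lying fully to the left
    i = 0           # index of the old bin currently being split
    for k in range(1, n_bins):
        goal = k * target
        while consumed + weights[i] < goal:
            consumed += weights[i]
            i += 1
        frac = (goal - consumed) / weights[i]
        new_edges.append(edges[i] + frac * (edges[i + 1] - edges[i]))
    new_edges.append(edges[-1])
    return new_edges

def vegas_1d(f, a, b, n_bins=50, per_bin=200, n_iter=8, alpha=0.7, seed=0):
    rng = random.Random(seed)
    # Step 1: equal-sized bins.
    edges = [a + (b - a) * i / n_bins for i in range(n_bins + 1)]
    history = []
    for _ in range(n_iter):
        estimate, variance, contributions = 0.0, 0.0, []
        for i in range(n_bins):
            lo, hi = edges[i], edges[i + 1]
            width = hi - lo
            # Step 2: the same number of samples in every bin.
            values = [f(rng.uniform(lo, hi)) for _ in range(per_bin)]
            mean = sum(values) / per_bin
            var = sum((v - mean) ** 2 for v in values) / (per_bin - 1)
            # Step 3: per-bin piece of the integral and its weight.
            estimate += width * mean
            variance += width * width * var / per_bin
            contributions.append(width * sum(abs(v) for v in values) / per_bin)
        history.append((estimate, math.sqrt(variance)))
        # Step 4: resize the grid, then loop back with fresh samples.
        edges = resize_grid(edges, contributions, alpha)
    return history, edges

def peaked(x):
    # Hypothetical integrand: a narrow Gaussian bump centred at 0.5.
    return math.exp(-((x - 0.5) ** 2) / (2 * 0.01 ** 2))

history, edges = vegas_1d(peaked, 0.0, 1.0)
for estimate, error in history:
    print(estimate, error)
```

Because every bin receives the same number of samples, narrowing the bins around the peak automatically concentrates the samples there, which is where the importance-sampling flavour comes from. The sketch stops after a fixed number of iterations; a real run would stop once the error estimate meets the required confidence interval.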

Grid resizing is shown in the picture below.

 

[Figure: MCIntegration3.png]

The most straightforward strategy for running this algorithm in parallel is to evaluate the integrand at each of the sample points in parallel. The unbiased estimator that gives us the value of the integral can also be computed in parallel using a parallel sum reduction.
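As a rough illustration of that pattern, the following Python sketch models it sequentially: the function evaluations are independent of one another, and the sum is formed by a pairwise (tree) reduction in which every pass corresponds to one parallel step. It is a model of the idea rather than GPU code, and the function names and the integrand x*x are assumptions made for this example.

```python
import random

def tree_reduce_sum(values):
    """Pairwise (tree) sum. Returns (total, number_of_parallel_steps)."""
    data = list(values)
    steps = 0
    while len(data) > 1:
        if len(data) % 2:
            data.append(0.0)  # pad so the array splits into two equal halves
        half = len(data) // 2
        # On a GPU, each of these `half` additions would be done by its own thread.
        data = [data[i] + data[i + half] for i in range(half)]
        steps += 1
    return (data[0] if data else 0.0), steps

def integrate(f, a, b, n, seed=0):
    rng = random.Random(seed)
    points = [rng.uniform(a, b) for _ in range(n)]
    values = [f(x) for x in points]          # independent evaluations: the parallel part
    total, steps = tree_reduce_sum(values)   # about log2(n) passes instead of n - 1 additions
    return (b - a) * total / n, steps

estimate, steps = integrate(lambda x: x * x, 0.0, 1.0, 1 << 14)
print(estimate, steps)
```

With 16384 samples the reduction finishes in 14 passes, which is why the summation does not become the bottleneck once the evaluations themselves run in parallel.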

The iterative nature of the algorithm means that the task has to be carried out a number of times in succession until the desired convergence criterion is met.

Since the GPU implementation and the optimizations are the primary subject of my talk on Wednesday, I will hold back on writing on those until that time. I will then re-edit this post and put in the details of the strategies we took to take advantage of the full power of the GPUs, the challenges we faced on the way and how we have overcome those.

 

May 14, 2012

Infosys @ Nvidia GPU Technology Conference 2012

Hi There, 

I am super excited to tell you that I will be presenting some of the work that the High Performance Computing team @ Infosys has been doing using GPUs at the annual Nvidia GPU Technology Conference at the McEnery Convention Center in San Jose, CA. While the conference itself kicks off a few hours from now, the Infosys talks are scheduled for the 16th, i.e. Wednesday.

The first talk is titled "Fast Adaptive Sampling Technique For Multi-Dimensional Integral Estimation Using GPUs". This is happening in Marriott Ball Room 3 at 2:30 PM.

The second talk is titled "GPU Based Stacking Sequence Optimization For Composite Skins Using GA". This talk is happening in Room K at 3 PM.

The subject of the first talk is an algorithm called VEGAS. VEGAS is a variance reduction technique that hastens convergence of a Monte-Carlo integration. This algorithm has wide applications from Computational Finance to High Energy Physics.

The subject of the second talk is a genetic algorithm that's at the heart of aircraft wing manufacturing. Modern aircraft wings are manufactured using composite materials. Sheets of these materials have to be overlaid on top of one another such that the ability of the wing to sustain high stress in flight is maximized, while at the same time minimizing violations of the constraints that dictate what's an admissible ordering of the materials.

I will elaborate on these short summaries of these two talks in subsequent blog posts over the next couple of days.

If you are going to be at GTC, kindly make it convenient to attend these talks. I will be glad to meet you and tell you about all the good work that we have been doing in the area of GPU computing in our labs, and I would be equally excited to hear about some of the coolest ways in which you are using GPUs too. Or else leave us a comment here on the blog. I will get back to you and we can engage in some geekery.

Cheers...

April 25, 2012

Uncomplicating HPC using technology aids

Ok, so we understand that HPC is noteworthy. But if we said parallel computing is complex, then achieving HPC is definitely no easy game. The industry's offerings to simplify HPC are growing, and HPC cluster management software is an interesting technology that is doing its bit to ease HPC adoption. To put it simply, HPC typically comes into play when there is a cluster of parallel hardware that needs to be used efficiently, and cluster management becomes crucial in order to effectively use and administer the cluster.


Amongst the key players in the HPC cluster management space is Microsoft with its Windows HPC Server 2008. This incredibly user-friendly and powerful solution from Microsoft comprises a Job Scheduler, MPI support and cluster administration, including monitoring facilities, for a multicore environment. Built on the Windows Server 2008 64-bit OS, HPC Server can scale to thousands of processing cores, efficiently scheduling jobs on the cluster and providing a user-friendly console to monitor and manage it. As a scheduler, it can balance the load based on one of these resources in the HPC cluster:
1) Node wise
2) Socket wise
3) Core wise


HPC Server comes as a free add-on to Windows Server 2008 R2 and is very handy for easily bringing HPC to an embarrassingly parallel application that aims to leverage the full power of the underlying cluster. But wait, let me clarify. Sequential applications whose operations are embarrassingly parallel can be HPC-enabled by employing HPC Server. When there is inherent parallelism in a sequential application, it is possible for it to run effectively on an HPC cluster with the help of HPC Server, without having to rewrite it to make it parallel. That's a treat, I must say. I hope you find this as awesome as I do.

April 23, 2012

Is Parallel Computing HPC?

Oftentimes I use the terms parallel computing and High Performance Computing rather loosely, substituting one for the other. But for the purpose of clearly understanding both, it can be stated that if HPC is the end goal, then parallel computing is the means. Parallel computing is independent of HPC, meaning that the end goal of parallel computing need not be HPC. Parallel computing using supercomputers is typically what is called HPC. But with massively parallel hardware such as GPUs commonly available, this definition seems to have been diluted a little, and colloquially parallel computing and HPC are not distinguished.


HPC is a growing, niche technology area, and it is interesting to note that the U.S. government considers it an important technology that will help U.S. businesses, primarily in manufacturing, to compete effectively by accelerating innovation. Ron Bloom, special assistant for manufacturing to the U.S. President, Mr. Obama, participated in a meeting organized by the Council on Competitiveness Technology Leadership and Strategy Initiative advisory committee on HPC to discuss how HPC can help U.S. manufacturers innovate and compete more effectively in the global market. It is with the same enthusiasm that other nations are looking to use HPC for innovation.
HPC needs are definitely growing and here are some of the key drivers for HPC:
• Reduce computation time - There are applications so complex that it takes a day to a week to get answers. With changing business dynamics, these applications, which enable key business decisions, need to be tuned to produce their results in much less time for faster decision making. Despite optimizations, higher application performance is not possible simply because these applications are sequential.
• Real time computations - It is becoming crucial for several core business applications to deliver real time or near real time results. This is simply not possible given the sequential nature of these applications.
• High throughput - Sometimes the need is to get applications to do much more within the same time window. Again, unless the application is adapted to parallel hardware, it will simply not be possible to deliver high throughput.


HPC is slowly moving mainstream and is seeing adoption in the analytics and business intelligence space as well as in planning and forecasting. As businesses target real time and near real time applications, HPC will become imperative.

April 3, 2012

Google SERP's new 'semantic' feature

Google quietly introduced an exciting feature recently on its SERP (Search Engine Results Page): the 'Best Guess' feature. And wonderful it is, because it looks very much like question answering on common knowledge and general relations.
Google can now tell you the names of spouses (of celebrities, of course), names of children, CEOs of companies, someone's birthday, the capital of a country... and these are not wrapped in documents but picked out and put there as the Best Guess on top of the SERP. This is very smart.

Example--
query - Director of Titanic
Best guess for Titanic Director is James Cameron
Mentioned on at least 8 websites including wikipedia.org, imdb.com and answers.com

query - children of barack obama
Best guess for Barack Obama Children is Natasha Obama, Malia Ann Obama
Mentioned on about.com

And it also tells you, where it picked up these guesses from... in the form of 'Mentioned on xx websites including a.com,b.com etc etc'...

This feature is looking quite exciting. It will certainly change the way people search the web and what they expect from the web search engines.

This feature has been around for about a year (and maybe more) but has not garnered a lot of heat yet, maybe because the guesses Google makes (so far) are really common knowledge and probably do not help an information seeker a lot. Or maybe because the information seeker knows the website that gives him guaranteed information and does not follow the search engine route to get there. As of now, for me, it's more of a plaything than a smart 'answering' mechanism. But I am hopeful that this feature will be enriched and more evolved in the future.

I put a little thought into how Google must be doing this. My guess is that it is extracting from socially trusted sources (like Wikipedia) and building a database of important relations. But this is just a thought.

Comments welcome!

March 14, 2012

Interface testing

An enterprise application may comprise several software components, and these components need to interact with each other constantly. This is where an interface comes into the picture, to facilitate the working of various modules as a single application. Performance testing is conducted to verify whether a system meets the performance criteria under varying workload.

Based on project experience, an overview of interface performance testing is given below. The interface that was tested interacted with the Order Management System and the Employee Management System. Order details and updated customer order statuses were transferred between the two systems.

The messages sent between the systems were in XML format. TIBCO queues were tested during this exercise. Messages from the Order Management System were pushed to the TIBCO queue, and the queue consumed these messages. This process triggered an adapter service, which in turn invoked a web service call to update the Employee Management System.

During the load test, a large volume of messages was pushed to the queue, and the number of messages consumed and the time taken were monitored. We faced some issues during our interface testing; hence the following points need to be taken care of:

- Ensure queue receivers are up before the test.

- Each XML message should be on a single line.

- Ensure the latest deployments are done across the systems involved.
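For the single-line requirement, a small pre-processing step along the following lines could be used when preparing test messages. This is only a sketch using Python's standard library: the helper name `to_single_line` and the `<order>` message layout are hypothetical, and the actual message schema used in the project is not shown here.

```python
import xml.etree.ElementTree as ET

def to_single_line(xml_text):
    """Return the message without indentation or line breaks between elements.

    Raises xml.etree.ElementTree.ParseError if the message is not well-formed."""
    root = ET.fromstring(xml_text)
    for element in root.iter():
        # Drop whitespace-only text that exists purely for pretty-printing.
        if element.text is not None and not element.text.strip():
            element.text = None
        if element.tail is not None and not element.tail.strip():
            element.tail = None
    return ET.tostring(root, encoding="unicode")

# Hypothetical order message, pretty-printed over several lines.
message = """<order>
  <id>42</id>
  <status>SHIPPED</status>
</order>"""

line = to_single_line(message)
print(line)
```

Line breaks inside actual field values are left untouched, so it is still worth checking that the result contains no newline before pushing it to the queue.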

March 9, 2012

Engineering High Performance Applications

Here is a peek into the work that the Infosys HPC Research team is excitedly working on. They are studying efficient kernel composition techniques with the aim of delivering optimized application performance. Simply stated, kernels are GPU programs. The HPC industry today is busy discovering ways to write optimized kernels, but taking this to the next level means asking how best to build an application that is made of several well-optimized kernels. Can I simply bundle these highly optimized kernels to create a high performance application?


Component-based design is a well-researched and mature area aimed at reusability. The same reusability concept can be used to build applications in the HPC world. Using composition, it is possible to build an application that is composed of kernels. These kernels each perform a specific task and are highly optimized, making efficient use of the GPU. And since GPUs are used primarily for high performance, it becomes imperative that such a composition optimizes not just reusability but also performance.
Kernel developers characterize the performance of their kernels through performance signatures. The application designer combines these kernels with the objective that the refactored, combined kernel performs better than the individual kernels run one after another. But there is more to this than just putting the kernels together. What makes this interesting and also difficult is that different kernels may make unbalanced use of different GPU resources, such as the different types of memory. Kernels may also have the potential to share data. Refactoring the kernels, combining them and scheduling them suitably improves performance. The research team has studied different types of potential design optimizations and has evaluated their effectiveness on different types of kernels.
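One simple way to picture why composition can beat naive chaining is kernel fusion, where two kernels that share data are merged so the intermediate result never travels through memory. The Python sketch below only models the idea on lists; the kernel names are hypothetical, and the post does not say which composition techniques the team actually uses.

```python
def scale_kernel(data, factor):
    # One pass over memory: reads n values, writes n values.
    return [x * factor for x in data]

def offset_kernel(data, delta):
    # A second pass: reads n values, writes n values.
    return [x + delta for x in data]

def naive_composition(data, factor, delta):
    # The intermediate array makes a full round trip through memory.
    intermediate = scale_kernel(data, factor)
    return offset_kernel(intermediate, delta)

def fused_kernel(data, factor, delta):
    # Same result in a single pass: reads n values, writes n values.
    return [x * factor + delta for x in data]

data = [1, 2, 3, 4]
print(naive_composition(data, 2, 1))
print(fused_kernel(data, 2, 1))
```

Both versions compute the same values, but the fused one touches memory half as often. On a GPU, where many kernels are limited by memory bandwidth rather than arithmetic, that difference is often what matters.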

The team shares that by applying their kernel composition techniques, they have observed that the performance of the composed application increases considerably compared to naively tying the kernels together.


Now, I think this is going to be very useful soon, when the focus shifts from developing isolated GPU programs to building applications that consume these individual high performing computation units. Going by the evolution of software engineering that began with writing small programs to the present day SOA, I am quite sure that HPC and parallel computing will gain enough momentum to propel software engineering methodologies for HPC. What say?

February 29, 2012

OpenCL Compiler from PGI for multicore ARM processors

Here is some great news for those looking to accelerate applications on the Android platform using OpenCL. The Portland Group (PGI) has announced an OpenCL framework for multicore ARM-based processors. What this means is that we now have an OpenCL compiler that targets ARM-based CPUs as a compute device, in addition to the existing ones for x86 CPUs and GPUs. With this announcement, PGI OpenCL becomes the first OpenCL compiler for Android targeting multicore ARM processors.

OpenCL being an open standard programming model for heterogeneous processor systems, developers can now build portable multicore applications that run across various mobile platforms using PGI's new framework. The initial release supports the OpenCL 1.1 Embedded Profile specification and is currently targeted at ST-Ericsson NovaThor ARM-based processors.

As specified by PGI, the following core components comprise the PGI OpenCL framework:
1. PGI OpenCL device compiler--compiles OpenCL kernels for parallel execution on multi-core ARM processors
2. PGCL driver--a command-level driver for processing source files containing C99, C++ or OpenCL program units, including support for static compilation of OpenCL kernels
3. OpenCL host compilers--the PGCL driver uses the Android native development kit versions of gcc and g++ to compile OpenCL host code
4. OpenCL Platform Layer--a library of routines to query platform capabilities and create execution contexts from OpenCL host code
5. OpenCL Runtime Layer--a library of routines and an extensible runtime system used to set up and execute OpenCL kernels on multi-core ARM

More details on the framework are available on the PGI site.

February 28, 2012

Is Parallel Computing a Rocket Science or Esoteric? Part 3

Having said a lot in my previous posts (Part 1 & Part 2) about the hardware evolution and intricacies that have influenced Parallel Computing, the question to ask now is: Is Parallel Computing really rocket science? Is Parallel Computing esoteric? The answer may be both yes and no. Bill Gates' keynote at the Supercomputing 05 conference was titled "The Future of Computing in the Sciences"; the title seems apt, as Parallel Computing evolved mainly owing to the computational requirements of solving complex and advanced problems in the sciences that demanded high performance. This involved the use of huge clusters and supercomputers. This class of computing is thus rightly named High Performance Computing (HPC). Understandably, this class of applications happened to be aimed at solving the toughest and most convoluted problems of diverse sciences like astronomy, biology, mathematics and so on. Thus, owing to the complex nature of these subjects and the specialization they entail, HPC seems esoteric here.

But owing to today's advances in hardware technology, we have servers approaching teraflops speeds, so the realization of a "Supercomputer in your Desktop" may not be too far off. Desktops today have multicore processors, along with languages that support porting functionality from legacy serial applications to parallel ones. These parallel languages are powerful yet simple and intelligible to a novice programmer. So it would not be condescending to the power that Parallel Computing brings to say that parallel programming is becoming easier. The problem, however, rests in migrating the complex logic inherent to legacy applications while porting from serial to parallel. Thus, owing to ever simpler paradigms for parallel programming, Parallel Computing is not rocket science after all, HPC problems aside.

The future looks to be an extremely adventurous ride given the present technology tendencies. We are in times of shaping new horizons and touching upon new frontiers. Let's not ostracize Parallel Computing thinking it to be a rocket science and esoteric; let's embrace it with open arms because there always rests a middle path for us to choose.

 

February 24, 2012

IDC Top 10 HPC Market Predictions for 2012

International Data Corporation (IDC), the premier provider of market intelligence and advisory services, has come up with its top 10 predictions for the HPC market for 2012. IDC's HPC team includes Earl Joseph, Steve Conway, Chirag DeKate, Lloyd Cohen, Beth Throckmorton, Charlie Hayes, and Mary Rolph. Their predictions offer insights into how trends in the HPC market could drive future changes and developments in this field.


1. The HPC Market Will Continue to Benefit from the Global Economic Recovery
2011 saw HPC server revenue of about $10 billion. This is a significant rebound to the pre-recession high point. The forecast from IDC is that this will reach $10.6 billion in 2012, with projected revenue of $13.4 billion by 2015.

2. The Worldwide High End Race Will Accelerate
The geographic breadth and diversity of the HPC vendor market has increased. India, China, France, Italy, Russia and the US are all now in this space. North America still leads the HPC server share with 45%, and US vendors account for 94% of the high-end ($500k/system) revenue. With the largest supercomputers now costing $100-$400 million, there will be increasing pressure from political circles to justify the ROI. (Did we hear Japan?)

3. Exascale Decisions Could Shift Future Leadership
IDC predicts that nations that under-invest in exascale software will lose ground in the HPC market. Improvements and advances in software and tools for effective usage of HPC platforms will be more important than hardware progress. Hence we are seeing an increasing number of vendors entering the HPC software market, thereby driving commoditization. Maintaining an optimal balance among performance, power consumption and reliability will continue to be a challenge in architecting HPC solutions.

4. Software Leadership Will Become the New Battleground
The US has predominantly been leading the HPC software sector, but others are surely catching up. The European Commission is making big investment plans for HPC software and hardware. Japan has plans to invest $35-$40 million in exascale software development.

5. The Processor Arena Will Become More Crowded
x86 remains the dominant processor family in the HPC market with about 82% share. IBM Power is also a prominent player with 11%. But it is accelerators like GPUs and FPGAs that are gaining enormous ground. 28% of HPC sites worldwide are now enabled with GPU acceleration. Low-power processors such as ARM have found favor with a lot of hardware vendors (such as Nvidia) for building heterogeneous processors. The challenge, of course, is providing programmers with the right tools and software to build applications targeting these new hardware platforms.

6. National/Regional Technology Development Will Gain Momentum
The worldwide sentiment continues to be that HPC technology is strategic and preferably not outsourced. Well, we know India built its own homegrown supercomputer back in the 1980s because it was denied access to HPC systems from abroad. Europe and Russia are on the path to developing indigenous HPC technologies. A growing view among scientists and engineers is that credible HPC technology development can only happen in an environment that offers users more choices and avoids protectionism.

7. Big Data Methods Will Start to Transform the HPC Market, Including Storage
Existing commercial Big Data vendors are recognizing the importance of HPC, and the two fields are colliding. There is enormous interest in Big Data applications built on HPC technologies. Storage revenue will continue to grow 2-3% faster than server revenue. Data transfer technologies are of utmost importance to the performance of HPC applications. Faster interconnects and improved memory design for minimized data movement are on the feature list of most hardware vendors today.

8. Cloud Computing Will Make Steady Progress
A fair bit of adoption is happening in the private cloud for HPC. The same is not true for the public cloud because of concerns about security, latency and pricing. Still, some of the suggested workloads for which HPC public clouds can be adopted are those that do not have significant communication overheads. These include pre-production R&D projects and those of small and medium businesses that cannot afford large data centers. Early adopters of HPC on the cloud have been government, manufacturing, bio/life sciences, oil and gas, and financial companies.

9. There Will Be Shifting Sands in the Networking Market
InfiniBand has had more momentum in the HPC interconnect market, but Ethernet is nevertheless poised to expand its share. The forecast for the HPC interconnect market is $2 billion by 2014. As per IDC, for the proprietary interconnect market to grow, its vendors will have to differentiate on top of the emerging and advanced standards to compete with Ethernet and InfiniBand.

10. Petascale Performance on Big Systems Will Create New Business Opportunities
The advances happening in the HPC server, processor, storage and networking markets have opened up opportunities for a wide class of business applications to benefit from them. Application software will benefit from the higher performance that can be derived from heterogeneous systems that are also power aware. Big Data methods will see wider application. On the system software side, smarter compilers and runtime systems are possible. Efficient power management is another critical design goal that can be achieved.