The subject of this blog is the first of the two research talks that I would be presenting at the GPU Technology Conference in San Jose this week. The talk is titled "Fast Adaptive Sampling Technique For Multi-Dimenstional Integral Estimation Using GPUs".

Few numerical methods bring as much delight as Monte-Carlo integrations do to a HPC programmer. Even more so, when the platform is a GPU. Their relative ease of implementation coupled with the inherent parallel nature of these numerical methods and the knack with which they find solutions to some problems that are considered tough nuts to crack, place these methods at the top of a statistical programmer's toolbox. Be it pricing complex derivative products in Finance or be it in areas of modern physics such as Quantum Chromodynamics, often Monte-Carlo methods are the only ways by which a reasonable answer to the problem can be found out. However, there is no free lunch. Attractive Monte-Carlo integration is but its not hunky dory all the time. The same law of large numbers that underlies Monte-Carlo method's success is also sometimes the reason why these methods become computationally demanding and hence impractical. Frequently there arise scenarios in which the simulation just does not converge fast enough. Rephrased the number of samples required for the simulations to converge might just be prohibitive large for practical purposes. It is here that Variance Reduction techniques come to the rescue. Variance reduction techniques exploit the structure of the problem at hand and impart direction to what is otherwise a numerical method which is absolutely random and blindly so. VEGAS is one such variance reduction technique. It can be thought of as a hybrid of both Importance Sampling and Stratified Sampling. It's an adaptive technique in the sense that the algorithm iteratively identifies the right distribution of the function at hand and works towards generating random samples that closely mirror the distribution. VEGAS greatly improves the accuracy and speed with which Monte-Carlo integrations can be solved.

As an example consider the following diagram. The graph of the function that needs to be integrated is shown.

The algorithm proceeds in the following fashion.

1) The area in the limits of the integration is subdivided into equal sized blocks called bins. Such a set of blocks can be set up using a grid as shown in the figure above.

2) a large number of random samples are generated such that there are equal number of samples in each bin.

3) The integral is now evaluated for each bin of samples. Bins are now weighted by the contribution they make to the integral's value.

4) Using the weights obtained in the previous step the grid is now resized to reflect those weights. The grid resize ensures that there are more number of bins in the area that forms the meat of the function.

5) We go back to step 3.

6) Steps 3,4 and 5 are repeated until the necessary confidence interval is achieved.

Grid resizing is shown in the picture below.

The most straighforward of strategies for running this algorithm in parallel is to evaluate the integral at each of these points in parallel. The unbiased estimator that gives us the value of the integral can also be carried out in parallel using parallel reduction sum.

The iterative nature of the algorithm means that the task has to be carried out quite a number of times in succession until desired convergence criteria is met.

Since the GPU implementation and the optimizations are the primary subject of my talk on Wednesday, I will hold back on writing on those until that time. I will then re-edit this post and put in the details of the strategies we took to take advantage of the full power of the GPUs, the challenges we faced on the way and how we have overcome those.