Engineering High Performance Applications
Here is a peek into what the Infosys HPC Research team is excitedly working on. They are studying efficient kernel composition techniques with the aim of delivering optimized application performance. Simply put, kernels are programs that run on the GPU. The HPC industry today is busy discovering ways to write optimized kernels, but the next step is to ask how best to build an application out of several well-optimized kernels. Can I simply bundle these highly optimized kernels to create a high performance application?
Component-based design is a well-researched and mature area aimed at reusability. The same concept can be applied in the HPC world: using composition, an application can be built from kernels, each of which performs a specific task and is highly optimized to make efficient use of the GPU. And since GPUs are used primarily for high performance, such a composition must optimize not just for reusability but also for performance.
Kernel developers characterize the performance of each kernel through its performance signature. The application designer combines these kernels with the objective that the refactored, composed kernel performs better than the individual kernels do when run separately. But there is more to this than just putting the kernels together. What makes this interesting, and also difficult, is that different kernels may make unbalanced use of GPU resources such as the different types of memory. Kernels may also have the potential to share data. Refactoring the kernels, combining them and scheduling them suitably can improve performance. The research team has studied different types of potential design optimizations and has evaluated their effectiveness on different types of kernels.
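To make the data-sharing point a little more concrete, here is a minimal sketch in Python. It is not GPU code and it is not the research team's technique; it is only a toy model that counts reads and writes to a pretend global memory. The GlobalMemory class, the two kernels (scale and offset) and the fused version are all hypothetical names made up for this illustration. The idea it shows is that when two kernels are composed naively, the first writes an intermediate buffer that the second reads back, whereas a fused kernel keeps the intermediate value local and touches memory half as often.

```python
class GlobalMemory:
    """Toy stand-in for GPU global memory that counts element accesses."""

    def __init__(self):
        self.buffers = {}
        self.reads = 0
        self.writes = 0

    def alloc(self, name, values):
        self.buffers[name] = list(values)

    def load(self, name, i):
        self.reads += 1
        return self.buffers[name][i]

    def store(self, name, i, value):
        self.writes += 1
        self.buffers[name][i] = value


def scale_kernel(mem, src, dst, n, a):
    # Hypothetical kernel 1: dst[i] = a * src[i]
    for i in range(n):
        mem.store(dst, i, a * mem.load(src, i))


def offset_kernel(mem, src, dst, n, b):
    # Hypothetical kernel 2: dst[i] = src[i] + b
    for i in range(n):
        mem.store(dst, i, mem.load(src, i) + b)


def fused_kernel(mem, src, dst, n, a, b):
    # Composed kernel: the intermediate value stays in a local variable
    # (think "register") instead of going through global memory.
    for i in range(n):
        scaled = a * mem.load(src, i)
        mem.store(dst, i, scaled + b)


def run_naive(x, a, b):
    """Run the two kernels back to back through an intermediate buffer."""
    mem = GlobalMemory()
    n = len(x)
    mem.alloc("x", x)
    mem.alloc("tmp", [0] * n)
    mem.alloc("out", [0] * n)
    scale_kernel(mem, "x", "tmp", n, a)
    offset_kernel(mem, "tmp", "out", n, b)
    return mem.buffers["out"], mem.reads + mem.writes


def run_fused(x, a, b):
    """Run the single composed kernel with no intermediate buffer."""
    mem = GlobalMemory()
    n = len(x)
    mem.alloc("x", x)
    mem.alloc("out", [0] * n)
    fused_kernel(mem, "x", "out", n, a, b)
    return mem.buffers["out"], mem.reads + mem.writes


if __name__ == "__main__":
    naive_out, naive_traffic = run_naive([1, 2, 3, 4], a=2, b=1)
    fused_out, fused_traffic = run_fused([1, 2, 3, 4], a=2, b=1)
    print(naive_out, naive_traffic)
    print(fused_out, fused_traffic)
```

Both versions compute the same result, but in this toy model the fused one makes half as many memory accesses. Real kernels are rarely this simple: fusing can increase register or shared memory pressure, which is presumably part of why the composition problem is hard.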
The team reports that, with their kernel composition techniques applied, the composed application performs considerably better than one built by naively tying the kernels together.
Now, I think this is going to be very useful soon, when the focus shifts from developing isolated GPU programs to building applications that consume these individual high-performing computation units. Going by the evolution of software engineering, which began with writing small programs and has arrived at present-day SOA, I am quite sure that HPC and parallel computing will gain enough momentum to drive software engineering methodologies of their own. What say?