The Infosys Labs research blog tracks trends in technology with a focus on applied research in Information and Communication Technology (ICT)


May 24, 2012

The 'search results page' makeover

There was a time, about 10 years ago, when all that a 'search results page' needed to show, to be impressive, was a list of links to web pages. But increasing user expectations and the drive to provide an efficient path to the user's information goal have led to many changes in search results pages over the years.

The modifications started with snippets of matching text appearing below each link, with the matched words shown in bold. Later we saw images, maps and videos making it to the search results page, along with links to news, blogs and books. Suggestions for related searches also came up. All of this was in an effort to make the search experience more efficient; the idea is that users should spend much less time getting to the information they are looking for.

With the industry focus on 'structured information' and semantic search, the search results page is undergoing further modifications. Recently Google blogged that it will soon show a 'Knowledge Graph' to enhance the Google Search experience. At Infosys too, we research and build solutions for engaging visualizations of information (structured as well as unstructured).

Our idea is to provide information not merely as documents, but in a structured form - a table, a list, a graph and so on. The appeal of such visualizations lies in the following:

  • They save humans from having to read large amounts of text (while also dealing with redundancy)
  • They are direct responses that can be consumed easily by machines as well as humans
  • A picture is worth a thousand words
  • These visualizations are engaging and quite interactive

 

And such visualizations will surely define the next makeover of the search results page!

"Semantic Search" - understanding what it is

We all use a search engine like Google to get what we are looking for. We type a keyword and get pointed to resources that contain mentions of that keyword, ordered well based on multiple relevance parameters. This is a typical "statistical" search.

 

But what if you had to search for something like "Directors of Kate Winslet"? A statistical search engine does not understand the information goal; it merely returns the results that match the keywords. The top results on a statistical search engine page are not about 'people who have directed Kate Winslet' but about 'Kate Winslet's director husband', simply because the user's keywords match those texts as well. The other shortcoming is that the results are essentially web pages which have to be read manually to find answers to the information goal. There is also repetition of information across web pages.

 

Let me now introduce "Semantic Search". The literal meaning of the word 'semantics' is 'meaning', and thus 'semantic search' translates to 'meaning based search'.

 

Semantic search aims to address these issues by applying semantics, or meaning, in two places: by understanding the meaning of the keyword itself and by understanding the searchable content. It further integrates information from the various content resources. Thus the same keyword will yield more semantically relevant results with no repetition. Further, the results will be presented in a more direct format, be it a table, a graph or a list. These structured formats are easily consumed by humans and machines alike.
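To make this concrete, here is a toy sketch in Python of how a semantic engine could answer the 'Directors of Kate Winslet' query from structured facts and return a direct list instead of a set of web pages. The tiny triple store and the helper function below are purely illustrative, not how any production engine is implemented.

    # Toy illustration of 'meaning based' retrieval over structured facts.
    # A tiny knowledge base of (subject, predicate, object) triples.
    TRIPLES = [
        ("Sam Mendes",    "directed", "Revolutionary Road"),
        ("James Cameron", "directed", "Titanic"),
        ("Kate Winslet",  "acted_in", "Revolutionary Road"),
        ("Kate Winslet",  "acted_in", "Titanic"),
    ]

    def directors_of(actor):
        """People who directed a film that the given actor appeared in."""
        films = {o for s, p, o in TRIPLES if s == actor and p == "acted_in"}
        return {s for s, p, o in TRIPLES if p == "directed" and o in films}

    print(sorted(directors_of("Kate Winslet")))
    # ['James Cameron', 'Sam Mendes']

The point is that the engine resolves the relationship the keywords express ('people who directed films this actor appeared in') against structured content, rather than matching the keywords against page text.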

 

What I explained just now is the eventual aim of semantic search, and the industry has made some substantial advances towards it. A lot of work is going on; the major search engines are spending a lot of money to 'semantify' the search experience. This can be seen in Google's 'best guess', abbreviation handling and synonym understanding features.

 

It's a very active and interesting space. We at Infosys Labs are excited about it and are doing research and conceptualizing solutions for semantic search. Our research centres on text analytics, semantic search and visualization.

May 15, 2012

Monte Carlo Integration On the GPU - VEGAS Algorithm

The subject of this blog is the first of the two research talks that I will be presenting at the GPU Technology Conference in San Jose this week. The talk is titled "Fast Adaptive Sampling Technique For Multi-Dimensional Integral Estimation Using GPUs".

Few numerical methods bring as much delight to an HPC programmer as Monte-Carlo integration does, even more so when the platform is a GPU. Their relative ease of implementation, their inherently parallel nature and the knack with which they find solutions to problems that are considered tough nuts to crack place these methods at the top of a statistical programmer's toolbox. Be it pricing complex derivative products in finance or areas of modern physics such as Quantum Chromodynamics, Monte-Carlo methods are often the only way a reasonable answer to the problem can be found.

However, there is no free lunch. Attractive as Monte-Carlo integration is, it is not hunky-dory all the time. The same law of large numbers that underlies the method's success is also sometimes the reason these methods become computationally demanding and hence impractical. Frequently there arise scenarios in which the simulation simply does not converge fast enough; rephrased, the number of samples required for the simulation to converge might be prohibitively large for practical purposes. It is here that variance reduction techniques come to the rescue. They exploit the structure of the problem at hand and impart direction to what is otherwise an absolutely, blindly random numerical method.

VEGAS is one such variance reduction technique. It can be thought of as a hybrid of importance sampling and stratified sampling. It is adaptive in the sense that the algorithm iteratively identifies the right distribution of the function at hand and works towards generating random samples that closely mirror that distribution. VEGAS greatly improves the accuracy and speed with which Monte-Carlo integrations can be solved.
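For contrast, here is a minimal sketch of plain Monte-Carlo integration in Python/NumPy, with no variance reduction at all; the integrand and the limits are made up for illustration. The error of such an estimate shrinks only as 1/sqrt(N), which is exactly why techniques like VEGAS are attractive.

    import numpy as np

    def mc_integrate(f, a, b, n_samples):
        """Plain Monte-Carlo estimate of the integral of f over [a, b]."""
        x = np.random.uniform(a, b, n_samples)   # blind, uniform sampling
        return (b - a) * np.mean(f(x))           # unbiased estimator of the integral

    np.random.seed(0)
    f = lambda x: np.exp(-x ** 2)                # example integrand; true value over
    for n in (1000, 100000):                     # [0, 2] is roughly 0.8821
        print(n, mc_integrate(f, 0.0, 2.0, n))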

As an example, consider the following diagram, which shows the graph of the function that needs to be integrated.

 

MCIntegration1.png

The algorithm proceeds in the following fashion:

1) The area within the limits of integration is subdivided into equal-sized blocks called bins. Such a set of blocks can be set up using a grid, as shown in the figure above.

2) A large number of random samples is generated such that there is an equal number of samples in each bin.

3) The integral is now evaluated for each bin of samples. Bins are then weighted by the contribution they make to the integral's value.

4) Using the weights obtained in the previous step, the grid is resized to reflect those weights. The resize ensures that there are more bins in the region that forms the meat of the function.

5) Fresh samples are drawn within the resized bins and we go back to step 3.

6) Steps 3, 4 and 5 are repeated until the necessary confidence interval is achieved. (A simplified sketch of this loop follows the figure below.)

Grid resizing is shown in the picture below.

 

MCIntegration3.png
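Below is a heavily simplified, one-dimensional sketch of this adaptive loop in Python/NumPy. It is only meant to show the rebinning idea; a real VEGAS implementation (and certainly the GPU one discussed in the talk) handles multi-dimensional grids, damping of the weights and convergence checks far more carefully.

    import numpy as np

    def vegas_1d_sketch(f, a, b, n_bins=50, n_per_bin=200, n_iters=5):
        """Toy 1-D VEGAS-style adaptive integration (illustration only)."""
        edges = np.linspace(a, b, n_bins + 1)        # step 1: uniform grid of bins
        for _ in range(n_iters):
            widths = np.diff(edges)
            u = np.random.rand(n_bins, n_per_bin)    # step 2: equal samples per bin
            x = edges[:-1, None] + u * widths[:, None]
            fx = f(x)
            bin_estimates = widths * fx.mean(axis=1) # step 3: per-bin contribution
            estimate = bin_estimates.sum()
            weights = np.abs(bin_estimates)          # step 4: weight bins by their
            weights /= weights.sum()                 # contribution and rebuild the grid
            cdf = np.concatenate(([0.0], np.cumsum(weights)))
            targets = np.linspace(0.0, 1.0, n_bins + 1)
            edges = np.interp(targets, cdf, edges)   # step 5: more bins where f is large
        return estimate

    np.random.seed(0)
    f = lambda x: np.exp(-100.0 * (x - 0.5) ** 2)    # sharply peaked integrand
    print(vegas_1d_sketch(f, 0.0, 1.0))              # true value is about 0.17725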

The most straightforward strategy for running this algorithm in parallel is to evaluate the integrand at each of the sample points in parallel. The unbiased estimator that gives us the value of the integral can also be computed in parallel using a parallel reduction sum.
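As a rough CPU-side illustration of the reduction idea (not the actual CUDA kernel), a reduction sum repeatedly halves the array by adding elements pairwise; each halving step is what runs in parallel on the GPU.

    import numpy as np

    def reduction_sum(values):
        """Tree-style pairwise sum: the access pattern a parallel reduction follows."""
        v = np.asarray(values, dtype=np.float64)
        while v.size > 1:
            if v.size % 2:                   # pad odd-length arrays with a zero
                v = np.append(v, 0.0)
            v = v[0::2] + v[1::2]            # each such step runs in parallel on a GPU
        return v[0]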

The iterative nature of the algorithm means that this task has to be carried out a number of times in succession until the desired convergence criterion is met.

Since the GPU implementation and the optimizations are the primary subject of my talk on Wednesday, I will hold back on writing about those until then. I will then re-edit this post and put in the details of the strategies we used to take advantage of the full power of the GPUs, the challenges we faced along the way and how we overcame them.

 

May 14, 2012

Infosys @ Nvidia GPU Technology Conference 2012

Hi There, 

I am super excited to tell you that I will be presenting some of the work that the High Performance Computing team @ Infosys has been doing with GPUs at the annual Nvidia GPU Technology Conference at the McEnery Convention Center in San Jose, CA. While the conference itself kicks off in a few hours from now, the Infosys talks are scheduled for the 16th, i.e. Wednesday.

The first talk is titled "Fast Adaptive Sampling Technique For Multi-Dimensional Integral Estimation Using GPUs". This is happening in Marriot Ball Room 3 at 2:30 PM.

The second talk is titled "GPU Based Stacking Sequence Optimization For Composite Skins Using GA". This talk is happening in Room K at 3 PM.

The subject of the first talk is an algorithm called VEGAS. VEGAS is a variance reduction technique that hastens convergence of a Monte-Carlo integration. This algorithm has wide applications from Computational Finance to High Energy Physics.

The subject of the second talk is a genetic algorithm that is at the heart of aircraft wing manufacturing. Modern aircraft wings are manufactured using composite materials. Sheets of these materials have to be overlaid on top of one another such that the ability of the wing to sustain high stress in flight is maximized, while at the same time minimizing violations of the constraints that dictate what is an admissible ordering of the materials.
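For readers unfamiliar with genetic algorithms, the following is a very rough Python sketch of the select / crossover / mutate loop applied to a ply-angle sequence. The ply count, the candidate angles and especially the fitness function below are placeholders for illustration only; the real objective and constraints in our work are far more involved.

    import random

    ANGLES = [0, 45, -45, 90]                 # illustrative ply orientations
    N_PLIES = 16

    def fitness(seq):
        """Placeholder objective: reward alternating orientations, penalise long
        runs of identical plies (a stand-in for the real stress and
        manufacturability criteria)."""
        changes = sum(1 for a, b in zip(seq, seq[1:]) if a != b)
        runs = sum(1 for i in range(2, len(seq)) if seq[i] == seq[i - 1] == seq[i - 2])
        return changes - 5 * runs

    def evolve(pop_size=40, generations=100, mutation_rate=0.1):
        pop = [[random.choice(ANGLES) for _ in range(N_PLIES)] for _ in range(pop_size)]
        for _ in range(generations):
            pop.sort(key=fitness, reverse=True)
            parents = pop[: pop_size // 2]                # selection
            children = []
            while len(children) < pop_size - len(parents):
                p1, p2 = random.sample(parents, 2)
                cut = random.randrange(1, N_PLIES)        # one-point crossover
                child = p1[:cut] + p2[cut:]
                if random.random() < mutation_rate:       # mutation
                    child[random.randrange(N_PLIES)] = random.choice(ANGLES)
                children.append(child)
            pop = parents + children
        return max(pop, key=fitness)

    print(evolve())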

I will elaborate on these two talks in subsequent blog posts over the next couple of days.

If you are going to be at GTC, do make it a point to attend these talks. I will be glad to meet you and tell you about the good work we have been doing in the area of GPU computing in our labs, and I would be equally excited to learn about some of the coolest ways in which you are using GPUs. If you can't make it, leave us a comment here on the blog; I will get back to you and we can engage in some geekery.

Cheers...