
Big Data and the God particle (Higgs Boson)

The July 4th, 2012 announcement from CERN of possible evidence for the existence of the God particle, or the Higgs Boson, has sent ripples through the physics community. The discovery is not just fundamental to explaining how elementary particles acquire mass; it is also a validation of the Standard Model of particle physics. It holds the possibility of opening up new frontiers in physics and a new understanding of the world we live in.

While we marvel at these discoveries, our physicist brethren grapple with the question of whether this discovery is truly the Higgs Boson or an imposter. It is, however, very interesting to look at the magnitude of the data analysis and the distributed computing framework that was required to wade through the massive amounts of data produced by the Large Hadron Collider (LHC).

The Big Data problem that the scientists at CERN and across the world had to contend with was sifting through over 800 trillion (you heard that right ...) proton-proton collisions looking for this elusive Higgs Boson. Additionally, the particle has a large mass but is extremely unstable and lasts for less than a billionth of a second. It cannot be detected directly and is identified through its footprint, that is, the particles it decays into. This results in over 200 Petabytes of data that need to be analyzed.
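To put those figures in perspective, here is a quick back-of-envelope calculation using only the numbers quoted above; the derived per-collision average is purely illustrative:

    # Back-of-envelope arithmetic based on the figures quoted in this post.
    # The per-collision average is illustrative only.

    collisions = 800e12      # ~800 trillion proton-proton collisions
    total_bytes = 200e15     # ~200 petabytes of data to analyze

    bytes_per_collision = total_bytes / collisions
    print(f"Average data retained per collision: {bytes_per_collision:.0f} bytes")
    # -> roughly 250 bytes per collision, averaged over the full dataset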

Right from the beginning, CERN set this up as a distributed, cloud-based, tiered data processing solution. Three tiers were identified: T0 is the tier that collects data directly from the LHC; 11 T1 nodes across the world receive the data from CERN; and a number of T2 nodes (e.g., there are 8 in the US) are organized around the areas of the data that particular groups of physicists are interested in analyzing. From the T2 nodes, physicists can download data to their personal T3 nodes for analysis. The result is a massive, highly distributed data processing framework that collects data spewed out by the LHC detectors at a phenomenal rate of 1.25 GB/sec. The overall network can rely on a computation capability of over 100,000 processors spread over 130 organizations in 34 countries. A minimal sketch of this tiered topology follows below.
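As a rough illustration, here is a minimal Python sketch of that tiered fan-out. The tier counts and the 1.25 GB/sec rate come from the figures above; the structure itself is a simplification for illustration, not CERN's actual grid software:

    # A simplified model of the tiered LHC data grid described above.
    # Tier counts and the ingest rate are taken from the post; everything
    # else is illustrative.

    from dataclasses import dataclass, field

    @dataclass
    class Tier:
        name: str
        role: str
        nodes: int
        feeds: list = field(default_factory=list)  # downstream tiers

    t2 = Tier("T2", "regional centres for physics analysis groups", nodes=8)   # e.g. 8 in the US
    t1 = Tier("T1", "national centres receiving data from CERN", nodes=11, feeds=[t2])
    t0 = Tier("T0", "collects data directly from the LHC detectors", nodes=1, feeds=[t1])

    # Sustained detector output quoted in the post: 1.25 GB/sec
    ingest_gb_per_sec = 1.25
    seconds_per_day = 24 * 60 * 60
    print(f"~{ingest_gb_per_sec * seconds_per_day / 1024:.0f} TB collected per day at T0")
    # -> roughly 105 TB/day flowing from T0 out to the T1 and T2 sites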

From a technology perspective, it is interesting that the scientists used some of the same open source technologies we use for big data processing in enterprises. For example, HDFS (Hadoop Distributed File System), the file system from the Hadoop ecosystem, was a candidate for storing these massive amounts of data, and ROOT, another open source tool that is also used by financial institutions, is used to analyze the data.
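To give a flavor of the kind of analysis ROOT supports, here is a toy PyROOT sketch that fills an invariant-mass histogram from synthetic data. This is not the actual CERN analysis; the numbers and names are made up, with a Higgs-like bump placed near 125 GeV on a flat background:

    # A toy example of the kind of histogramming ROOT is used for -- NOT the
    # actual CERN analysis, just an illustration using PyROOT with synthetic data.

    import random
    import ROOT  # PyROOT bindings; requires a local ROOT installation

    h_mass = ROOT.TH1F("h_mass",
                       "Toy diphoton invariant mass;m_{#gamma#gamma} [GeV];Events",
                       80, 100, 180)

    random.seed(42)
    for _ in range(10000):                  # flat "background" events
        h_mass.Fill(random.uniform(100, 180))
    for _ in range(300):                    # a small Gaussian "signal" bump near 125 GeV
        h_mass.Fill(random.gauss(125.0, 2.0))

    out = ROOT.TFile("toy_mass.root", "RECREATE")
    h_mass.Write()
    out.Close()
    print("Wrote toy_mass.root with", int(h_mass.GetEntries()), "entries")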

It is amazing that the analysis tools used to find the God particle are commonly available for enterprises to use in solving their own, smaller Big Data problems.

To paraphrase Galileo, "All truths are easy to understand once they are discovered; the point is to discover them," and Big Data can help.
