Big Data and the God particle (Higgs Boson)
The July 4, 2012 announcement from CERN of possible evidence for the existence of the Higgs boson, popularly called the God particle, has sent ripples through the physics community. The particle is fundamental to explaining how elementary particles acquire mass, and its discovery would also validate the Standard Model of particle physics. It holds the possibility of opening up new frontiers in physics and a new understanding of the world we live in.
While we marvel at this discovery, our physicist brethren grapple with whether the particle is truly a Higgs boson or an impostor. It is also very interesting to look at the magnitude of the data analysis, and at the distributed computing framework required to wade through the massive amounts of data produced by the Large Hadron Collider (LHC).
The Big Data problem that the scientists at
Right from the beginning, CERN set this up as a distributed, cloud-based, tiered data processing solution. Three tiers were identified: T0 is the tier that collects the data directly from the LHC; 11 T1 nodes across the world get the data from CERN; and a number of T2 nodes (for example, there are 8 in the US) are organized around the areas of the data that particular groups of physicists are interested in analyzing. From the T2 nodes, people can download the data to their personal T3 nodes for analysis. The result is a massive, highly distributed data processing framework that collects data spewed out by the LHC detectors at a phenomenal rate of 1.25 GB/sec. The overall network can rely on a computation capability of over 100,000 processors spread over 130 organizations in 34 countries.
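To get a feel for what 1.25 GB/sec means, here is a rough back-of-the-envelope sketch in Python. It uses only the figures quoted above. The assumptions are mine and not from CERN: the detectors are treated as if they ran at that rate around the clock (in practice they do not), 1 TB is taken as 1000 GB, and the function and variable names are made up for illustration.

```python
# Figures quoted in the text above.
RATE_GB_PER_SEC = 1.25
PROCESSORS = 100_000
ORGANIZATIONS = 130

SECONDS_PER_DAY = 24 * 60 * 60


def volume_tb(rate_gb_per_sec, seconds):
    """Data volume in terabytes, assuming 1 TB = 1000 GB."""
    return rate_gb_per_sec * seconds / 1000


# Assumption: continuous operation at the quoted rate for a full day.
daily_tb = volume_tb(RATE_GB_PER_SEC, SECONDS_PER_DAY)

# Average processor count per participating organization.
avg_processors = PROCESSORS / ORGANIZATIONS

print(f"Data per day at the quoted rate: {daily_tb:.0f} TB")
print(f"Average processors per organization: {avg_processors:.0f}")
```

Under those assumptions a single day of running produces on the order of a hundred terabytes, which helps explain why the data is fanned out across tiers instead of being analyzed in one place.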
From a technology perspective, it is interesting that people have used some of the same open source technologies that we use for big data processing in enterprises. For example, HDFS (Hadoop Distributed File System), the file system of the Hadoop ecosystem, was the candidate for storing these massive amounts of data, and ROOT, another open source tool that is also used by financial institutions, is used to analyze the data.
It is amazing that the analysis tools used to find the God particle are commonly available for enterprises to use on smaller Big Data problems.
To paraphrase Galileo: "All truths are easy to understand once they are discovered; the point is to discover them," and Big Data can help.