
A Leap of Faith in Data-Driven Analytics

The new trends in building Digital Oil Field process simulation models are more about statistics than physical laws. This is a big change. Engineers have historically preferred straightforward deterministic models built with the tools they are comfortable with, because that is how they were trained. But the industry is finding out that brute-force approaches to complex problems, with a large degree of uncertainty in key parameters, are not providing the answers needed.

So in come the statisticians to save the day (maybe). The "black box" nature of proxy, or surrogate, models often makes the results hard to evaluate. Once you find a correlation, what does it mean? Abstract attributes can be found by various statistical means, but can you relate them to something in the physical world? Do you have confidence in the results of the statistical method? What is it telling you to do? What do you do for the next problem that shows up? How much trust do you have in this new "data science"?

Here is the new value proposition: you don't need domain knowledge, just a lot of data. If you don't understand your data, don't worry. Forget the raw data, it's too complex (the curse of high dimensionality) and probably of poor quality (with enough data you can filter out the outliers). Trust your learning algorithm, even though the abstract features it finds may not have a discernible meaning in the physical world. I am exaggerating to make a point, but you can see what I mean about this being a very different approach. As the statisticians say, the data doesn't lie.
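To make that concrete, here is a minimal Python sketch, with entirely made-up data, of the kind of abstract features a learning algorithm produces: a principal component analysis compresses fifty noisy "sensor channels" into three components that summarize the variance but carry no guaranteed physical meaning.

    # Minimal sketch with hypothetical data: reduce high-dimensional sensor
    # readings to a few abstract components. The components capture variance,
    # but nothing guarantees they map to a quantity an engineer would recognize.
    import numpy as np
    from sklearn.decomposition import PCA

    rng = np.random.default_rng(42)
    readings = rng.normal(size=(1000, 50))                         # 1000 samples, 50 channels
    readings[:, 1] = 0.8 * readings[:, 0] + 0.2 * readings[:, 1]   # inject some correlation

    pca = PCA(n_components=3)
    features = pca.fit_transform(readings)    # abstract "features", not physical variables
    print(pca.explained_variance_ratio_)      # variance explained by each component

The components are perfectly usable as model inputs; the question the post raises is what, if anything, they mean downhole.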

It is a strange feeling to listen to a talk at an SPE meeting where a young graduate student begins by saying he knows nothing about stuck pipe, or pore pressure, or hydraulic fracturing, and then proceeds to present a paper solving complex problems in the very subject he claims to know nothing about. It is a new world, to say the least.

You have learned to trust the simulation and modeling wizards, even if you have forgotten the basic physics they use to forecast the future of your reservoir (can you remember Darcy's Law?). The data scientist enters the picture knowing nothing about the data collection or the technical nature of the problem he or she is trying to solve. But with his or her magic toolbox of statistical methods (neural nets, CEP (complex event processing), artificial intelligence, autoencoders, machine learning, etc.), he or she attacks the data, and out come correlations and conclusions that sometimes outperform the tried-and-true forecasts of the simulation wizards. Forget the Visual Basic and SQL programming you struggled to master; now it is R and Python and Pig Latin that do the trick.
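As a rough illustration of the proxy-model idea (a sketch, not any particular paper's method), the Python snippet below fits a "black box" regression to data generated from a simple Darcy-style flow relation, q = k·A·Δp / (μ·L), without ever telling the model the physics. The parameter ranges are invented for the example.

    # Illustrative sketch: train a proxy model on data produced by a simple
    # physical relation the model itself never sees explicitly.
    import numpy as np
    from sklearn.ensemble import RandomForestRegressor

    rng = np.random.default_rng(0)
    n = 2000
    k  = rng.uniform(10, 500, n)      # permeability (illustrative range)
    A  = rng.uniform(50, 200, n)      # cross-sectional area
    dp = rng.uniform(100, 1000, n)    # pressure drop
    mu = rng.uniform(0.5, 5.0, n)     # viscosity
    L  = rng.uniform(10, 100, n)      # length

    q = k * A * dp / (mu * L)         # the "true" physics, used only to make data
    X = np.column_stack([k, A, dp, mu, L])

    proxy = RandomForestRegressor(n_estimators=200, random_state=0)
    proxy.fit(X[:1500], q[:1500])
    print("proxy R^2 on held-out data:", proxy.score(X[1500:], q[1500:]))

The proxy can forecast flow rates quite well from the data alone, which is exactly the point: the forecast arrives without the understanding.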

Several years ago, a young friend of mine was considering using her company's education benefits to improve her resume. Instead of taking the "path well-trodden" and registering for an MS in petroleum engineering, she decided to get a statistics degree. Her career adviser advised against it. From his "old school" perspective, a statistics degree added nothing to an aspiring petroleum engineer. I knew she was smart then, but the last few years are proving she was inspired.

The industry has an interesting debate on its hands. With the advent of the digital oilfield, we have a surplus of data, sometimes an avalanche of data, and even with our improved high-performance computing capabilities, we can't cope with the volume, variety and velocity of the flow of data from operations. How do we wade through all the correlations to find the one that relates to causation? How many false positives does it take before we start to trust the results? 
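The false-positive problem is easy to demonstrate. In the hypothetical Python sketch below, 500 purely random "variables" are screened against an unrelated outcome; at the usual p < 0.05 threshold, roughly 25 of them (5 percent) will look significant by chance alone.

    # Sketch of the false-positive problem: screen many unrelated variables
    # against an outcome and some will look "significant" purely by chance.
    import numpy as np
    from scipy.stats import pearsonr

    rng = np.random.default_rng(1)
    n_samples, n_features = 200, 500
    X = rng.normal(size=(n_samples, n_features))   # noise masquerading as sensor channels
    y = rng.normal(size=n_samples)                 # outcome with no real relationship to X

    false_hits = 0
    for j in range(n_features):
        r, p = pearsonr(X[:, j], y)
        if p < 0.05:
            false_hits += 1
    print(f"{false_hits} of {n_features} variables look 'significant' at p < 0.05")

Scale that up to the avalanche of digital oilfield data and the question of which correlations deserve trust becomes very real.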

My gut feel (heuristics) says the answer lies somewhere between these two extremes. Many problems are too complex to model through our limited understanding of physical laws, and the power of statistical methods grows with more data, and more varied types of data, about a process. I can't remember the math I took in college, and I struggle to follow along with these new data scientists.

In the meantime, until we find the right balance, it is the SME (subject matter expert) versus the data scientist versus the experienced old hand with his spreadsheet and intuition. Who are you going to trust? Does anyone want an aspirin, or a rabbit's foot, or a dart board?
