
August 26, 2015

It's More Like a Diamond

For the past thirty or so years, we've heard time and again that every project has three components: people, process, and technology. And further, this trio is represented as an equilateral triangle. I would like to put forward that not only is this concept incomplete, its representation is just not right. First of all, the three pieces are not equal. People are the most challenging part of any project and the most likely reason for a project's failure. The "not invented here" syndrome, or the "I'll keep my head down for a couple of months and this change will blow over" mindset, is pervasive in every organization. Process is critical, and it is now getting the attention it deserves. We can't improve the way we work if we don't analyze our current processes and look for opportunities to improve. And finally, technology. With all due respect to hardware and software developers everywhere, of the three parts of the triangle, the technology component is the least challenging. That's not to say that developing technology is not a lot of work, or not difficult. But unless a poor solution is selected, the technology part of the trio is the least likely of the three to be the reason for a project's failure, and, on its own, also the least likely to be the reason for its success.

But what about the incomplete aspect I mentioned earlier? This triangle is really a diamond (I could go with a square, but a diamond looks cooler). In addition to people, process, and technology, there's data (or information). This component is almost always overlooked, I think mainly because most I.T. people focus on the "T" and not the "I" (I have other conspiracy theories, but I won't go into them here).

Now, my reason for bringing this up is not because as a Data Management professional I'm feeling left out. It's really because of numerous conversations I've been involved with where companies are looking for that silver bullet... that software solution that will solve their problems... when in reality, the answer lies in their business processes and the data and information that flows within and between those processes (many of the failures in processes occur at the hand-off points between processes).


I would like to put forward two premises on which I'm basing my thoughts:
First, "Every decision the business makes is based on the data and information at its disposal." In the Production/Operations world of E&P, there is a business process where wells are monitored to ensure that they are operating safely and productively. If something is wrong, an alert is issued. But what is an alert? Simply put, an alert is a piece of data or information that falls outside an acceptable range of values, thereby causing an action on the part of an engineer or technician. In this case, let's break it down into our diamond:

People: Engineer or Technician

Process:  Monitor the well to ensure normal activity and take steps to correct when that activity falls outside of the norm

Technology: Could be anything from manually reading meters at a well head to sensors feeding information back to a real-time data center and control room

Data: The value being read, analyzed, or interpreted that causes the actions to take place, and the agreed-upon range of acceptable values
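To make the data corner of the diamond concrete, here is a minimal sketch of the alert idea described above. The function name, the reading, and the 200-500 psi range are all illustrative assumptions of mine, not any real monitoring system's API:

```python
# Illustrative sketch only: names and thresholds are made up for this example.

def check_reading(value, low, high):
    """Return an alert message if the reading falls outside the agreed range."""
    if value < low:
        return f"ALERT: reading {value} below acceptable minimum {low}"
    if value > high:
        return f"ALERT: reading {value} above acceptable maximum {high}"
    return None  # within the norm; no action needed

# Example: a hypothetical tubing-pressure reading against an agreed 200-500 psi range
print(check_reading(650, low=200, high=500))
```

Notice that the technology (sensor vs. manual gauge) and the people (who responds) can change, but the process only works if both the reading and the agreed range of acceptable values exist and are trusted.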

The second premise: "My job is to help find, produce, and sell oil and gas." If I were a Data Manager in Aerospace, my job would be to help design, produce, and maintain aircraft. As a Data Manager in the energy sector, what I do day-to-day is support the first premise by ensuring that the people making the business decisions have all of the data and information necessary to do their jobs: how they need it, when they need it, and where they need it. In order to do this, I need to understand the business processes, what data and information are needed at each decision point, what tools they're using to make their decisions, and what next process is triggered by that decision. And this goes on throughout the E&P value chain. One process leads to another. Each process needs data as an input; each process creates a piece of data or information (that piece of information may be simply a yes/no decision, but it still needs to be captured); each process triggers another process that goes through the same cycle.

In every process, there is a parallel yet often overlooked data flow upon which the process is totally dependent. So, the next time somebody brings up the people/process/technology concept, remind them that they're missing something, and that the something they're missing will make or break their ultimate success.

August 3, 2015

A Leap of Faith in Data-Driven Analytics

The new trends in building Digital Oil Field process simulation models are more about statistics than physical laws. This is a big change. Engineers historically prefer straightforward deterministic models using the tools that they are comfortable with, as this is the way they were trained. But the industry is finding out that brute-force approaches to complex problems, with a large degree of uncertainty in key parameters, are not providing the answers needed.

So in come the statisticians to save the day (maybe). The "black box" nature of proxy, or surrogate, models often makes their results hard to understand. Once you find a correlation, what does it mean? Abstract attributes can be found by various statistical means, but can you relate them to something in the physical world? Do you have confidence in the results of the statistical method? What is it telling you to do? What do you do for the next problem that shows up? How much trust do you have in this new "data science"?

Here is the new value proposition: you don't need domain knowledge, just a lot of data. If you don't understand your data, don't worry. Forget the raw data; it's too complex (the curse of high dimensionality) and probably of poor quality (with enough data you can filter out the outliers). Trust your learning algorithm, even though the abstract features it finds may not have a discernible meaning in the physical world. I am exaggerating to make a point here, but you can see what I mean about this being a very different approach. But, the statisticians say, the data doesn't lie.

It is a strange feeling listening to a talk at an SPE meeting where the young graduate student begins his talk by saying he doesn't know anything about stuck pipe, or pore pressure, or hydraulic fracturing, but then proceeds with his paper solving complex problems about the subject matter he knows nothing about. It is a new world to say the least.

You have learned to trust the simulation and modeling wizards, even if you have forgotten the basic physics they use to forecast the future of your reservoir (can you remember Darcy's Law?). The data scientist enters the picture without knowing anything about the data collection or about the technical nature of the problem he or she is trying to solve. But with his or her magic toolbox of statistical methods (neural nets, CEP (complex event processing), artificial intelligence, autoencoders, machine learning, etc.), he or she attacks the data, and out come correlations and conclusions that sometimes outperform the tried-and-true forecasts of the simulation wizards. Forget the Visual Basic and SQL programming you struggled to master; now it is R, Python, and Pig Latin that do the trick.
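The proxy-model idea can be sketched in a few lines of that newly fashionable Python. Everything here is a toy of my own construction: the "observed" daily rates are fabricated numbers, and a real surrogate model would use far richer methods than a straight-line fit. The point is only that the fit uses no physics at all, just the data:

```python
# Toy illustration: a purely statistical "proxy" fit with no physical law.
# The data below is invented for this example.

def least_squares(xs, ys):
    """Ordinary least-squares fit y ~ a*x + b; no domain knowledge required."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    b = my - a * mx
    return a, b

# "Observed" daily rates -- the statistician sees only numbers, not reservoirs.
days = [0, 30, 60, 90, 120]
rates = [1000, 940, 885, 830, 770]

a, b = least_squares(days, rates)
forecast_day_150 = a * 150 + b  # extrapolate with the data-driven proxy
```

A reservoir engineer would reach for a decline-curve equation here; the data-driven approach just fits whatever the numbers support, which is exactly what makes its extrapolations both powerful and unsettling.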

Several years ago, a young friend of mine was considering using her company's education benefits to improve her resume. Instead of taking the "path well-trodden" and registering for an MS in petroleum engineering, she decided to get a statistics degree. Her career adviser advised against it. From his "old school" perspective, a statistics degree added nothing to an aspiring petroleum engineer. I knew she was smart then, but the last few years are proving she was inspired.

The industry has an interesting debate on its hands. With the advent of the digital oilfield, we have a surplus of data, sometimes an avalanche of data, and even with our improved high-performance computing capabilities, we can't cope with the volume, variety and velocity of the flow of data from operations. How do we wade through all the correlations to find the one that relates to causation? How many false positives does it take before we start to trust the results? 
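The false-positive worry is easy to demonstrate. In this sketch (entirely synthetic data, with "sensor tags" standing in for whatever your historian streams at you), the target and all 500 series are pure random noise, yet the best-correlated series will still look convincingly related:

```python
import random

# Synthetic demonstration of spurious correlation: every series here is
# independent random noise, yet the best match looks meaningful.

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

random.seed(42)
target = [random.random() for _ in range(20)]  # e.g., 20 days of production
sensors = [[random.random() for _ in range(20)] for _ in range(500)]  # unrelated "tags"

best = max(abs(pearson(s, target)) for s in sensors)
# With 500 unrelated series, the strongest correlation arises by chance alone.
```

Scan enough tags against enough targets and something will always "light up"; deciding which of those lights means causation is exactly where the domain expert still earns a living.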

My gut feel (heuristics) says that the answer lies in between these two extremes. Many problems are too complex to model through our limited understanding of physical laws. The power of statistical methods grows with more data and more different types of data about a process. I can't remember the math I took in college and I struggle to follow along with these new data scientists.

In the meantime, until we find the right balance, it is the SME (subject matter expert) versus the data scientist versus the experienced old hand with his spreadsheet and intuition. Who are you going to trust? Does anyone want an aspirin, or a rabbit's foot, or a dart board?