
August 22, 2017

Slow Motion Data Crisis

Most of us respond to an alarm pretty quickly, whether it is a smoke alarm in your house, a fire alarm in a mall, a lightning alarm on the soccer field, or a tornado siren. We take these alarms seriously and get everyone to a safe place as fast as we can.

There are, however, other kinds of alarms we tend to ignore, allowing a crisis to creep up on us. Let's call these slow-motion crises. They work on the same premise as the old Cajun story of how to cook a frog. The best way to cook one is to put the frog in a pot of cool water and turn up the temperature slowly, so that the frog gets used to the rising temperature until it's too late. Turn the temperature up too quickly and the frog realizes his situation and jumps out. Sadly, we are the frog in this metaphor.

Businesses have alarms for almost everything. Operations boasts the ones we ordinarily think of: high pressure, low pressure, or failed systems alerts triggered by SCADA systems. However, one could also count the corporate monthly, or even weekly, management dashboard reports, which carry alarms for key metrics that impact the profitability of your business. Field personnel and corporate executives take these alarms seriously and do something to get to a safe (or profitable) place as fast as they can.

Slow-motion crises happen when phenomena or processes have potentially substantial or even devastating consequences that build and become problematic only after the passage of considerable time. They are unlike the fire alarm or the failed pressure gauge alarm. If this article were about geopolitics, I might cite the national debt, falling education scores, or climate change as examples of slow-motion crises. But since we're discussing information management, I think the way many, if not most, companies treat data management is a classic example of a slow-motion crisis.

I don't think many companies have a key performance indicator (KPI) for data on their management dashboards. It is difficult to even know what the right trends in data management are that deserve management attention. Is it growth in data volume? Is it something around data quality? Would it be something about user productivity in finding the right data? Or having data stewards for every critical information object or effective data governance councils? For companies focused on data-driven decisions, would it be "time to insight" or a change in decision quality for key investment decisions? Would it even be an estimate of the value of the information that your company owns?

From my experience, management, even the IT leadership teams, doesn't monitor any of these trends on a regular basis. They may look at data management periodically, but there are no alarms to call them to action. We slowly get used to the growing volume, variety and velocity of data being generated by our operations and being imported from external sources. We know that it takes our analysts and data scientists too long to find, check and reformat the data needed for their reports and models, but we are getting used to that as well.

We know that there is a lot of data in "shadow IT systems" that never finds its way to our official "systems of record", but we tried to tackle that problem and didn't make much progress. We know the data quality for some key information is not as good as we would like, but when we really need that data, we spend the time to make corrections before we analyze and use it. We are just like the frog: we are getting used to the rising temperature of business demand for better access to data, even as the business tries to improve its processes and operate more efficiently.
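To make the frog metaphor concrete, here is a minimal sketch of what an alarm for a slow-motion crisis might look like. The KPI (hours an analyst spends finding and fixing data per report), the sample values, and the thresholds are all hypothetical, chosen only for illustration. The point is that an alarm watching month-over-month change stays silent, while one that compares against an early baseline fires.

```python
from statistics import mean

def kpi_alarms(history, baseline_months=3, max_step=0.10, max_drift=0.25):
    """Return (step_alarm, drift_alarm) for a monthly KPI series.

    step_alarm  fires when any month-over-month rise exceeds max_step.
    drift_alarm fires when the latest value exceeds the average of the
    first baseline_months by more than max_drift, however gradual the climb.
    """
    baseline = mean(history[:baseline_months])
    stepped = any(
        (curr - prev) / prev > max_step
        for prev, curr in zip(history, history[1:])
    )
    drifted = (history[-1] - baseline) / baseline > max_drift
    return stepped, drifted

# Hypothetical data: hours per report spent finding and fixing data, by month.
hours_to_find_data = [10.0, 10.4, 10.9, 11.4, 12.0, 12.6, 13.2, 13.9, 14.6, 15.3]

stepped, drifted = kpi_alarms(hours_to_find_data)
print(f"sudden jump alarm: {stepped}, slow drift alarm: {drifted}")
```

No single month in this series rises by more than about 5%, so the sudden-jump alarm never sounds, yet the latest value is well over 25% above where the series started. Which KPI and which thresholds are worth watching is exactly the open question raised above.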

There are alternatives to just ignoring the alarms. Likely, the answer dearest to our heart is to step back and focus on data management, update our information management strategies, and concentrate on standard definitions and standard ways of exchanging data with partners and suppliers. And we can work with the business to establish effective data governance teams and recognize the value of good data stewards.

Also, we can benchmark our information management practices against our industry peers and look at what they are doing. We can bring in new technology, such as data lakes stocked with data catalogs, to make data management a little easier for both the data managers and the data analysts. We can work with consultants and technology vendors to evaluate our current situation, get a second opinion on the data and integration architecture we are using, and build a road map for a practical pace of improvement in data management practices and capabilities. Any of these steps helps to put the state of your data management practices in focus against the rising temperature of Big Data trends.

I realize this isn't the really exciting work that emerging technology offers, and it isn't the highest priority when budgets are squeezed. It rarely comes up when we meet with business leaders or attend high tech conferences full of high-priced thought leaders. Whether the consequences kick in gradually (when productivity grinds to a halt) or hit us suddenly (when a critical decision is missed because a project couldn't find the right data to see the alternatives), the slow-motion crisis of the inadequate response to Big Data trends will hit us. The water is already heating up.

What temperature will it take in your organization before you react?


Jim Crompton is a thought leader for Noah Consulting, an Infosys Company, helping pioneer the relationships between complex Upstream processes and enterprises with automation to create competitive advantage.  His experience over numerous decades combined with the development capability of Infosys is working to ensure successful alignment of man and machine.

August 2, 2017

Imagine Shopping for Your Data...

People enjoy the convenience of shopping online, which is more than you can say for most employees of large companies trying to find the data they need to do their jobs. Maybe what companies need is an online data marketplace? Can shopping at Amazon teach us some lessons on how to manage the Big Data environments we have and show us how to give our employees an enjoyable "data" shopping experience?

I think I can safely assume that most people reading this article have experienced shopping on Amazon.com. Whether you were looking for a good book or something else, Amazon is a popular "place" (if I can use that term). Amazon began as a virtual book store, then diversified, expanded and disrupted retail sales channels. Visitors to the Amazon site will account for about 7% of North American retail sales in 2018 (versus 10.6% for Walmart, the largest brick-and-mortar retailer). Part of this can be attributed to the 1.8 million items Amazon offers, versus an average range of 120,000 at a Walmart Supercenter. Amazon also produces consumer electronics and cloud infrastructure services (IaaS and PaaS) and recently purchased Whole Foods (for $13.4 billion) to expand its grocery business.

How does shopping online relate to data management and, more specifically, to the challenge of data access? Every enterprise seems to have an established data lake (or more than one) to go along with its data warehouse and a host of specific data marts, systems of record for structured data, enterprise content management systems for documents and many other data sources. Whether they call their strategy a data foundation or a data ecosystem, the amount and variety of data are growing. Yet just trying to find what you are looking for, verifying that the data quality is acceptable and confirming that the data comes from the right source (data governance) seems to take up most of your day.

Organizations looking to modernize their analytics and data infrastructure to enable both data science teams and self-service for the average employee are running into challenges with islands of data and heterogeneous technology landscapes. This leads to perpetual data integration efforts, making analytics difficult and restricted to the very few with the time and skills to mine the large data collections available. The vast majority of organizations building data lakes are struggling to unlock the maximum value in their data. The efforts to deploy this new technology are falling short due to multiple barriers, including:

  • Usability barriers: users don't know what is in the data lake. They need a guide to help them navigate their NoSQL environment
  • Access barriers: most companies have a heterogeneous technology landscape and need data access across the board (horizontal as well as vertical) for their analytics workloads
  • Technology barriers: users don't have the required tools to get value out of the data and enable self-service. Just deploying Hadoop is not enough
  • Skills barriers: hiring data scientists is difficult and expensive. Not every problem requires a specialist, and not everyone has to be an R or Python programmer
  • Productivity barriers: Data Scientists spend much of their time hunting for data and preparing a data set for analysis
  • Performance barriers: running analytical models against Hadoop-scale data using traditional methods takes too long. Companies need analytical engines that scale along with the data

But before we start selecting technology solutions for each specific problem, we need to get a better idea of what kind of data marketplace we are trying to develop. Here are four marketplace options as examples and you can probably think of others that might fit your company's culture and requirements:

  • Gate Keeper: this marketplace is under the tight control of IT, which controls access and registration to data sources. Gatekeepers maintain high standards but at a high cost
  • Shop Keeper: users only have one choice for each information object and the choice offered is usually a high-quality choice, blessed by the business and IT. Integration and data quality are high, though you only have the choice offered and getting that changed can be a chore
  • Outlet Mall: in this option, you have more choice but less integration and standardization. There is some oversight on approved choices and data quality/standards but the user is responsible for the consequences when she picks an "off-brand" data source. It can also sprout a cottage industry when data stewards from the variety of data sources across your organization set up shop to attract business
  • Wild West: you may not consider this a choice for your company as there is little oversight, and little to no governance, but at least all the data is under one roof. It can be the place to start modifying behaviors, improving content and services through social pressure (users don't go back to sources that don't provide good data)
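As a rough illustration of how these four options differ, the sketch below encodes each one as a rule for whether a user may pick a given data source. The class names, fields, and notes are hypothetical simplifications of the descriptions above, not a recommended governance model.

```python
from dataclasses import dataclass
from enum import Enum

class Marketplace(Enum):
    GATE_KEEPER = "Gate Keeper"
    SHOP_KEEPER = "Shop Keeper"
    OUTLET_MALL = "Outlet Mall"
    WILD_WEST = "Wild West"

@dataclass(frozen=True)
class Source:
    name: str
    approved: bool        # blessed by the business and IT
    access_granted: bool  # IT has granted this user access

def can_use(style, source):
    """Return (allowed, note) for a user picking `source` under `style`."""
    if style is Marketplace.GATE_KEEPER:
        # IT controls both registration and access.
        if source.approved and source.access_granted:
            return True, ""
        return False, "ask IT for registration or access"
    if style is Marketplace.SHOP_KEEPER:
        # Only the one blessed choice per information object is offered.
        if source.approved:
            return True, ""
        return False, "only the approved source is offered"
    if style is Marketplace.OUTLET_MALL:
        # More choice, but off-brand picks are at the user's own risk.
        if source.approved:
            return True, ""
        return True, "off-brand source: user owns the consequences"
    # Wild West: everything under one roof, little to no governance.
    return True, "no oversight: rely on peer feedback"

# Hypothetical shadow IT source that nobody has approved.
spreadsheet = Source("field_office_wells.xlsx", approved=False, access_granted=False)
for style in Marketplace:
    print(style.value, can_use(style, spreadsheet))
```

The same unapproved spreadsheet is refused in the first two marketplaces and allowed, with different caveats, in the last two.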

After you choose which kind of marketplace you want to promote, there are technologies and methods that can help you. The application of data catalogs, glossaries and metadata management tools will:

  • Improve efficiency of data discovery: support self-service business intelligence interests by directing engineers, analysts, operators to the best source of data for their projects. With the wide diversity of data sources (could be hundreds including informal sources with strong resistance to changing local business unit preferences), they will be grateful for the increased productivity, reliability and direct path to the best source of data
  • Accelerate the acceptance of standard metadata: increase deployment of existing standards and reinforce stewardship of information objects in metadata for data source discovery
  • Accelerate the acceptance of systems of record: use data catalogs to influence adoption of best available "systems of record" where data quality and data management best efforts are focused. Identify the important shadow IT data sources, expose them to projects and see if they can be moved to the existing official systems of record
  • One-stop shopping for data: establish a single go-to "marketplace" that projects use for data discovery, to overcome data governance that is loosely defined and often informal
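A minimal sketch of the catalog idea behind these points, assuming a simple in-memory list of entries: a search that matches on name, description, or tags and lists systems of record ahead of informal sources. The entry fields, source names, and stewards below are hypothetical; a real catalog tool would offer far more.

```python
from dataclasses import dataclass, field

@dataclass
class CatalogEntry:
    name: str
    description: str
    steward: str
    system_of_record: bool = False
    tags: set = field(default_factory=set)

def search(catalog, term):
    """Return entries matching `term`, systems of record first."""
    term = term.lower()
    hits = [
        entry for entry in catalog
        if term in entry.name.lower()
        or term in entry.description.lower()
        or term in {tag.lower() for tag in entry.tags}
    ]
    # sorted() is stable and False sorts before True, so negating the
    # flag moves systems of record to the front without reordering the rest.
    return sorted(hits, key=lambda entry: not entry.system_of_record)

# Hypothetical catalog mixing official and shadow IT sources.
catalog = [
    CatalogEntry("field_office_wells.xlsx", "Well list kept by a field office",
                 steward="unassigned", tags={"well", "shadow IT"}),
    CatalogEntry("well_master", "Official well header data",
                 steward="J. Doe", system_of_record=True, tags={"well"}),
    CatalogEntry("invoice_archive", "Scanned vendor invoices",
                 steward="A. Smith", system_of_record=True, tags={"finance"}),
]

for entry in search(catalog, "well"):
    label = "system of record" if entry.system_of_record else "informal source"
    print(f"{entry.name} ({label}, steward: {entry.steward})")
```

Listing the shadow IT spreadsheet alongside the official source, rather than hiding it, is what lets a project see it and decide whether it should be moved into a system of record.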

Just like at Amazon, a data marketplace is supported by an integrated platform behind the scenes. The platform should consist of best-in-class capabilities for data discovery, data sampling, data profiling, data wrangling, data blending, data lineage, data cataloging, data preparation/transformation, analytical modeling, guided analytical modeling, model management and visualization. These capabilities augment the data platform to deliver end-to-end self-service to the data analyst and data scientist community.

This is not an easy journey to embark on, but you can start with simple steps by focusing on targeted communities (such as the data science team), targeted data environments (understanding what is in your data lake), select data sources (ones your business is already devoting time to getting in better shape) and a subset of the technology components in the full data platform.

"A journey of a thousand miles begins with a single step," according to a Chinese proverb. What is keeping you from taking the first step?
