
November 24, 2016

Have You Taken Robust Measures to Verify Data?

Posted by Srinivasa Gopal Sugavanam at 11:31 AM

What Makes a Human Click! [Source: https://www.youtube.com/watch?v=-Jy3IdLaZeA]

Have you ever wondered whether the lion's share of the data your organization collects is accurate and actionable? More and more executives are asking themselves that question, as 'trending articles' on respected social media sites have turned out to be outright fabrications. Without human editors supervising them, the algorithms that select stories for these 'trending' sections have had a field day.

I'm referring specifically to a recent investigation by The Washington Post into Facebook's Trending news section. The Post found that over a three-week period beginning in late August, five articles in the section were 'indisputably fake' and another three were 'profoundly inaccurate.' My favorite fake yet trending article uncovered by the investigation? A supposed announcement by Apple CEO Tim Cook that consumers should prepare for the iPhone 8, which would include a feature allowing users to conjure up a physical Siri who comes out of the phone and helps with chores.

What this incident shows is that although machine learning grows more sophisticated by the day, it has not yet reached the point where algorithms can run entirely without human guidance and be trusted to analyze and curate Big Data flawlessly. For another example of the intelligent, analytical role humans still play in data analysis, consider the familiar children's rhyme: "In fourteen hundred and ninety-two, Columbus sailed the ocean blue." In decades past, schoolchildren learned this rhyme while studying a sailor who believed he had crossed the ocean and reached India. The problem with the rhyme is the data: it can't be trusted. What, for example, is fourteen hundred and ninety-two? It's a year, but only for the part of the world's population that counts years in the Anno Domini era. Wasn't the hero called Cristoforo Colombo in his native Italy, and Cristóbal Colón in Spain, from where he sailed? And is the ocean blue? Not exactly. Water is nearly colorless in small amounts; the ocean looks blue largely because water absorbs red light and reflects the sky.

My point is that data that seems perfectly correct (and in this example, went unquestioned for centuries) may not be all it's made out to be, and only human judgment can point out these discrepancies. Now that enterprises operate in a world of Big Data, and much of that data drives decisions with far-reaching business impact, whether the data can be trusted becomes an important and potentially expensive question.

My colleagues at Infosys created a video that addresses this very point in an innovative way. If an enterprise relies solely on artificial intelligence algorithms, is it really getting the most out of its data? Can humans bring something to data analysis that only they can provide? You bet they can. The video is linked at the top of this post. My first reaction on watching it was that, without the right algorithm, a robot cannot detect a smudge on a person's face. Then I realized the issue is far more complex. Without a human element in the data collection and analysis process, corporations may spend considerable resources dealing with erroneous information instead of enjoying the many efficiencies of machine learning.

In an article in the Harvard Business Review, author and data expert Thomas Redman points out what should be obvious to large, data-focused enterprises (but often is not). He says, 'It doesn't matter how much data your organization collects but what matters is the accuracy of the answers it throws up, and how the data acts when it is combined with other data sets.' Indeed, flawed data can do a company more harm than having no data collection and analysis program at all. This issue will only intensify as AI and Big Data become more sophisticated and more ingrained in corporate culture.

Data should be your organization's most potent asset, not an expensive liability. Perhaps the time has come to evaluate what fail-safes and backups your enterprise has in place to maintain and protect data integrity. After all, sometimes even the smartest robot can't tell that a smudge is a smudge...
