The Big C in the Big D
Guest Post by
Timothy Sutherland, Senior Technology Architect, MFGADT Online, Infosys
There seems to be a lot of buzz surrounding the Big
Data 'phenomenon' at present. While I suspect much of this is marketing hype and
nearing the 'peak of inflated expectations' (driven perhaps by the same marketing
folks who played a leading role in the dot-com boom and crash at the turn of the millennium),
clearly there is a genuine requirement to harness (or more specifically, exploit)
the unfathomable volume of data that is being generated.
For me, Big Data is probably the convergence of mobile, cloud and social computing (Web 2.0), or perhaps more generally, an evolution of pervasive computing (but by no means a revolution). It was probably inevitable that this would become a hot topic, but that's easily said in hindsight.
While there is no clear-cut definition of Big Data, it
is generally defined as data that is difficult to manage or analyse using
traditional data management tools, methods and infrastructures. Clear as mud.
By definition then, surely the challenges that apply
to managing this data could equally apply to content? That said, data ('structured')
has inherent semantic meaning whereas content ('unstructured') has intrinsic
value and relevance in the context of the underlying process that generated
it, consumes it or interacts with it. The dimensions of volume, variety,
velocity and veracity equally apply to unstructured content. Big Data is fuzzy
and so is content.
Moreover, when we consider that content is becoming
increasingly difficult to manage, even those organisations that do have the
requisite maturity, capability and intent are struggling. It's spiralling out
of control. Consider that, by common estimates, around 80% of information in an
organisation is unstructured, and the overwhelming majority of it is either not managed or poorly
managed. Big Data in this sense is a much broader set of information across an
enterprise and includes Big Content (Big C). Content remains King (and I am not
just referring to web content).
Indeed, according to a recent survey by AIIM (see http://www.publictechnology.net/sector/central-gov/study-getting-big-data-out-digital-landfill), the majority of organisations feel that
unstructured data is not exploited to the degree that structured data is. Many
respondents to the survey were either unsure or felt that unstructured data was
poorly used for analysis and decision making. How true.
Often, where the Big C is considered, and it is, it
inevitably involves references to the huge datasets generated by the likes of
Facebook, Twitter and Google. But of course Big Content is much more than that,
isn't it? So where does the Big C (Big Content) fit in the scheme of things?
Content management is more aligned with 'systems of
engagement' than with 'systems of record' (see http://www.globallogic.com/geoffreymoore.html), although in practice the distinction is
perhaps not so clear-cut. Generally though, when we talk about Big C in the
context of Big D, we are interested in both the data AND the content that are transacted
during those interactions with a customer, employee or department.
How do we leverage this information? How also do we
provide a unified and meaningful view between unstructured and structured data?
How can we build bridges between the structured and unstructured repositories?
Where can we exploit advances in search, semantics, categorisation and content
analytics to leverage content and data?
While the leading content management vendors have
extended their portfolio to include offerings to manage the lifecycle of a
diverse range of new content types (such as blogs, wikis, tweets, digital
assets, cases/contracts and other 'transactional' data), what appears to be missing is
the meaningful analysis of this content.
Today it is possible to capture, manage, store,
deliver and preserve content, but how does one leverage this content? How can
content be used to improve analysis, decision making, performance and customer
experience? How do we aggregate, consolidate and interpret the gargantuan
volumes of information that are being generated? How do we sort the content that
is of value to the organisation from content that is destined for disposal or
the digital wasteland? Specifically, what role does content play in decision
making and strategy? Or is it merely a by-product of the decision making
process?
I believe the opportunities for Big Content are to
extend the boundaries of content management beyond the document-centric enclaves
or silos we are accustomed to working with. That's because for a long time
content management systems have been viewed as just that - a tool for managing
the lifecycle (cradle to grave) of content, and specifically, documents or web
pages.
What's needed is a fundamental change, a paradigm
shift if you will, that can take the silos of information across an enterprise
and combine them in novel or interesting ways. And how we think about content
management systems must change. Content is an asset that must be leveraged and
content management systems are platforms or enablers rather than packaged
solutions.
The requirement is not just storage and management of
Big C, but the analysis, re-use, standardisation (of content as much as process),
interoperability and integration (structured and unstructured).
The Big C has an important place in Big D and the Big C should be given the attention it rightfully deserves. How the markets will address the Big C challenge is something we should all watch with some interest. Certainly new challenges will arise but new opportunities will quickly follow.



Comments
Agree with you, Timothy. This is where big giants like IBM and Oracle are creating Big Data platforms, and even existing commercial tools are gearing up to support Big Data through technologies like Hadoop and MapReduce. Search engines are also candidates for doing the analysis on structured and unstructured content in one shot and providing a 360 view of the data.
Posted by: Ketan C | September 11, 2012 10:50 AM
Excellent read Tim!
Posted by: Varun Chhibber | September 22, 2012 9:51 AM