The Infosys Labs research blog tracks trends in technology with a focus on applied research in Information and Communication Technology (ICT)


February 8, 2010

BI on ECM - Who says not possible?

BI aids business decision making by providing different kinds of analysis on business data. The assumption until now was that BI is possible only on databases, since analysis needs structured data, but the scenario has changed rapidly in the last few years. Companies are using BI on top of ECM to perform different kinds of analytics.

This Blog is from my Colleague - Sumit Sahota ( 

Databases are normally used by applications built for specific purposes such as CRM, ERP and payroll, but there is an ocean of data in the documents we create, the mails we send, the images we use, the web pages we build and so on. About 80% of data resides in this unstructured format, and it needs an ECM tool for proper storage and management. But how does this storage happen, and how is the content retrieved? Do ECM systems provide some structure to unstructured content?

The content, which can be a document, presentation or image file, is stored by the ECM tool either in a database as a binary object or in a file system. A file in the ECM is not stored as just one more object somewhere; it is placed in some sort of structure, which can either be created by the user in the form of a taxonomy or be created by the system from the tags provided, which is known as a folksonomy. This is the first step to “structure” the “unstructured” data. There are content classification tools to aid this process.

Secondly, the detailed tags provided along with the content are one more way to add structure: the content itself may be stored anywhere, but the metatags are stored in the database. This metadata can be enriched using automated metadata extraction tools, which scan through a document and offer the user a list of tags that can be used. Search engines use these metatags for indexing, which makes content retrieval faster.
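To make the idea of metatags as structured data concrete, here is a minimal sketch in Python. The records, field names and the `tag_counts` helper are all hypothetical and not taken from any particular ECM product; the sketch only assumes that each content item has a metadata row with a list of tags.

```python
from collections import Counter

# Hypothetical metadata rows, as an ECM system might keep them in its database.
# The content itself (the binary file) lives elsewhere; only the tags are used here.
documents = [
    {"id": 1, "type": "presentation", "tags": ["pricing", "retail"]},
    {"id": 2, "type": "document", "tags": ["pricing", "contract"]},
    {"id": 3, "type": "image", "tags": ["retail"]},
]


def tag_counts(docs):
    """Count how many documents carry each tag."""
    counts = Counter()
    for doc in docs:
        # set() guards against the same tag being listed twice on one document.
        counts.update(set(doc["tags"]))
    return counts


counts = tag_counts(documents)
for tag, n in counts.most_common():
    print(tag, n)
```

A count per tag is about the simplest report possible, but it shows that once the tags sit in rows and columns, ordinary BI-style aggregation applies to them.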

The third way to provide structure is to keep track of user activities. ECM tools generally have the capability to generate audit trail reports. These reports are generated from the log kept for each file. The logs can be quite detailed: they can list all activities along with the time a user spends on a page, document and so on. Web analytics tools use these kinds of logs as well.
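As a rough illustration of how an audit trail can feed a report, the sketch below aggregates a few log entries per document. The log layout (`user`, `doc`, `action`, `seconds`) and the `usage_by_document` function are assumptions made for this example; real ECM audit logs will differ in shape and detail.

```python
from collections import defaultdict

# Hypothetical audit-trail entries; the field names are illustrative only.
audit_log = [
    {"user": "alice", "doc": "policy.pdf", "action": "view", "seconds": 120},
    {"user": "bob", "doc": "policy.pdf", "action": "view", "seconds": 30},
    {"user": "alice", "doc": "plan.ppt", "action": "edit", "seconds": 300},
    {"user": "alice", "doc": "policy.pdf", "action": "view", "seconds": 50},
]


def usage_by_document(log):
    """Summarise events, total time spent and distinct users per document."""
    report = defaultdict(lambda: {"events": 0, "seconds": 0, "users": set()})
    for entry in log:
        row = report[entry["doc"]]
        row["events"] += 1
        row["seconds"] += entry["seconds"]
        row["users"].add(entry["user"])
    return dict(report)


report = usage_by_document(audit_log)
for doc, row in report.items():
    print(doc, row["events"], row["seconds"], len(row["users"]))
```

The same grouping could be done in SQL or in a BI tool; the point is only that the log already has the row-like shape such tools expect.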

The fourth way to add structure, in this social networking and collaborative world, is to capture user feedback in the form of ratings, tags, labels, scores, opinions and so on. This feedback can then be used by a variety of applications for different kinds of analysis.

So unstructured content is not fully unstructured, and there are many ways a BI tool can work on it. Text analytics is a simple example; web analytics is another. Voice of the Customer analytics is gaining a lot of popularity. Many reports can be generated from this content, and this kind of analytics can and will aid decision making.

February 4, 2010

ISO/IEC 23026:2006

According to an estimate by the WorldWideWebSize there are over 50 billion web pages on the WWW today. Unless these web pages (and in effect websites) are properly engineered, managed and maintained over their life cycle, there are bound to be several frustrated web users out there who are unable to accomplish their goals and objectives when visiting a website.
Ever since the WWW started growing by leaps and bounds, it has been noticed that websites were being developed with very little consideration for the implications of website design or for implementation realities. While some sites used state-of-the-art technologies that only the most advanced devices and users could handle, other websites were not updated and lagged behind contemporary technologies and usage patterns. Both result in poor productivity and user frustration.

In addition, the exact life span of a website was difficult to estimate, so a website could be expected to far outlive both the organization for which it was made and the vendor who made it. In particular, websites of long-standing institutions (like the UN) and of public sector entities (like government department portals) could last for decades. This inability to estimate the life span of websites led to problems with the tools and products used by developers, with execution languages, and with the formats and presentations used for websites.

These were problems that needed to be nipped in the bud to prevent wide-scale website failures. With this intention, the Internet Best Practices working group of the IEEE started accumulating best practices in website management in the early 1990s. Their focus, then, was on site-wide issues of managed websites. These practices were expected to reduce the risks associated with investments in website development. The working group emphasized that "the value of web-based operations was delivering the right information and services to the right persons at the right time with the least amount of effort". Therefore, an understanding of the target user community and its information needs was considered the basic building block for web design and engineering, as against the established notion that knowledge of technological advancements came first. By 1999, the working group had selected a set of "recommended practices", which was formally transformed into an IEEE standard: IEEE Std 2001-1999.

These practices were then extended to the websites on the WWW as well as corporate intranets  and extranets of collaborating organizations. This resulted in the creation of IEEE Std 2001-2002 titled "IEEE Recommended Practice for the Internet—Web Site Engineering, Web Site Management, and Web Site Life Cycle". This standard was intended to "improve the effectiveness of Web pages for users, Web page developers, and the value of the Web in corporate and organizational applications."

The International Organization for Standardization (ISO) and the International Electrotechnical Commission (IEC) adopted this IEEE standard as ISO/IEC 23026:2006, titled "Software Engineering — Recommended Practice for the Internet — Web Site Engineering, Web Site Management, and Web Site Life Cycle".

Some more details of this standard in a subsequent post.

February 2, 2010

BI Open Source Story - Are we there yet

The recession reminded us of Darwin's theory of 'survival of the fittest', and that is what we are seeing: the companies that focused on optimizing costs, efficiency and productivity are flying their flags high today. These are the companies that balanced managing costs with retaining their best talent. Open Source is one horizon these companies are starting to explore.

A year or two back, big IT budgets were allocated to building enterprise-wide solutions, with BI and Data Integration at the forefront. The prime reasons for such big budgets were not only the high licensing costs of the available products but also the availability of SMEs to implement those solutions. Pick up any Gartner, Forrester or similar survey of organizational priorities from the past 4-5 years, and BI, Analytics, Data Integration and Reporting will be among the top priorities. Today the focus and the need of the hour are shifting towards open source, SaaS/PaaS and Cloud, and mind you, none of the key requirements is being compromised (in fact, they are even more demanding):

1. Low cost of ownership

2. Higher performance

3. Reliability

4. Flexibility and adaptability (from an integration point of view)

5. Larger support base - compared to the vendor provided technical support model


A few common myths about Open Source:

1. Open Source has scalability problems and can't support large enterprise-wide applications, with concerns around poor testing quality, performance and security capabilities. On the contrary, Open Source products go through testing that is as robust and rigorous as that of traditional vendor products, if not more so, because the larger community of early adopters provides far more input. Problems typically get fixed early in the development cycle.

2. Open Source vendors don't offer full support and ownership. This is another myth, as copyright laws apply to Open Source vendors just as they do to traditional product vendors. The only difference is that open source vendors choose to share their IP with a larger audience. Open source does come with various support models, and with some vendors you can opt for enterprise-level support as well. In fact, as a customer you get to choose which support and services to buy.

3. It is yet to mature in the IT world. Five years ago this may well have been true; today, however, with small, medium and large organizations adopting open source and benefiting from it, this can only be considered a misconception.

4. It is only suitable for the developer community, and its target audience is developers. With end-to-end enterprise suites on the market providing BI capabilities, most of them consumed by business users who leverage the pervasive nature of BI to the fullest, this statement no longer holds true. Yes, open source is still quite popular in the developer community, but organizations are now seriously considering it in their business decisions.

5. Security concerns: with the source code available to all, there is a risk of security threats, and anyone can break its security. Once one understands that Open Source software is built using the same standards, principles and methodologies as any other software, this myth doesn't stand a chance.


So what are the areas where Open Source can help you out in the BI World?

1. Data Integration - No matter how many DI solutions on the market claim complete automation, a major chunk of ETL is still done through manual coding in the form of PL/SQL, scripting and a variety of tools. All this adds to the already huge integration costs in IT spending and to the running costs of maintenance and support. Open source comes in handy in DI, providing cost advantages as well as enhancing productivity through shorter automation cycles for integration.

2. Reporting - With a wide variety of reporting options, including dashboards, scorecards, static/dynamic reporting and real-time analysis/analytics, and given the pervasive nature of reporting today, deciding on the right tools within budget is a challenge. Open source tools bring in open standards that allow users to pick and plug in what suits their needs without being limited to one specific vendor. What they get for free is the specialized capabilities of each tool in its space, which allows businesses to make better-informed decisions.

3. Data Visualizations - Profiling data and using advanced visualizations in the early stages of integration saves a huge amount of effort, and thereby cost, in later stages: it exposes hidden data inconsistencies and provides a cleaner source of data on which to base decisions. Open source visualization tools not only provide traditional graph and chart capabilities but go well beyond them, offering advanced views of the data such as statistical measures, probabilistic measures, patterns and clusters that reveal problems, data duplication, and plots on various 3-dimensional graphs.


Key Reasons to use open source:

- Lower the costs in IT Investments

- Flexibility (Extensibility & Customization), clubbed with incorporating the latest research trends

- Minimal Vendor dependency, due to open standards for integration & collaboration models

- Larger pool of technical community, helping in quicker resolutions to the technical glitches

- Data Integration based on open standards

- Use what you need (Pay for what you need basis), and pick the best features suiting your needs

- Faster turnaround on enhancements to features and capabilities

- Use before you buy

- Get the best of the capabilities on Data Integration, BI and Reporting space

- Provides you a strong arm to aid your research, and put it to best of use in action


A word of caution on approach when choosing to get on the Open Source highway: the concept of Open Source in the BI world is not very old, and there is definitely room for it to mature. This will become evident from the scale and level of implementations where Open Source does well in the future, from both a performance and a scalability point of view. One needs to assess the tools' capabilities against one's needs, and get references from service providers or consulting firms on their experiences.


I feel that for any organization choosing a BI tool or set of tools, or for that matter any tool, as an organizational standard, the following approach will help in the long run:


Step 1: The organization should first list its expectations and needs from Data Integration, Reporting and BI, irrespective of what industry tools provide today. The list should cover all functional, non-functional, technical, architectural and business needs.

Step 2: With that list of expectations, do a thorough vendor/tool assessment using a methodical process of eliminating and shortlisting required features. Rank the vendors/tools using a mathematical model (e.g. weighted average) on each feature or capability you need, based on the responses from the vendors on your list.

Step 3: Finally, analyzing this comparative study, as an independent and unbiased assessment, will help you figure out the tool that best suits your needs.
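The weighted-average ranking mentioned in Step 2 can be sketched in a few lines of Python. Everything in the example is made up for illustration: the criteria, the weights, the tool names and the 1-5 scores are assumptions, not results of any real assessment.

```python
# Hypothetical criteria and their relative importance (larger = more important).
weights = {"data_integration": 4, "reporting": 3, "cost": 3}

# Hypothetical scores on a 1-5 scale, e.g. derived from vendor responses.
scores = {
    "Tool A": {"data_integration": 4, "reporting": 3, "cost": 5},
    "Tool B": {"data_integration": 5, "reporting": 4, "cost": 2},
}


def weighted_score(tool_scores, weights):
    """Weighted average of a tool's scores across all criteria."""
    total_weight = sum(weights.values())
    weighted_sum = sum(tool_scores[name] * w for name, w in weights.items())
    return weighted_sum / total_weight


# Rank tools from best to worst weighted score.
ranking = sorted(scores, key=lambda t: weighted_score(scores[t], weights), reverse=True)
for tool in ranking:
    print(tool, round(weighted_score(scores[tool], weights), 2))
```

With these made-up numbers the tool with the stronger cost score ends up ahead, even though the other one scores higher on data integration and reporting. That is the kind of trade-off the weights are meant to make explicit, so it pays to agree on them before looking at vendor responses.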


It is worthwhile to compare licensed commercial vendors with open source and see which comes out better on features versus price tag, or in a cost-benefit analysis.

In summary, BI is not only important for decision making; in today's economy it is vital to surviving the competition. In a challenging environment of reduced IT spending, cost controls, a reduced workforce and pressure to remain competitive, organizations' IT groups have already started their journey with Open Source technologies. Open Source in the Information Management space is becoming quite mature in the areas of Data Integration, Data Visualization, Business Intelligence and Databases, and now helps enterprises of all sizes focus on delivering business-critical information to their decision makers at lower cost and with greater flexibility. Given that level of features and capabilities, combined with the cost advantage, one can't afford to ignore the Open Source story altogether. Don't forget to share your experiences with Open Source.