

October 27, 2010

BI Top Trend Analysis - Part 1

With 2010 drawing to a close, it is time to re-look at and analyze the trends in the Business Intelligence space. This is a series of blogs that tries to analyze the trends that are influenced more by changes in the world outside the organization, and to see how the solutions are shaping the future of Information Management.

According to HP, the top 10 trends in the Business Intelligence space for 2010 [1] were predicted as follows:
Trend 1: Increased data and business intelligence program governance
Trend 2: Enterprise-wide data integration: A good investment
Trend 3: The promise of semantic technologies
Trend 4: Expanding use of advanced analytics
Trend 5: Narrowing the gap between operational systems and the data warehouse
Trend 6: Data warehousing and business intelligence: A new generation drives new priorities
Trend 7: Growing impact and opportunity of Complex Event Processing
Trend 8: Growing importance of integrating and analyzing unstructured/semi-structured data
Trend 9: Social computing and the next frontier for business intelligence
Trend 10: Growing interest in cloud computing for business intelligence

If we try classifying the trends as changes within the organization and changes outside the organization, the trends in Category A [1, 2, 4, 5, 6] fall, at a high level, under changes within organizational boundaries, whereas the trends in Category B [3, 7, 8, 9, 10] are the impact of the external world changing: the behavior of customers, the channels, and the options available to access information have changed a lot. Trend 10 in Category B also reflects the growing need to optimize the operational cost of owning infrastructure and to explore options for moving less relevant data outside organizational boundaries, in addition to the demand for space driven by the spurt in the volume of information being generated.

Let me begin with one of the eye-catchers and a hot topic that is on everyone's lips - Trend #9, Social Computing.

We are all aware of B2C and B2B scenarios, and even C2B in some cases. These models have been prevalent in the industry for a long time; now there is a "new kid on the block" - C2C. Isn't it surprising that no one thought earlier about the power of this model, considering that we are social beings who tend to interact with like-minded people and take time to accept the not-so-like-minded? We share, we talk, we discuss, we regret, we crib, we complain, we appreciate, recommend and comment on the things we do, buy, sell or reject, and the work we get involved with. To illustrate the point, here are a few examples of how marketing/branding happens without any company or organization being involved, whether rewarded or not:
1. Word of mouth by like-minded communities discussing brands: Social Marketing
2. Referrals for job postings within an organization: Social Sales

This trend takes the enhanced customer experience to the next level of Business Intelligence by capturing the social interactions and networking that customers are having on channels like Twitter, Facebook, LinkedIn, etc. Today these channels have a bigger influence on a customer's decision to buy a product or brand than any direct marketing campaign run by the product/brand companies.

The natural question to ask, then, is: since the interactions on these channels are primarily free-flow text, jargon-laden language or simply unstructured data, how can one make meaningful data out of them to define behaviors, attitudes or correlations? Thankfully, the world of complex data mining, statistical analysis and text analytics offers various options to build and classify those conversations into taxonomies/categories and to score the performance of each.
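As a toy illustration of the idea (not any specific product's approach), a naive keyword-based classifier could bucket free-form posts into categories and keep a rough hit count per category; real text-analytics engines use far richer statistical and linguistic models. The category names and keywords below are purely hypothetical.

// Toy keyword-based classifier: buckets free-form posts into illustrative
// categories and keeps a simple hit count per category (hypothetical keywords).
var taxonomy = {
    praise:    ["love", "great", "recommend", "awesome"],
    complaint: ["hate", "broken", "refund", "worst"],
    question:  ["how", "why", "help"]
};

function classifyPost(post) {
    var text = post.toLowerCase();
    var scores = {};
    for (var category in taxonomy) {
        var hits = 0;
        for (var i = 0; i < taxonomy[category].length; i++) {
            if (text.indexOf(taxonomy[category][i]) !== -1) hits++;
        }
        if (hits > 0) scores[category] = hits;
    }
    return scores;   // e.g. { praise: 2 }, or {} if nothing matched
}

// Example: classifyPost("I love this brand and would recommend it") -> { praise: 2 }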

The next blog will continue on Social Computing and Social CRM Analytics - keep watching for more...

References:
[1] HP Top 10 Trends in Business Intelligence for 2010 - http://h20195.www2.hp.com/v2/GetPDF.aspx/4AA0-6420ENW.pdf

October 21, 2010

Performance testing 101 - Parameterization of values in load testing

Load testing simulates a real-world user load on the application, and running such tests prior to production helps assess an application's scalability and responsiveness. It helps uncover any performance issues with the application, which can then be addressed before it moves into production. HP LoadRunner, IBM Rational Performance Tester, JMeter, WebLoad and Microsoft VSTS are some of the most common load testing tools. These tools simulate the load according to the number of concurrent users, and each user can be defined with credentials as specified by the client. This process of assigning user-defined values to each simulated user is called parameterization of values.

In load testing, we record a script and run the same script for different sets of concurrent users. When we record a script for the first time, the values entered during recording are captured as default values. The issue here is that the recorded script has only one set of values, the default values, and they remain the same for the entire test schedule. As a result, different user-defined values cannot be supplied, and the test is not an actual simulation of the production environment. The parameterization technique helps overcome this situation in load testing. With parameterization, we create a file that holds the user-defined values, import the file into the script, and replace the default values at the respective places in the script.

Let's take an example to illustrate the same: 

Consider a simple application with a login page which displays the account summary of the user. We assume we have to load test this application for 300 users, each with a different username and password.

The following steps are followed:

• Record the script
We used WebLoad 5 to record the script for the load test, and the user entered the following credentials: username - user1 and password - user1.
Here is the part of the script that captures these user-entered values as default values:

wlHttp.ContentType = "application/x-www-form-urlencoded"
wlHttp.FormData["j_username"] = "user1"   // recorded default value
wlHttp.FormData["j_password"] = "user1"   // recorded default value
wlHttp.FormData["Submit"] = "Login"
wlHttp.Post(wlHttp.Url)

• Create a text file called 'login_300users.txt' with the user-defined input values in a comma-separated format, like:
user1,user1
user2,user2
user3,user3
..........
user300,user300

• Include the following code at the beginning of the script to make the file with user-defined values available to the script.
function InitAgenda() {
    IncludeFile("AsmLib.js", WLExecuteScript)   // include the AsmLib.js helper library
    CopyFile("login_300users.txt")              // copy in the data file with user-defined values
}

• Initialize an array with the comma-separated elements read from the file.
DataArr_Login = GetLine("login_300users.txt")   // reads a line and splits it into comma-separated fields

• Replace the default values with the variables.
wlHttp.ContentType = "application/x-www-form-urlencoded"
wlHttp.FormData["j_username"] = DataArr_Login[1]   // username read from the data file
wlHttp.FormData["j_password"] = DataArr_Login[2]   // password read from the data file
wlHttp.FormData["Submit"] = "Login"
wlHttp.Post(wlHttp.Url)

Thus, the script is modified for load testing and the values are parameterized with the user-defined values.
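Putting the steps together, the parameterized agenda would look roughly as below. This is only a sketch, assuming that InitAgenda() runs once before the test starts and that GetLine() is called at the start of each iteration so every virtual user picks up a fresh username/password record:

// Runs once, before the main script body
function InitAgenda() {
    IncludeFile("AsmLib.js", WLExecuteScript)
    CopyFile("login_300users.txt")
}

// Main script body - executed on every iteration
DataArr_Login = GetLine("login_300users.txt")      // next username,password record

wlHttp.ContentType = "application/x-www-form-urlencoded"
wlHttp.FormData["j_username"] = DataArr_Login[1]
wlHttp.FormData["j_password"] = DataArr_Login[2]
wlHttp.FormData["Submit"] = "Login"
wlHttp.Post(wlHttp.Url)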

October 6, 2010

Performance Extrapolation that Uses Industry Benchmarks with Performance Models

A white paper written by me and my colleague Kiran C Nair was presented at the SPECTS'10 conference (http://atc.udg.edu/SPECTS2010/program.php). This conference targeted professionals involved in the performance evaluation of computer and telecommunication systems. The paper described an approach to predicting application performance using industry-standard benchmarks and Queuing Network Models. Multiple requests for extrapolating application performance to different hardware were the key motivation for coming up with this paper.

Industry benchmarks like SPEC and TPC provide a standard way to compare system performance and also act as pointers for capacity planning. Some analytical methods use these benchmarks to linearly project system utilization and throughput. However, when it comes to predicting performance metrics (utilization, response time, throughput, etc.) for a multi-layered application, such as an OLTP application distributed across multiple resources, this approach to extrapolation does not provide holistic results.
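As a rough illustration of the kind of linear projection such benchmarks enable (a sketch only, assuming utilization at a fixed throughput scales with the inverse ratio of the two servers' benchmark scores, which real systems rarely follow exactly):

// Simplified linear extrapolation using a benchmark ratio (illustrative only).
// Assumption: CPU utilization at the same throughput scales with the inverse
// ratio of the current and target servers' benchmark scores.
function projectUtilization(measuredUtil, benchScoreCurrent, benchScoreTarget) {
    return measuredUtil * (benchScoreCurrent / benchScoreTarget);
}

// Example: 60% CPU on a server scoring 40, projected onto a server scoring 80
// projectUtilization(0.60, 40, 80) -> 0.30, i.e. roughly 30% CPU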

Then there are performance modeling techniques like QNM, QPN (Queueing Petri Nets), LQN (Layered Queueing Networks), etc. that give enterprise architects a detailed understanding of application performance for varying scenarios. These models are created using measurements from the existing system, and any change in hardware would change these measured values and the performance model itself.
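For context, a closed Queuing Network Model of this kind is commonly solved with exact Mean Value Analysis (MVA). The sketch below is the generic textbook algorithm, not the model from the paper, and assumes measured per-tier service demands and a fixed think time:

// Exact MVA for a single-class closed queueing network (textbook version).
// demands: per-tier service demands in seconds (e.g. web, app, db tiers)
// users:   number of concurrent users; think: think time in seconds
function mva(demands, users, think) {
    var queue = demands.map(function () { return 0; });        // Q_k(0) = 0
    var throughput = 0, responseTime = 0;
    for (var n = 1; n <= users; n++) {
        var perTier = demands.map(function (d, k) {
            return d * (1 + queue[k]);                          // R_k(n) = D_k * (1 + Q_k(n-1))
        });
        responseTime = perTier.reduce(function (a, b) { return a + b; }, 0);
        throughput = n / (responseTime + think);                // X(n)
        queue = perTier.map(function (r) { return throughput * r; });   // Q_k(n) = X(n) * R_k(n)
    }
    return {
        throughput: throughput,                                 // requests/sec
        responseTime: responseTime,                             // seconds
        utilizations: demands.map(function (d) { return throughput * d; })
    };
}

// Example: 20ms web, 50ms app, 80ms db service demand, 100 users, 5s think time
// mva([0.02, 0.05, 0.08], 100, 5)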

So industry benchmarks were useful for comparing hardware performance, but they could not provide much insight into performance metrics for distributed applications. For what-if analysis of such multi-tiered applications under horizontal scaling and varying workload scenarios, performance models were used. But independently they could not predict the impact of vertical scaling. Thus, to get an overall performance prediction for hardware scaling and other changes, a hybrid of these two approaches was experimented with. The approach and the findings are detailed in the paper.