The Infosys Labs research blog tracks trends in technology with a focus on applied research in Information and Communication Technology (ICT).


December 23, 2010

A Step towards Cloud Interoperability: Open Data Center Alliance

The technological revolution driven by Cloud Computing, and the race to lead it, has given birth to some new problems.

The most immediate problem is the lack of "Cloud Interoperability", i.e. the ability to use the same resources across a variety of clouds. Every cloud service provider has come up with its own unique way of handling inter-cloud application interaction and user interaction. This practice of creating and using isolated, divergent APIs has become a constraint for the user, as it limits the choice of clouds, vendors and the respective tooling. Interconnectivity between different clouds thus becomes difficult. Moreover, it defeats the goals of portability and integration, and will surely lead to the fragmentation of clouds.
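To make the portability point concrete, here is a minimal, hypothetical Python sketch. The vendor classes, method names and parameters are invented for illustration and do not correspond to any real provider SDK; the point is only that a vendor-agnostic interface lets the same calling code work against any cloud.

from abc import ABC, abstractmethod


class CloudProvider(ABC):
    """Vendor-agnostic interface of the kind an interoperability effort enables."""

    @abstractmethod
    def create_instance(self, image: str, size: str) -> str:
        """Provision a compute instance and return its identifier."""


class VendorA(CloudProvider):
    def create_instance(self, image: str, size: str) -> str:
        # Behind the common interface, each vendor keeps its own proprietary call.
        return f"vendorA-{image}-{size}"


class VendorB(CloudProvider):
    def create_instance(self, image: str, size: str) -> str:
        return f"vendorB-{image}-{size}"


def migrate_workload(source: CloudProvider, target: CloudProvider) -> str:
    """Portability: this caller is written once against the common interface,
    so the same code works no matter which vendor sits behind it."""
    return target.create_instance(image="web-server", size="medium")


print(migrate_workload(VendorA(), VendorB()))  # -> "vendorB-web-server-medium"

A vendor-agnostic usage model of the kind the alliance is proposing aims to make this sort of write-once consumption possible across providers.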

To overcome this hurdle, some 70 major IT user organizations (such as Lockheed Martin, BMW, China Life, Deutsche Bank, JPMorgan Chase, Marriott, the National Australia Bank, Shell and UBS), whose combined IT spending is more than $50 billion, have joined hands under the name "Open Data Center Alliance".

As per the official website (http://www.opendatacenteralliance.org/): "The Open Data Center Alliance is an independent consortium comprised of leading global IT managers who have come together to resolve key IT challenges and fulfill cloud infrastructure needs into the future by creating an open, vendor-agnostic Usage Model Roadmap."

So the Open Data Center Alliance will use its collective bargaining power to ensure interoperability across core network and cloud technologies, and we can expect a bright and INTEGRATED future for the clouds.

December 21, 2010

Why a 'guesstimate' while identifying the workload mix may invalidate the complete performance testing findings

Modeling the workload for a web application plays an important role in the success of a performance testing exercise. Though it does not directly impact the accuracy of the test results obtained, it does impact the validity of the tests carried out to verify the performance. To put it simply, if the workload is not modeled accurately, the confidence with which we can ascertain the claims defined as the performance testing goals is negatively affected.

It is often believed that the main objective of assessing the workload for a system is to generate a manageable workload, so that the system is neither overloaded nor underloaded. Contrary to this, the focus of accurately modeling the workload is on loading the system 'appropriately', be that overloaded or underloaded, i.e. loading the system in accordance with what is expected in the production environment.

A typical performance testing exercise undertakes workload modeling as a task to identify the load to be generated on the servers. A graph is plotted, as shown below, which represents the load generated on the web server, the entry point to the application, against time.

[Figure: Wload.png - web server load (wload) plotted against time]

What follows is a 'guesstimation' to come up with the load that has to be generated on the servers. Being a peak period, time t' is considered for determining the workload, and an appropriate figure for the load, wload, is calculated. This wload comprises the total number of page hits the web server has attended to. The hits to the different URLs are then taken into consideration, and based on this URL analysis the 'workload mix' is finally defined.
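As a rough illustration of this URL analysis step, the Python sketch below counts the hits each URL received during the peak window t' and turns them into the percentage mix that is usually 'guesstimated'. The log format, timestamps, URLs and peak-hour prefix are assumed, made-up values, not from any real project.

from collections import Counter


def url_mix(log_lines, peak_prefix="2010-12-21T11"):
    """Percentage of peak-hour page hits attributed to each URL.

    Assumes access-log lines of the form "<timestamp> <url>",
    e.g. "2010-12-21T11:05:32 /search".
    """
    hits = Counter(
        url
        for line in log_lines
        for timestamp, url in [line.split(maxsplit=1)]
        if timestamp.startswith(peak_prefix)
    )
    total = sum(hits.values()) or 1
    return {url: round(100.0 * count / total, 1) for url, count in hits.items()}


sample_log = [
    "2010-12-21T11:05:32 /search",
    "2010-12-21T11:06:01 /search",
    "2010-12-21T11:07:45 /cart/add",
    "2010-12-21T11:09:10 /checkout",
]
print(url_mix(sample_log))  # {'/search': 50.0, '/cart/add': 25.0, '/checkout': 25.0}

On its own, though, this per-URL view is not sufficient, as the rest of this post argues.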

However what is often neglected is the accuracy of this workload mix, i.e. the distribution of various tasks to be sent to the servers. This distribution is of utmost importance as it finally governs how the different servers are going to be loaded.

To illustrate with an example, let's assume the 'wload' mentioned above can be broken into 3 different sets of URLs which carry out 3 separate tasks: task1, task2, task3 (for example Search, Add to Cart, Checkout). It is fairly simple to see that any task will generate a sequence of events at different servers in an application distributed across tiers. Task1 may be CPU intensive at the Application tier, task2 may be memory heavy at the Application tier, and task3 may be Database intensive.

So if, during the time t' when the workload was analyzed, the distribution of the workload among these tasks was not determined accurately, the very basis of the performance testing exercise is undermined. Say, instead of loading more task1's, more task3's are loaded. This generates more load on the Database tier instead of the Application tier. That is certainly not a production-like scenario, thus invalidating the complete set of results generated from these performance tests.
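The Python sketch below makes this skew visible. The per-task resource profiles and the two mixes are assumed numbers, not measurements; multiplying each task's share in the mix by its relative cost per tier shows how a guesstimated mix can point the test at the wrong bottleneck.

# Assumed relative cost of one request on each tier (arbitrary units).
TASK_PROFILE = {
    "task1_search":      {"app_tier": 5.0, "db_tier": 1.0},   # CPU-heavy at App tier
    "task2_add_to_cart": {"app_tier": 3.0, "db_tier": 2.0},   # memory/CPU at App tier
    "task3_checkout":    {"app_tier": 1.0, "db_tier": 6.0},   # DB intensive
}


def tier_load(mix):
    """Weighted per-tier load implied by a workload mix (shares sum to 1.0)."""
    load = {"app_tier": 0.0, "db_tier": 0.0}
    for task, share in mix.items():
        for tier, cost in TASK_PROFILE[task].items():
            load[tier] += share * cost
    return load


production_mix   = {"task1_search": 0.6, "task2_add_to_cart": 0.3, "task3_checkout": 0.1}
guesstimated_mix = {"task1_search": 0.1, "task2_add_to_cart": 0.3, "task3_checkout": 0.6}

print("production :", tier_load(production_mix))    # App tier dominant
print("guesstimate:", tier_load(guesstimated_mix))  # DB tier dominant -> wrong bottleneck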

Hence it is of critical importance to accurately identify this distribution of workload among the mix of different tasks, typically referred to as transactions. This calls for analyzing the workload by looking at sets of URLs together, not just at individual URLs.

December 20, 2010

Challenges during performance testing in a shared environment

Recently, in one of our performance testing projects, we were given a shared environment for conducting the tests. During this exercise we faced multiple challenges in loading the system, measuring the outputs, completing the tests successfully, and in almost every other activity.

The following points highlight some of the challenges that we faced during performance testing of a web application in a shared environment.
1. System performance metrics could not be measured in a consistent and accurate manner due to the varying external load on the system.
2. It was hard, and in most cases not possible, to identify bottleneck conditions. Because of this, the overall usefulness of the test results came down drastically.
E.g. highly varying utilization was observed on the DB server due to external load. Thus it was not possible to identify the workload at which the 40% utilization threshold of the server was reached due to application transactions alone (a rough way of separating the two is sketched after this list).
3. In some cases, testing activities required server instances to be restarted. At times this was not possible, as multiple applications were deployed on the same server instance and restarting it would have affected multiple activities in the environment.
4. Things are bound to go wrong while performance testing at higher user loads, and there is a high probability of the environment getting clogged or failing. As the same environment was being used for other activities, our performance tests were affecting those planned activities.
5. Validation of the test results failed frequently, and re-execution of the tests was required quite often.
6. Together, these issues risked a major impact on effort and timelines.
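For challenge 2, one rough way of separating external load from test-generated load is sketched below in Python. The utilization samples, the sampling approach and the 40% threshold shown are illustrative assumptions, not figures from the project, and this is only an illustration, not the approach we eventually formulated.

from statistics import mean


def test_attributable_utilization(baseline_samples, during_test_samples):
    """Rough estimate of utilization caused by application transactions alone.

    Assumes utilization percentages are already collected (e.g. exported from
    whatever monitoring the environment provides) before and during the run.
    """
    external_load = mean(baseline_samples)   # other tenants of the shared box
    observed = mean(during_test_samples)     # combined load while the test runs
    return max(observed - external_load, 0.0)


baseline = [22.0, 25.5, 31.0, 27.5]   # illustrative samples before the test window
during = [58.0, 63.5, 55.0, 61.0]     # illustrative samples while the test runs
attributable = test_attributable_utilization(baseline, during)
print(f"~{attributable:.1f}% attributable to the test; "
      f"40% threshold {'exceeded' if attributable > 40 else 'not reached'}")

This kind of subtraction only helps when the external load stays reasonably stable across the run; in our shared environment it did not, which is exactly why the measurements remained unreliable.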

My subsequent blog will provide details on the method we formulated for overcoming these challenges, while performance testing in the shared environment. (To be continued...)

December 7, 2010

How long is a 'healthy' response time - response time standards?

One of the key aspects of performance is the response time of interactions with the system. What is typically talked about, worried over and designed for is the end user response time, or the online user experience. Throughout the stages of the SDLC this aspect is debated, and in the end teams accept and convince themselves of whatever the test results demonstrate. What the business wants is 'faster' responses - in essence, 'fast' meaning to the satisfaction of end users. The quantification of 'fast' is typically 'too fast' during the requirements phase. So can there be any way of arriving at rational expectations - are there any response time standards?

(i)            Standards are subjective

Note that there are no 'specified standards' when it comes to response times for user-facing transactions. The traditional figures of 3 secs or a maximum of 8 secs are 'legacy' figures; moreover, advances in technology can facilitate responses at the level of milliseconds. What matters most in determining the standards for a given system is the 'user perception' of the service being offered by the system. Rather than looking 'outside' for response time thresholds, it is important to put yourself in the place of the end user and arrive at the numbers that suit the given enterprise and application.

Business, designers and architects need to take into account the factors that affect the user's perception of the service being offered. Arrive at rational figures based on the variety of transactions and responses associated with the system (during the planning and requirements phases). Cascade the end-to-end responses across the components within the system. Evaluate, monitor and calibrate the responses throughout the coding and testing phases. Validate that the 'final' response times are mutually satisfactory.
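One lightweight way to cascade an end-to-end target across components is sketched below. The component names and the percentage split are assumptions for illustration; the real split comes from the architecture and from measurements.

END_TO_END_TARGET_SEC = 2.0

# Assumed share of the response time budget per component.
BUDGET_SPLIT = {
    "browser_render": 0.15,
    "network": 0.20,
    "web_tier": 0.15,
    "app_tier": 0.30,
    "db_tier": 0.20,
}


def cascade_budget(target_sec, split):
    """Turn one end-to-end target into per-component response time budgets."""
    assert abs(sum(split.values()) - 1.0) < 1e-6, "shares must sum to 100%"
    return {component: round(target_sec * share, 3) for component, share in split.items()}


for component, budget in cascade_budget(END_TO_END_TARGET_SEC, BUDGET_SPLIT).items():
    print(f"{component}: {budget} s")

The per-component budgets then become the figures to evaluate, monitor and calibrate against through the coding and testing phases.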

(ii)          Response time bands

Based on measurements of 'attention span', the following are some basic guidelines to take into consideration:

Sub-second response (less than 0.5 sec on average): These latencies are typically not registered as 'taking time' by end users; from the human perspective the action feels 'done immediately'. This is typically the case when users want to proceed quickly to the next action and do not expect, or want, to wait for the outcome of the 'previous' action being performed. For example, users operating fields and entities in the UI.

Response in seconds (less than 2 sec on average): These times make users notice the delay; users perceive that the system is 'working' on the inputs provided and has returned a response fairly soon, without making them wait unduly. Users do not feel that the system is sluggish, and they do not lose the sense of 'smooth flow' in their journey of completing the task. For example, operations that require providing credentials, or navigation to the 'next' step after the current action.

Extended response (more than 5 sec on average): This is typically the limit for users to keep their attention on the current task. Anything slower than 8 secs must give the end user some form of 'percent-done' indication and a clear facility to halt or interrupt the operation at whatever stage it has reached. At these response times users should not be expected to 'remain' on the same page or task; rather, they 're-orient to the previous task' when they return to the response after doing some 'other task'. Any delay longer than 10 sec results in a natural break in the user's current work-flow.
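As a minimal sketch, the Python snippet below maps measured response times onto the bands above. The 0.5 s, 2 s, 5 s, 8 s and 10 s boundaries mirror the guideline figures in the text; how the 2-5 s region is treated is my own assumption, since the text only anchors the bands at their averages.

def classify_response(seconds):
    """Map a measured response time onto the attention-span bands above."""
    if seconds < 0.5:
        return "sub-second response: feels done immediately"
    if seconds < 2.0:
        return "response in seconds: delay noticed, smooth flow preserved"
    if seconds < 5.0:
        return "between bands: judge against the specific transaction"
    note = "extended response: user attention at risk"
    if seconds > 8.0:
        note += "; show a percent-done indicator and allow the user to interrupt"
    if seconds > 10.0:
        note += "; expect a natural break in the user's work-flow"
    return note


for sample in (0.3, 1.4, 3.0, 6.0, 9.0, 12.0):
    print(f"{sample:>5.1f} s -> {classify_response(sample)}")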

 

(iii)         Choose what suits best for you

Based on the response time bands and attention spans above, and on the service being offered, choose the SLAs that work best for you. The factors for selection need to be:
(i) The service or business being carried out: Banking, Telecom, Health, Call centre etc.
(ii) The distance of the end user: direct user interaction versus users reaching the service through intermediaries; multiple users consuming the same service; the system's inner complexity and the need to get responses from 'outside' systems.
(iii) Factors affecting attention span: age groups of service consumers, emotions and temperament of users, professions and occupations of users, possible reasons for user attrition, and different time-phased ways of accomplishing the same task.
(iv) The real-time criticality of the transactions under consideration.
(v) Current market trends and 'competitors' offering 'similar' services.
(vi) The balance between architecture, technology and business.

Having irrational targets and failing to meet them, or settling for slower targets, leads to 'internal' dissatisfaction with your own system. Balancing these factors and arriving at prudent targets serves both IT and the business better. Design the responses keeping in mind how the end user will utilize them, and make the right choices for the right transactions, remembering that users never accomplish 'all' their tasks in a single flow or span!
