The Infosys Labs research blog tracks trends in technology with a focus on applied research in Information and Communication Technology (ICT)


August 26, 2010

Business Analytics - Gut Feeling to Competitive Advantage

Every organization has some decision-making capability, an internal engine that works for its specific context. In some organizations that engine is "Gut Feeling" or individual heroics, whereas a few organizations try to balance "Gut Feeling" with fact-based inputs. However, the smarter organizations that stand apart and outpace their competitors eliminate the "Gut Feeling" culture and transform themselves entirely into Analytical organizations, where every decision is backed by real facts and numbers and every strategic decision gets challenged with an optimizing solution.

Such organizations have evolved over time through the stages from "Gut Feeling" to "Competitive Advantage", and Analytics has been the backbone of their success. Banking leverages Analytics for fraud detection, portfolio management, campaign management and risk management. Retail leverages Analytics for cross-selling, optimized inventory in stores and warehouses, discounting based on customer segmentation, product catalogue management, optimized quote-to-cash and order-to-cash cycles, and effective vendor management. Manufacturing leverages Analytics for demand forecasting, thereby reducing inventories and shortening cash-to-cash cycles.

How should one approach becoming an Analytical Organization?
The answer lies in understanding the biggest challenges the organization faces today, e.g. customer attrition, declining profitability and revenue, or competitors making inroads into the business. The 3-step approach to adopting Analytics in the organization is outlined below:

Step 1 - With the current challenges understood, build a business case by providing numbers and quantifying the impact each of those challenges has on the organization's overall profitability and growth. The simplest example could be "We are losing 5% of our existing customer base every year, and over 40% of those lost are top customers, who contribute 75% of the organization's revenues." The reasons can be manifold, and enumerating each one along with facts will help senior executive management accept and approve the business case.
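As a rough illustration of how such a statement can be turned into a number for the business case, the sketch below estimates the yearly revenue at risk. The customer count, total revenue and size of the top-customer segment are hypothetical inputs that do not come from the example above, and the calculation assumes revenue is spread evenly within each segment.

```python
def revenue_at_risk(customers, revenue, churn_rate, top_share_of_churn,
                    top_customers, top_revenue_share):
    """Estimate yearly revenue lost to churn.

    Assumes revenue is spread evenly within the top segment and within
    the remaining customers (a simplification for illustration only).
    """
    lost = customers * churn_rate                 # customers lost per year
    lost_top = lost * top_share_of_churn          # of which top customers
    lost_other = lost - lost_top
    per_top = revenue * top_revenue_share / top_customers
    per_other = revenue * (1 - top_revenue_share) / (customers - top_customers)
    return lost_top * per_top + lost_other * per_other


# Hypothetical organization: 1,000 customers, 10,000,000 in revenue,
# 200 of the customers classed as "top" and bringing in 75% of revenue.
at_risk = revenue_at_risk(
    customers=1000, revenue=10_000_000, churn_rate=0.05,
    top_share_of_churn=0.40, top_customers=200, top_revenue_share=0.75,
)
print(f"Revenue at risk: {at_risk:,.0f} ({at_risk / 10_000_000:.1%} of revenue)")
```

With these made-up inputs, the 20 lost top customers account for most of the roughly 843,750 at risk, which is the kind of fact that makes the case concrete for senior management.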

Step 2 - Start small with a localized approach and build analytical solutions for a single department or business process that has a major impact on the revenue stream of the organization. List the issues this business process is not able to resolve, provide the right facts for taking effective decisions, and tie those details to the objectives of the business process. There could be 2 outcomes of this exercise:
a) It helps identify gaps in the facts being pushed in and out of the process touch points; these can be data quality issues while the process itself is still robust.
b) It shows that it's time to revisit and refine the business process to meet the changing needs of the organization and the economic changes demanding process change.

In either of the two outcomes, the established fact is that the underlying issue is hampering visibility and effective decision-making.

Step 3 - Once Step 2 has convinced us that the analytical solution embedded in the business process works consistently, the same model has to be rolled out to other departments or business processes. It is important here to set priorities and plan accordingly: target the bigger-impact processes/departments that have greater influence on overall profitability, customer retention, revenue etc. This will help build a consistent and proven business analytical model across the organization.

At a more granular level, those 3 steps also require attention to the following:

  • Enhance and build the skills/capability within the organization at both the technical IT level and the Business Analyst level
  • Define data integration strategies to provide clean, consistent and integrated data
  • Get the right tools and technology in place by doing appropriate vendor analysis and comparisons - those that fit the organization's needs, and yet can scale for future needs
  • Revisit business processes/strategies regularly, as competition and market situations will constantly keep changing
  • Set up enterprise-level data quality and data management principles (governance being the key) to avoid any surprises

Thus, the maturity from a "Gut feel" organization to a "Leverage Analytics for Competitive Advantage" organization is a gradual process that can be started at any point in the organization, and certainly well before your competitors do. We are past the stage of merely accepting Analytics as a key differentiator in the market today between Failed/Failing and Smarter organizations - it's time to implement and reap the benefits.

August 17, 2010

Database Scaling Methods for SaaS based Multi-tenant Applications

Scalability is one of the key requirements of SaaS-based applications, as they have to support users and data belonging to multiple tenants. They should also scale to meet future requirements as the SaaS provider provisions more tenants.

 

SaaS providers are inclined to adopt a shared database, shared schema strategy to support multiple tenants because of its cost effectiveness. However, this approach brings one major challenge, database scaling, since the database is shared among all the tenants supported by the SaaS application.

 

SaaS applications adopting the shared database, shared schema approach should be designed with the expectation that the database will need to be scaled once it can no longer meet baseline performance metrics, for example when too many users try to access the database concurrently or when the size of the database causes queries and updates to take too long to execute. One way to scale out a shared database is database sharding. It is a particularly effective way to scale out such a database because rows in a shared schema are already differentiated by tenant ID, so the database can easily be partitioned horizontally on tenant ID. This makes it easy to move the data belonging to each tenant to an individual partition. Database sharding provides many advantages, such as faster reads and writes, improved search response, smaller table sizes and distribution of tables based on need.
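To make the idea of partitioning on tenant ID concrete, here is a minimal sketch of a router that decides which shard holds a tenant's rows. The class name, shard names and the hash-based default are all assumptions for illustration; a real deployment might instead keep the tenant-to-shard mapping in a catalogue table.

```python
import hashlib


class TenantShardRouter:
    """Maps a tenant ID to a shard name (hypothetical, for illustration)."""

    def __init__(self, shards):
        if not shards:
            raise ValueError("at least one shard is required")
        self.shards = list(shards)
        self.overrides = {}  # tenant_id -> shard, for tenants moved explicitly

    def shard_for(self, tenant_id):
        """Return the shard that stores all rows carrying this tenant ID."""
        if tenant_id in self.overrides:
            return self.overrides[tenant_id]
        # A stable hash keeps the mapping identical across processes and restarts
        # (Python's built-in hash() is randomized for strings).
        digest = hashlib.sha256(str(tenant_id).encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % len(self.shards)
        return self.shards[index]

    def move_tenant(self, tenant_id, shard):
        """Record that a tenant's data now lives on a specific shard."""
        if shard not in self.shards:
            raise ValueError(f"unknown shard: {shard}")
        self.overrides[tenant_id] = shard


router = TenantShardRouter(["shard_a", "shard_b", "shard_c"])
print(router.shard_for("tenant-42"))      # same answer on every run
router.move_tenant("tenant-42", "shard_c")
print(router.shard_for("tenant-42"))      # shard_c
```

Because every query in a shared schema already filters on tenant ID, the application only needs this one lookup before opening a connection. Note that the simple modulo scheme would remap many tenants if the number of shards changed, which is one reason an explicit mapping is often preferred.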

 

But while partitioning the data of a multi-tenant SaaS application, we need to consider which factor is degrading the performance of the existing database: an increased number of concurrent users, or increased database size due to the provisioning of multiple tenants. This helps in selecting an appropriate partitioning technique, based either on the database size requirement of each tenant or on the number of users of each tenant accessing the database concurrently.
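One way to picture this choice is a placement routine that balances tenants across shards on whichever metric is causing the degradation. The sketch below uses a simple greedy heuristic (largest tenant first, onto the currently lightest shard); the tenant names, figures and field names are invented for illustration and the heuristic is not taken from the post.

```python
def place_tenants(tenants, shard_count, metric):
    """Greedy placement: biggest tenants first, each onto the lightest shard.

    `metric` names the field to balance on, e.g. "size_gb" when database
    size is the bottleneck or "peak_users" when concurrency is.
    """
    if shard_count < 1:
        raise ValueError("shard_count must be at least 1")
    loads = [0.0] * shard_count
    placement = {}
    for tenant in sorted(tenants, key=lambda t: t[metric], reverse=True):
        target = min(range(shard_count), key=lambda i: loads[i])
        placement[tenant["id"]] = target
        loads[target] += tenant[metric]
    return placement, loads


# Hypothetical tenants: some are large but quiet, others small but busy.
tenants = [
    {"id": "acme", "size_gb": 120, "peak_users": 40},
    {"id": "globex", "size_gb": 30, "peak_users": 300},
    {"id": "initech", "size_gb": 80, "peak_users": 60},
    {"id": "umbrella", "size_gb": 20, "peak_users": 250},
]

by_size, size_loads = place_tenants(tenants, 2, "size_gb")
by_users, user_loads = place_tenants(tenants, 2, "peak_users")
print(by_size, size_loads)
print(by_users, user_loads)
```

The two calls produce different layouts for the same tenants, which is the point of the paragraph above: the partitioning technique should follow the metric that is actually hurting performance.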

 

August 13, 2010

Blunders in Performance life cycle within SDLC

Performance is embedded into the various stages of the SDLC, typically subject to the perceptions of everyone from architects to developers to deployment teams. Though "in principle" everybody is aware of the importance of performance, the actual implementation of the "ideal practices" depends on the availability of time and expertise. Often it swings from strict implementation without weighing criticality, to sheer ignorance, to postponement owing to constraints at various stages. Below are some points to take into consideration while making the "trade-off" during the various phases of the development-to-deployment cycle. While most of these points would appear to be "obvious", they are invariably the ones that slip through!

(i) Over-Planning in design phase

Extensive preparation and excessive attention to performance aspects in the planning and design phases often turn out to be less productive than expected. During this phase one needs to balance the amount of effort spent against its proportional impact on performance; very likely a large amount of time spent has only an infinitesimal effect on actual performance, since things are still "on paper".

Over-planning typically leads to over-confidence: when you are sure that you have not left any hole in your design, it becomes hard to figure out where to start when a performance issue occurs. When projects are managed with a waterfall approach, thorough designs integrated with modeling tools lead to the design being considered bullet-proof. This perception then persists from architects and project managers through to the developers and QA teams. It then turns into a case of missing the forest for the trees.

(ii) Under-Planning to enable Performance troubleshooting in development stage

It is imperative to use a logging API to help locate performance degradations. Ensure that the logging points are placed in line with the application flow, so that they lead to the right trail rather than a scattered clutter across the code execution. Logging APIs invariably include a check for logging levels. While this gives developers the freedom to include logging statements generously during development, the purpose and position of those statements should not be restricted to "functional" aspects only, which typically remains the case. Every call to an external and/or downstream system must have an appropriate log statement. Any internal algorithm (a single method or a group of methods) that is likely to take longer than a few milliseconds should log at the beginning, at the end and around any significant calls made during its execution. Most logging APIs can be configured so that log entries include the class and a time stamp; thus it's not required to create timers to quantify the length of a call.
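A minimal sketch of this practice, using Python's standard logging module as one example of such an API: the logger name, function names and the simulated downstream call are hypothetical. The format string adds a millisecond time stamp and the logger name to every entry, so the duration of the downstream call can be read from consecutive lines without a hand-written timer.

```python
import logging
import time

# Time stamp (with milliseconds) and logger name are added to every entry.
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s.%(msecs)03d %(levelname)s %(name)s %(message)s",
    datefmt="%Y-%m-%d %H:%M:%S",
)
log = logging.getLogger("orders.service")


def fetch_inventory(sku):
    """Stand-in for a call to a downstream system (hypothetical)."""
    time.sleep(0.01)
    return {"sku": sku, "in_stock": 3}


def reserve_item(sku):
    """Logs at the start, around the downstream call, and at the end."""
    log.info("reserve_item start sku=%s", sku)
    log.info("calling inventory service sku=%s", sku)
    stock = fetch_inventory(sku)
    log.info("inventory service returned sku=%s in_stock=%s", sku, stock["in_stock"])
    reserved = stock["in_stock"] > 0
    log.info("reserve_item end sku=%s reserved=%s", sku, reserved)
    return reserved


reserve_item("A-1")
```

The log lines follow the application flow (start, downstream call, end), so a slow request leaves a trail that points at the step where the time went.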

All exceptions must be logged, irrespective of the logging level. This imposes a "restriction" on coding wherein exceptions should be used only for exceptional conditions! An exception should not be used as a return value when the condition can be anticipated - in other words, a "catch block" should never contain business logic. Otherwise, a lot of time is spent unnecessarily on tracking down performance issues.
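The distinction can be sketched as follows, again with Python's logging module and hypothetical names. An anticipated outcome (an unknown SKU) is handled with an ordinary return value, while a genuinely unexpected failure is logged with its stack trace and re-raised. `log.exception` writes at ERROR level, which survives typical production level settings, although a logger configured above ERROR would still suppress it.

```python
import logging

log = logging.getLogger("pricing")

PRICES = {"A-1": 10.0}  # hypothetical catalogue


def find_price(sku):
    """A missing SKU is an anticipated outcome, so return None instead of raising."""
    return PRICES.get(sku)


def quote(sku, quantity):
    price = find_price(sku)
    if price is None:
        # Business rule handled in the normal flow, not in a catch block.
        return None
    try:
        return price * quantity
    except TypeError:
        # Unexpected input: log with stack trace, then let the error propagate.
        log.exception("quote failed sku=%s quantity=%r", sku, quantity)
        raise


print(quote("A-1", 2))
print(quote("unknown", 2))
```

Because the except block contains no business logic, every entry it writes marks a real fault, and the log stays useful when hunting for a performance issue.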

(iii) Passing the ball across when performance issues crop up post deployment

It's human to believe that everything one does is so well done that the problem must be somewhere else! Make sure that when problems surface, the investigation starts with an evaluation of the area one is associated with. Simply passing the ball from one court to the other does not lead to a solution, especially when multiple sub-teams/streams are involved. Projects typically have one team working on the web application and other teams developing service-layer APIs etc. Typically, whichever team discovers a performance issue will contact the other team and demand that they fix the problem. What helps instead is providing an estimate of the cause after going through your own area and making sure the problem really is "elsewhere". If it is a simple fix, it takes far less time to fix it than to pass it off to someone else. In the case of complex issues, working cohesively leads to a productive solution.

August 9, 2010

Test platform in cloud

Performance testing is done to verify that a system meets its performance criteria; it aids in diagnosing which parts of the system or workload cause the system to perform badly. A traditional testing environment requires a lot of effort and time to set up, because it involves procuring the right infrastructure, installing the required software, etc. Cloud computing is one of the most talked about technologies in the software industry today. It allows consumers to use applications without installation and to access their files from any computer with internet access. The major goal of cloud computing is to provide easy, scalable access to computing resources and IT services. Cloud-based solutions offer new and different possibilities for testing. A cloud-based test platform can address the challenges a traditional test environment poses, through automated scaling up or down of the testing infrastructure.

 

We have set up a test platform in the cloud using open source technologies such as Eucalyptus, Apache JMeter, Apache Tomcat and MySQL. rpc.rstatd was used to obtain the performance counters of these instances. The following are the main steps we carried out in this performance testing exercise:

· Creation of three CentOS images: Tomcat, JMeter and MySQL.

A pre-bundled CentOS image is a major requirement for this exercise. Using it as the base image, we created the above images.

· Tomcat and MySQL instances start up automatically during instance boot-up.

A shell script was written to start both Tomcat and MySQL automatically during instance boot-up. The script is written in the rc.local file, which is placed in the etc folder of the mounted image.

· Shell script to add JMeter slave machine's IP to master machine's jmeter.properties file.

A CentOS 5.2 machine was chosen as the JMeter master machine. A shell script was written to add the IP of each JMeter instance to the master machine: during instance boot-up, it automatically adds the slave machine's IP to the jmeter.properties file of the master machine.

· Tests were conducted on these instances as well as on physical machines.

A performance test of the 'jpetstore' web application was conducted on both the instances and the physical machines. JMeter instances were used as load generators. The utilization details were captured using rstatd. We also monitored the response time and throughput of the application on the physical machines and the instances.
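The slave registration step above was done with a shell script that is not shown here. As a rough illustration of the same idea in Python, the sketch below updates the text of a jmeter.properties file, assuming the script edits the `remote_hosts` entry, which is the property JMeter reads for its list of remote engines. The function name and the sample IP addresses are hypothetical.

```python
def add_remote_host(properties_text, ip):
    """Return properties text with `ip` present in the remote_hosts list.

    Leaves comments and other properties untouched, and does nothing
    if the IP is already registered.
    """
    lines = properties_text.splitlines()
    for i, line in enumerate(lines):
        stripped = line.strip()
        if stripped.startswith("#") or "=" not in stripped:
            continue  # skip comments and lines that are not key=value
        key, _, value = stripped.partition("=")
        if key.strip() == "remote_hosts":
            hosts = [h.strip() for h in value.split(",") if h.strip()]
            if ip not in hosts:
                hosts.append(ip)
            lines[i] = "remote_hosts=" + ",".join(hosts)
            break
    else:
        # No remote_hosts entry found: add one at the end.
        lines.append("remote_hosts=" + ip)
    return "\n".join(lines) + "\n"


sample = "# Remote hosts and RMI configuration\nremote_hosts=127.0.0.1\n"
updated = add_remote_host(sample, "10.0.0.5")
print(updated)
```

Making the update idempotent matters here because the script runs on every instance boot-up, so the same slave IP may be submitted more than once.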

Ensure that the instances and the physical machines on which tests are performed have the same configuration. We observed that the effort and time taken to set up the test environment were reduced. Since shell scripts were used for automation, fewer configuration settings had to be made than in a traditional test environment. We also noticed that performance, measured by response time and utilization, was better on the physical machines than on the instances.