Infrastructure Services are undergoing a major transformation. How does one navigate the web of emerging technology trends and stay ahead of the game? Read on to learn more on our Infra Matters blog.


July 30, 2013

Picking the right tools for IT operations analytics

(Published on behalf of Pranit Prakash)

In my earlier post, I discussed leveraging IT operations analytics for intelligent IT operations. In this post, we will discuss some differentiators that can help your organization select the right tools for adopting IT operations analytics.
A Gartner prediction shows how critical this is: "Through 2015, more than 90% of business leaders contend information is a strategic asset, yet fewer than 10% will be able to quantify its economic value."
With organizations keen to mine knowledge from the data generated by IT operations, the market offers a wide assortment of vendors in the IT operations analytics space. Many of the tools on offer are narrowly focused, most addressing just one of the many facets of IT operations. A few address event correlation for optimized application performance, or configuration analytics to identify discrepancies in standard parameters. Fewer still can sift through textual log data to find patterns that enable proactive monitoring.

Factors for Selection
With a multitude of niche tools and exciting innovations, the IT operations analytics market offers organizations wide choice to harness a wealth of information for intelligent IT operations. The differentiator lies in selecting the right framework and strategy to implement IT operations analytics. Some critical success factors to identify the right tool and partner in line with enterprise requirements include:
• Application use-case
Enterprise IT may need operations intelligence to cover multiple areas and use-cases. However, to start with, the organization must identify the applicable business scenarios to help choose the right tool and technology. For instance, business cases may range from limited drill-down into operational issues and poor visibility into customer experience, to revenue erosion from data loss.
It is rare to find a single tool that offers effective solutions in all these areas. However, several of the available analytics tools have proven case studies in handling individual challenges of IT operations.

• Data handling and statistical capabilities
All analytics tools offer algorithms to integrate and analyze data from available sources. However, a simple logging tool is of little use without the capability to churn through and correlate massive amounts of data to deliver insights relevant to decision making. The slice-and-dice capabilities of most of these tools help organizations mine their way to the right results. Organizations need to evaluate each tool's unique point of differentiation by mapping its features to their current challenges and longer-term business cases.
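As a rough illustration of what such correlation looks like under the hood, here is a minimal Python sketch that pairs each application error with infrastructure events seen shortly before it. The log format, source names and time window are all hypothetical simplifications; real analytics tools do this at far greater scale and sophistication.

```python
from datetime import datetime, timedelta

def parse(line):
    # Hypothetical format: "2013-07-22T10:15:03 <source> <message>"
    stamp, source, message = line.split(" ", 2)
    return datetime.fromisoformat(stamp), source, message

def correlate(app_lines, infra_lines, window_seconds=60):
    """Pair each application event with infrastructure events seen
    within `window_seconds` before it - a crude slice-and-dice pass."""
    window = timedelta(seconds=window_seconds)
    infra = [parse(l) for l in infra_lines]
    pairs = []
    for stamp, source, message in map(parse, app_lines):
        causes = [m for t, s, m in infra if stamp - window <= t <= stamp]
        pairs.append((message, causes))
    return pairs
```

A tool doing this for real would also normalize clocks across sources and rank candidate causes, but the core idea - joining heterogeneous event streams on time - is the same.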

• Visual capabilities
An IT operations analytics tool must be smart enough to present information in a consumable format. Basic dashboards and standard graphing capabilities are integral to such tools, but what really adds value is drill-down and customization capability. Many such tools also provide role-based access. This allows organizations to define key success factors that can be monitored on the go through alerts and threshold settings integrated with mobile devices.

Getting IT Right
In the evolving and innovative landscape of IT analytics, it is important for organizations to focus on the right strategy: one that identifies the strengths and weaknesses of the tools available in the market and their underlying technologies, and accurately assesses the vendor's long-term execution and support capabilities.

July 22, 2013

The four knights of service excellence

The saga of service excellence continues!

In my last post, I mentioned that there are four 'key ingredients' that act as the pillars for setting up the service excellence office, which I refer to as the SEO. The SEO is the "central owner" that drives sustained and predictable IT operations. A central owner may be a typical Service Manager or Process Manager within the IT operational universe; the difference is the elevated set of responsibilities entrusted to the SEO compared to those roles.
There are many examples of the roles that the SEO enacts. These are explained in detail in our paper titled "Creating a DNA of Service Excellence across IT".
Let me highlight here the key ingredients for a successful SEO setup - the four major cornerstones that drive sustained and predictable optimized IT operations in organizations:

Our first ingredient is a "Comprehensive Structure" - this structure needs to be established with clear responsibilities. Defining the SEO's role and responsibility by formalizing a structure in which all stakeholders are clearly identified is a must. The SEO can be involved in any phase of service management, from Service Strategy to Service Operations. For instance, the SEO can act as a "Solution Architect" whenever an innovative idea is to be generated, designed and implemented. This role lays a path for the technical gurus to think with a customer-centric approach. The SEO becomes the architect by differentiating between what clients "want" and what clients "need". For instance, a client may want a shorter 'Time to Resolve (TTR)' as a positive impact for end users. But the actual need is to improve end-user satisfaction. This need could be accomplished by addressing the pain points of the end users, and a reduced TTR may be just one step in achieving this. The SEO is responsible for developing solutions that resolve client issues completely. So the SEO can wear any hat and can be accountable for transformation, stabilization of operations, or driving continual improvement. The important part is to freeze on which hat the SEO must wear.

The next ingredient is the use of "Innovative Toolsets" - In my earlier example of the SEO as a Solution Architect, I spoke about understanding the difference between client "wants" and "needs". To differentiate the two, we need to do the required due diligence. It is through this due diligence that we freeze on an innovative solution that addresses a specific problem.

The third ingredient involves "Meaningful Measurements" - A typical problem in IT operations occurs when IT is unable to link results to the relevant business metrics. The result is a complete miss in communications with the business, and the value delivered by IT is undermined.

"Articulation of Value" is the fourth ingredient - As I explained above, we need to articulate to stakeholders in a language they understand. The success or failure of an IT program is judged on the value it delivers to the business, and without clear articulation, this value is unclear. The SEO ensures that the metrics are clearly defined and measured, leading to a precise articulation of the impact. By emphasizing the value delivered, the SEO also helps set internal benchmarks and the roadmap for continuous improvement.

What do you think? Let me hear those thoughts! 

July 11, 2013

Of Service Level Agreements and Non-Production environments

(Posted on behalf of Pradeep Kumar Lingamallu)

In one of my earlier engagements, I was asked to review the service level agreement (SLA) of a service provider for a financial organization. Although the organization had a large non-production environment, the existing SLA did not list any service levels or priorities for it. A long round of negotiations ensured that due importance was given to the non-production environment in the SLA.

This is what happens most of the time: when a service level agreement (SLA) is drafted, all the focus is on production support, resolution of production issues, and availability of the production environment. Neither the non-production environment nor support for it is documented properly in the SLA. This is largely due to a lack of recognition of the importance of the non-production environment, especially in the case of new application roll-outs and training.

A typical non-production environment consists of a development environment, a test environment and a training environment. Each one plays a crucial role during new deployments and rollouts of applications. However, incidents in non-production environments are generally not given the priority they deserve. Just as an incident in the development or test environment has a critical impact on the release calendar, an incident in the training environment has a severe impact on the training schedule. Application deployments depend on both the release and the training of personnel; a delay in either environment is bound to have an impact on the release schedule.
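To make the point concrete, here is a minimal sketch of what an SLA matrix covering non-production environments might look like if expressed in code. The environments, priorities and targets are purely illustrative; the real numbers belong in the negotiated SLA document, not in code.

```python
# Illustrative SLA matrix: environment -> (priority, response target,
# resolution target). All figures are hypothetical examples.
SLA_MATRIX = {
    "production":  ("P1", "15 min", "4 hours"),
    "development": ("P2", "1 hour", "8 hours"),
    "test":        ("P2", "1 hour", "8 hours"),
    "training":    ("P3", "4 hours", "24 hours"),
}

def response_target(environment):
    """Look up the committed response time; default to a lowest-priority
    entry rather than leaving an environment uncovered entirely."""
    return SLA_MATRIX.get(
        environment, ("P4", "next business day", "best effort")
    )[1]
```

The point of the default entry is exactly the gap described above: an environment that nobody thought to list should still fall under some documented commitment, however modest.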

I have seen this happen in a number of engagements that I have been part of. In one incident, the test servers were down and testing activities were delayed. As a result the entire release calendar was affected - in this case, it was a major release, so you can imagine the impact on business.

In another case, a downtime in the training environment again resulted in a delay in the release since the personnel could not complete their training on schedule. This may appear to be a small incident from a provider perspective, but for the organization, it is a significant delay.

Any downtime in the non-production environment is bound to affect production as well - but this fact is generally ignored to the buyer's peril. By specifying SLAs for support of non-production environments, organizations have an additional safeguard against any unplanned downtime that could affect the quality of service.


(Pradeep Kumar Lingamallu is a Senior Consultant at Infosys. He has over 18 years of experience in the field of IT service management, with certifications in ITIL, CISA, CISM and PMP.)


July 8, 2013

Service Excellence as a way of life

"Consciously or unconsciously, every one of us does render some service or another. If we cultivate the habit of doing this service deliberately, our desire for service will steadily grow stronger, and will make not only our own happiness, but that of the world at large." - Mahatma Gandhi

Mahatma Gandhi highlighted that excellence is an accumulation of righteous 'habits' which, if inculcated, drive greater growth. This applies to IT as well.
It is possible to accumulate the right set of habits to drive growth within enterprises. This can be done by setting up dynamic mechanisms that identify, embed and replicate such habits efficiently across individuals and teams.

But who should be made accountable for organizations to focus on Service Excellence? Can this person or entity bring in flexibility in operations to cater to the changing business environments? Can such flexibility be managed and governed? Can innovation be embedded on this path to achieving excellence?
We believe that setting up a Service Excellence Office (SEO), comprising a dedicated pool of process consultants, helps bring in the rigor, focus and accountability needed to achieve service excellence. The SEO plays the dual role of an internal and an external consultant in the organization:
1 - As an internal consultant, the SEO identifies initiatives or practices that ensure that the project (or program) goals and commitments made to the client are achieved
2 - As an external consultant, the SEO ensures that the solutions deployed are customer-centric, i.e., that they address customer pain points

The SEO focuses on identifying levers for improvement and enablers for change that tie back to business value, so that the progress and effort spent to drive benefits can be measured at every step. The emphasis is on overcoming challenges around demonstrating measurable value to both sets of customers - internal and external.

We have identified four key ingredients that act as pillars in establishing an SEO within the IT organization. These will be explained in the next blog post.

July 4, 2013

IT Operations Analytics - Tapping a goldmine of data within IT

(Posted on behalf of Pranit Prakash)

Enterprise IT faces a continuous challenge: maximizing the availability of systems while optimizing the cost of operations. The need to maintain uptime while managing complex tiers of infrastructure forces organizations to spend a considerable amount of time and money identifying the root cause of the failures that lead to downtime.

According to Gartner's top 10 critical technology predictions for 2013, "for every 25% increase in functionality of a system, there is a 100% increase in complexity."

On the other hand, IT operations generate massive amounts of data every second of every day. As of 2012, about 2.5 exabytes of data were created every single day, a number said to double every 40 months.
This data remains mostly unstructured, in the form of log files, but can contain a wealth of information, not only for infrastructure and operations but also in areas such as end-user experience, vendor performance and business capacity planning.

As organizations face increasing complexity in their IT operations, they are unable to decipher the patterns that could actually help them improve current operations. This is because existing toolsets are largely confined to silos and are unable to analyze and consolidate details. For example, there are toolsets for IT service management, application performance management and so on that do not talk to each other. The result is a mass of unstructured data that, if analyzed in detail, could provide valuable, actionable information.

And this is where IT Operations Analytics comes in.

IT Operations Analytics enables enterprises to collect, index, parse and harness data from any source, such as system logs, network traffic, monitoring tools and custom applications. These data points are churned through a set of smart algorithms to produce meaningful insights for IT operations, enabling real-time trending, proactive monitoring that ensures higher uptime, and correlation across data sources.

But IT Operations Analytics is not the same as traditional BI, which deals with structured data only. The real capability of this exciting field lies in two factors -

a. Comprehensiveness to include any amount of data, in any format, from anywhere, and
b. Capability to correlate the data to provide a centralized view at the finest level of detail
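As a toy illustration of these two factors at work, the sketch below collapses unstructured log lines into signatures and counts them - the kind of pattern extraction over raw log text that enables proactive monitoring. The regular expressions and log lines are hypothetical simplifications of what production analytics engines do.

```python
import re
from collections import Counter

def signature(line):
    """Collapse variable fields (hex ids, then numbers) so that lines
    produced by the same code path share one signature."""
    line = re.sub(r"0x[0-9a-f]+", "<HEX>", line)
    line = re.sub(r"\d+", "<N>", line)
    return line

def trending_patterns(lines, top=5):
    """Count signatures across an unstructured log stream; a signature
    that suddenly dominates is a candidate for proactive alerting."""
    return Counter(signature(l) for l in lines).most_common(top)
```

Tracking how these counts shift over time, rather than just their absolute values, is what turns this from reporting into the real-time trending described above.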

The possibilities of using analytics to gather actionable insights from configuration and transactional data are endless. In my next set of posts I will explore key use cases for operational analytics and how organizations can adopt this evolving technology trend to their advantage.

(Pranit is a Lead Consultant with Infosys with rich experience in IT infrastructure and consulting services. He is a certified CISA, a Certified DC Professional (CDCP) and a qualified BSI ISO 27001 Lead Auditor.)

July 2, 2013

Data center replication - a key step in disaster recovery

The Atlantic hurricane season has officially started, which means most organizations will have made some kind of disaster recovery plan for their data centers. This usually involves developing failover centers and backing up data in the event that the primary data center goes down. And this is where data center replication comes in.
As I mentioned in my last blog post, replication of a data center is the preferred option in scenarios where the target data centers will be equipped with hardware and operating systems identical to those of the source data center - resulting in a complete replication of the environment itself.

In tech-speak, we define replication as a method in which various infrastructure computing devices (both physical and virtual) share information across data centers, ensuring consistency between resources to improve reliability and fault tolerance.
As with any other kind of migration, replication of a data center involves a set of key considerations, primarily around the volume and type of data to be replicated and the number of sites to be migrated:

1- Size/volume of data to be replicated, velocity/speed required to replicate the data, and variety/type of data.
2- Distance from the source data center, number of sites/domains, and security policies to be migrated.
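A back-of-the-envelope calculation ties the volume and velocity considerations together. The sketch below estimates the link bandwidth needed to push a day's changed data through a given replication window; the 30% protocol-overhead factor and decimal units are assumptions for illustration only, and real sizing would also account for compression, deduplication and change-rate peaks.

```python
def replication_bandwidth_mbps(daily_change_gb, window_hours, overhead=1.3):
    """Rough link sizing: the day's changed data must fit through the
    replication window, inflated by an assumed protocol overhead."""
    megabits = daily_change_gb * 8 * 1000  # GB -> megabits, decimal units
    return megabits * overhead / (window_hours * 3600)
```

For example, replicating 500 GB of daily change through an 8-hour overnight window comes out to roughly 180 Mbps of sustained bandwidth under these assumptions.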

The migration itself may be done in several steps. For instance, a data center may be replicated with the same physical and virtual environment and the same storage, or the storage may be replicated separately. In either case, the migration process requires proper planning, with a focus on bundling applications for relocation and managing licenses for physical and/or virtual environments. There is a lot of potential to reduce both capex and opex at this stage. One sure-shot way is to use virtualization to reduce the server count and thereby the associated license costs.
The importance of being prepared for disaster recovery cannot be stressed enough. A quick check of the news headlines - including the Amazon outage in 2011 and the more recent outage of Chorus, the French government's accounts payable system - makes it clear that system downtime can be crippling for any organization. Replicating a data center to create a new DR site offers organizations a plausible approach to balancing risks with business needs.