Infrastructure Services are undergoing a major transformation. How does one navigate the web of emerging technology trends and stay ahead of the game? Read on to learn more on our Infra Matters blog.


September 26, 2013

7 steps to a smarter IT Front End

 

We often praise one front end over another. The world of graphical user interfaces has evolved from PCs to Macs to smartphones. Yet the IT department often ignores the 'front end' that the modern user expects from IT. Most are fixated on the service desk as that all-empowering front end, and even ITIL offers prescriptive definitions. One can argue that, from an end user's perspective, this is far from the whole picture.

 

We often hear complaints that IT is slow, ineffective or behind on support commitments. Though there may be some truth to this, much of it comes from perceptions that have built up over time in users' minds and gone unaddressed. So what is that 'front end'? I would define it as a cohesive combination of resources, service desk response times, average speed of resolution, an automated service catalog and a comprehensive knowledge base.

 

So how does an organization build that smart IT front end? Here are 7 steps to get going:

 

1)     Handle all actionable service requests through a single service catalog: 100% of service requests should flow centrally into one service catalog. Insist that if a service does not exist on the service catalog, it does not exist! This requires a major change to sunset a variety of tools and manual services, but consolidating on one clean interface is worth the time and effort.

2)     Support the service catalog through an automated back end: All actionable service requests should flow through an automated back end, working their way through approvals, procurement, provisioning and fulfillment. Automating all of this is the holy grail, but make the move towards that goal and measure progress. Again, shoot for 100% of back-end processes and you will reach a high mark. Examples include new user accounts, requesting a development environment, licenses and adding application access (see the sketch after this list).

3)     Enable problem-to-incident (P2I) conversions: Resolving a problem is not the end of the story. Confirming that Level 1 teams know what to do if the incident recurs is a must. Consistently enforcing this policy of P2I linkage and conversion will work wonders over time, with more incidents resolved faster and more efficiently at Level 1 itself.

4)     100% self-service for user-induced incidents: Set up a self-service gateway to manage all such common incidents. This will dramatically improve speed of response. Examples include account lockouts, password changes and resets, information/document uploads and profile changes.

5)     Set up and maintain a corporate wiki: Information discovery and ease of consumption should play a key role in the roadmap of the IT front end. Too often we see missing how-tos, trouble finding the right document and outdated content. An annual review of all key documents, along with users' ability to edit and update them, will foster a sense of shared ownership within the user community. Enable access from all devices, especially smartphones. Experts will bubble up to the top and become allies of IT.

6)     100% of software installs by end users: Through the self-service capability and service catalog automation, enable users to receive a temporary download link to software they are allowed to install. In the long run, diminish the need for this install capability by adopting Software as a Service and/or internal web applications, e.g. Office 365, SharePoint Online and Lync.

7)     Periodic user engagement: IT often gets flak for not being there when it matters, or for simply not being visible. Periodic user feedback, technology awareness sessions and formal internal training can go a long way in bringing IT closer to the business community.
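
To make step 2 concrete, here is a minimal sketch (in Python) of how a catalog request might flow through an automated back end. The stages, the ServiceRequest fields and the fulfill helper are illustrative assumptions, not any particular ITSM product's API; a real implementation would hook each stage into the tool's own workflow engine.

# Minimal, illustrative sketch of an automated service-catalog back end.
# Stage names, the ServiceRequest fields and the fulfill helper are
# hypothetical; a real implementation would call the ITSM tool's own APIs.
from dataclasses import dataclass, field

@dataclass
class ServiceRequest:
    item: str                 # e.g. "new development environment"
    requester: str
    status: str = "submitted"
    history: list = field(default_factory=list)

    def advance(self, stage: str) -> None:
        self.status = stage
        self.history.append(stage)

def fulfill(request: ServiceRequest) -> ServiceRequest:
    """Drive a request through approval, provisioning and fulfillment."""
    for stage in ("approval", "procurement", "provisioning", "fulfilled"):
        # Each stage would normally trigger a workflow step (manager sign-off,
        # license purchase, VM build, etc.); here we only record the transition.
        request.advance(stage)
    return request

if __name__ == "__main__":
    req = fulfill(ServiceRequest(item="development environment", requester="jdoe"))
    print(req.status, "->", req.history)

The point is simply that every transition is recorded and driven by the system rather than by email and spreadsheets.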

 

The organization of tomorrow requires a smart technology front end. Getting from here to there requires an investment of time, effort and resources. These steps can get you started, and there may be more. If you have a take on additional steps, do write in.

September 24, 2013

Service Redundancy - 5 Lessons from the Nirvanix blowout

 

Earlier last week, Nirvanix, one of the most prominent cloud storage providers, gave the tech community a shocker when it announced that it would be going out of business. Customers and partners were asked to stop replicating data to its storage infrastructure immediately and to move their data out in about two weeks. I have been fascinated by this story; here are the facts.

-          Nirvanix pulled in about $70 Mn in venture funding during its lifetime, starting September 2007

-          Its key backers kept up 5 rounds of funding right up to May 2012

-          Rated well by industry analysts and media

-          The cloud storage service was sold through several enterprise software resellers and service providers.

-          The service pulled in several key customers and was challenging the likes of Amazon's AWS S3 storage services.

 

What is evident is that the company was burning cash faster than it was generating revenue, and it all came to an abrupt end when it could not find a buyer or execute an exit strategy. One would have thought that six years of existence would generate enough value (IP or otherwise) for a potential buyer, but no further detail seems to be available. Nirvanix is by no means the first to go belly up; EMC pulled the plug on its Atmos Online service in 2010, though that was perceived as far smaller in impact.

 

From the enterprise standpoint, if an organization had been using Nirvanix's services, these two weeks are a scramble. Moving data out of the cloud is a tall order. The second issue is finding a new home for the data. And what if the data was being used in real time as the back end of some application? More trouble and pain. So here is my take on the top 5 areas clients and providers can address for service redundancy (leaning more towards cloud storage services):

 

1)     Architect multiple paths into the cloud: Have a redundant storage path into the cloud, i.e. host data in two clouds at once. This also depends on the application using the service, geography and users, the primary/secondary configuration, communication links and costs. For example, a client could have an architecture where the primary cloud storage was on Nirvanix and the secondary on AWS. Also weigh the established value of traditional in-house options and established disaster recovery providers.

2)     Be prepared to move data at short notice: Based on the bandwidth available from source to target and the size of the data in question, we can easily compute how much time it would take to move data out of a cloud. Add a 50% efficiency factor (plausible when everyone is trying to move data out at once) and frequent testing, and we have a realistic estimate of how long data migration will take (a worked sketch follows after this list). Given that the two weeks from Nirvanix is a new benchmark, clients may choose to use it as a measure of how much data to store in one cloud; if it would take more than two weeks to move important data, consider paying for better links or bringing a new provider into the mix.

3)     Consume through a regular service provider: Utilize the service through a regular managed services provider. This has two benefits for clients: a) the ability to enter into an enterprise-type contract with the provider and ensure service levels, and b) the ability to gain financially in case of a breach of service levels. Of course, service providers in turn have to be vigilant and ensure that they have proper flow-down provisions in their contracts with cloud providers, and that there is an alternative to the service in case of issues.

4)     Establish a periodic service review process: Too often we see a buy-once-and-forget approach. A regular review of the service will often provide early warning of issues to come. For example, in the case of a cloud storage provider, tracking how much storage it is adding (new storage growth %) and the new client names signed on will give a good indication of any issues on the capacity front. This may in turn point to a lack of available investment for growth.

5)     Understand provider financials: Cloud providers today focus on ease of use and gee-whiz new features, which often masks how they are actually doing financially. And the market continues to value them on revenue growth. Whether the company is public or private, as a client and under confidentiality, you can ask to understand its financial performance and product roadmap, even if only at a high level.
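
As a companion to point 2, here is a back-of-the-envelope sketch (in Python) of the migration-time estimate. The 200 TB data size and 1 Gbps link are invented figures purely for illustration.

# Back-of-the-envelope estimate for moving data out of a cloud, as described
# in point 2. The data size and link speed below are made-up examples.
def migration_days(data_tb: float, link_mbps: float, efficiency: float = 0.5) -> float:
    """Days needed to move `data_tb` terabytes over a `link_mbps` link,
    assuming only `efficiency` of the nominal bandwidth is usable."""
    data_bits = data_tb * 1e12 * 8           # terabytes -> bits (decimal TB)
    usable_bps = link_mbps * 1e6 * efficiency
    seconds = data_bits / usable_bps
    return seconds / 86_400                  # seconds per day

if __name__ == "__main__":
    # Example: 200 TB over a 1 Gbps link at 50% efficiency is roughly 37 days,
    # i.e. well beyond a two-week exit window.
    print(round(migration_days(200, 1000), 1), "days")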

 

Cloud solutions offer multiple benefits, but at the end of the day they are still services delivered to another business, and all services have issues, cloud-based or traditional. Was Nirvanix an outlier or a warning of things to come? We don't know yet, but for service providers and clients alike, this may be the heads-up we need to stay focused on the essentials for effective transformation of infrastructure and storage services.

September 19, 2013

Does your customer know what you do?

(Published on behalf of Praveen Vedula)

 

IT departments are constantly engaged in a battle to provide quality services to their customers. Communication about these services is equally important given the high stakes involved. With the advent of SaaS-based tools like ServiceNow, automation has become the mantra for managing various processes. Communicating service outages or application downtimes to IT and business stakeholders is one area that can be improved using the automation engine. Every organization has different communication needs depending on its core business.

If IT is to be considered an enabler of business instead of a cost center, it needs to act as any service provider would, ensuring its service levels are articulated with no room for error. It goes without saying that miscommunication or improper communication around mission-critical services could impact the business. There have been numerous instances of big outages that could have been averted through effective communication to the service desk.

Okay, so we know how important it is to communicate - how do we go about it?


The key focus of most ITSM tools has been the traditional ITIL processes such as CMDB, incident, problem, change and release management; predictably, communication based solely on these modules leaves a lot to be desired.
SaaS-based ITSM tools have been game changers when it comes to integrating the communication required for ITSM processes. Thanks to flexible tools like ServiceNow, an integrated communication model can be envisioned that caters to specific customer requirements around ITSM processes. The flexibility to package communications and showcase them on the IT portal dashboard has been a standout outcome of automation.

 
As a consultant, I witnessed the potency of this approach in a recent engagement. For a client, pre-defined communication templates were used to integrate the communication for incident, release and deployment management processes, covering both unplanned and planned outages. This made it very simple for the communication teams to enter the details in the incident or release and propagate the communication at the click of a button. The templates could also communicate on an ad hoc basis through the IT portal when the details of an outage were not recorded in an incident or when the mail servers were down.
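
As an illustration, the sketch below (in Python) shows the general idea of merging an incident record into a pre-defined outage template. The field names and template text are hypothetical and deliberately tool-agnostic; ServiceNow provides its own notification and template capabilities for this.

# Tool-agnostic sketch of a pre-defined outage communication template being
# filled from an incident record. The field names and template text are
# illustrative, not any specific ITSM product's schema.
OUTAGE_TEMPLATE = (
    "Subject: [{priority}] {application} outage - {status}\n"
    "Impact: {impact}\n"
    "Start: {start_time}\n"
    "Next update: {next_update}"
)

def render_outage_notice(incident: dict) -> str:
    """Merge an incident record into the standard outage template."""
    return OUTAGE_TEMPLATE.format(**incident)

if __name__ == "__main__":
    notice = render_outage_notice({
        "priority": "P1",
        "application": "Order Portal",
        "status": "Investigating",
        "impact": "Users cannot submit new orders",
        "start_time": "2013-09-19 09:30 UTC",
        "next_update": "in 30 minutes",
    })
    print(notice)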

This was an interesting challenge, as application subscription data was not readily available. To drive the communication objectives, it was important to give IT and business users a platform to subscribe to applications for communication and to be kept informed of weekly maintenance and infrastructure maintenance updates. With this in mind, the result was a dynamic visual dashboard that summarized the various outages, planned or unplanned, while also enabling personalized email communication based on application subscription data. This was a great exercise in how the integration of ITSM processes for communication can be managed when subscription data is scattered across various tools and repositories. It reminds me of another intriguing topic for my next post: application access role data management and its implications for ITSM processes.
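
For the personalized email piece, the idea can be sketched as follows (in Python): group subscriptions by user and send each person only the notices for the applications they follow. The subscription and outage records here are invented for illustration.

# Illustrative sketch of personalizing outage emails from application
# subscription data. The subscription and outage records are made up.
from collections import defaultdict

subscriptions = [                      # (user, application) pairs
    ("alice@example.com", "Order Portal"),
    ("alice@example.com", "Payroll"),
    ("bob@example.com", "Order Portal"),
]
outages = {"Order Portal": "Planned maintenance Sat 22:00-23:00 UTC"}

def build_digests(subs, outage_map):
    """Return one message body per user, listing only the applications
    that user has subscribed to and that currently have an outage notice."""
    per_user = defaultdict(list)
    for user, app in subs:
        if app in outage_map:
            per_user[user].append(f"{app}: {outage_map[app]}")
    return {user: "\n".join(lines) for user, lines in per_user.items()}

if __name__ == "__main__":
    for user, body in build_digests(subscriptions, outages).items():
        print(user, "->", body)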

(Praveen Vedula has over 7 years of industry experience. He specializes in ITIL best-practice-oriented process design and implementation.)

September 16, 2013

Analytics in IT Operations - Defining the roadmap

(Published on behalf of Arun Kumar Barua)

As the speed of business accelerates, visibility into IT operations becomes far more critical. However, getting that information in a form you can use to drive faster and better-informed decisions is a major challenge. Visibility into operations is one thing; turning massive amounts of low-level, complex data into understandable, useful intelligence is another. It must be cleansed, summarized, reconciled and contextualized in order to support informed decisions.

Now let's think about it: what if organizations could effortlessly integrate their data, both structured and unstructured, across the organization? What if it were easy for businesses to access it all? Imagine a situation where this data acquisition process is predictable and consistent, and business insight is linked to a framework for quick decision-making and made available to all who require it.

In a previous post, we looked at the importance of the data generated daily through IT operations. Recognizing the importance of this data and analytics is essential, but putting in place the processes and tools needed to deliver relevant data and analytics to business decision-makers is a different matter.

Predictive analytics encompasses a variety of techniques from statistics and machine learning applied to data sets to predict outcomes. It is not about absolutes; rather, it is about likelihoods. For example, there is a 76% chance that the primary server may fail over to the secondary in XY days, a 63% chance that Mr. Smith will buy at a given price, or an 89% chance that certain hardware will need to be replaced in XY days. Good stuff, but it is difficult to understand and even more complex to implement.
It's worth it, though. Organizations that use predictive analytics can reduce risk, challenge competitors and save a great deal of money along the way.
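
As a trivial illustration of "likelihoods, not absolutes", the sketch below (in Python) estimates the chance of a failover within 30 days from a history of past failures. The history is invented and the empirical-frequency approach is deliberately simplistic; real predictive models use far richer statistical and machine-learning techniques.

# Minimal sketch of expressing a prediction as a likelihood rather than an
# absolute. The failure history below is invented for illustration.
def failure_probability(days_to_failure_history, horizon_days):
    """Empirical chance that a failure occurs within `horizon_days`,
    based on past observations of days-until-failure."""
    within = sum(1 for d in days_to_failure_history if d <= horizon_days)
    return within / len(days_to_failure_history)

if __name__ == "__main__":
    history = [12, 30, 45, 18, 9, 60, 25, 14, 40, 22]   # days between failovers
    p = failure_probability(history, horizon_days=30)
    print(f"Estimated chance of a failover within 30 days: {p:.0%}")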

Predictive Analytics can be used in multiple ways, for example: 

  • Capacity Planning: Helping the organization determine hardware requirements proactively and forecast energy consumption (see the sketch after this list)
  • Root Cause Analysis: Detecting abnormal patterns in events, narrowing the search for single points of failure and helping mitigate them in future
  • Monitoring: Enhanced monitoring of vital components that can sense impending system failures and prevent outages
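
Referring back to the capacity planning bullet, here is a minimal sketch (in Python 3.10+) that fits a linear trend to recent storage usage and estimates when a volume will fill up. The usage figures and the 10 TB capacity are invented for illustration.

# Sketch of proactive capacity planning: fit a linear trend to recent storage
# usage and estimate when the volume will be full. The figures are invented.
from statistics import linear_regression  # requires Python 3.10+

weeks = [1, 2, 3, 4, 5, 6]
used_tb = [4.0, 4.3, 4.7, 5.0, 5.4, 5.8]  # observed usage per week
CAPACITY_TB = 10.0

slope, intercept = linear_regression(weeks, used_tb)   # TB added per week
weeks_until_full = (CAPACITY_TB - used_tb[-1]) / slope

print(f"Growth: {slope:.2f} TB/week; volume full in ~{weeks_until_full:.0f} weeks")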

Selecting an apt tool will enable you to use reports and dashboards to monitor activities as they happen in real time, then drill down into events and perform root cause analysis to understand why they happened. This post talks a bit more about the selection of such a tool.
By identifying patterns and correlations across the events being monitored, you can predict future activity. With this information, you can proactively send alerts based on thresholds and investigate what-if scenarios to compare options.
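
A threshold-based alert can be as simple as the sketch below (in Python), which flags the latest error count when it exceeds the historical mean by three standard deviations. The counts are invented; production systems would use more robust baselines.

# Sketch of threshold-based proactive alerting: compare the latest event count
# against a baseline of mean + 3 standard deviations. The counts are invented.
from statistics import mean, stdev

def should_alert(history, latest, n_sigma=3.0):
    """Alert when `latest` exceeds the historical mean by n_sigma deviations."""
    threshold = mean(history) + n_sigma * stdev(history)
    return latest > threshold, threshold

if __name__ == "__main__":
    hourly_errors = [12, 15, 11, 14, 13, 16, 12, 15]  # recent baseline
    alert, threshold = should_alert(hourly_errors, latest=41)
    print(f"threshold={threshold:.1f}, alert={alert}")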

The shortest road to meaningful operational intelligence is generating relevant business insights from the explosion of operational data. The idea is to move from reactive to proactive methods and analyze structured and unstructured operational data in an integrated manner. Without these additional insights, IT management is likely to keep struggling in a downward spiral.

Now would be a good time to tap into the data analytics fever and turn it inward. 

 (Arun Barua is a Senior Consultant at Infosys with more than 9 years of diverse experience in IT Infrastructure, Service and IT Process Optimization. His focus areas include IT Service Management solution development & execution, Strategic Program Management, Enterprise Governance and ITIL Service Delivery.)

September 12, 2013

The palest ink is better than the best memory

(Published on behalf of Vishal Narsa)

Enterprises operating multiple testing environments often fail to recognize the need for a comprehensive knowledge management solution that caters to the information needs of the users and support groups of these test (or non-production) environments. As the effort and money spent on building these complex test environments is substantial, it makes complete business sense to quantify the value extracted from these investments.

One of the principal criteria for this quantification is the availability and uptime of these non-production environments for the testing teams. Any downtime on non-production environments leads to delayed releases and underutilized testing resources, a serious implication for the business itself.

Consider the testing environment for a systems integration project in which multiple teams are jointly responsible for building and testing the entire business solution. The stakeholders range from application to infrastructure teams, from environment architects to development teams, from third-party vendors to testing teams. It is not surprising that environment information exists in silos, given the diverse set of teams involved.

Any downtime on an integrated testing environment means that an individual component or a group of components has malfunctioned, which hampers testing of the complete business functionality. Restoring the environment requires a coordinated effort by the different component teams involved.
The lack of a comprehensive knowledge base of the environment landscape emerges as one of the major challenges at this juncture. Component teams often depend heavily on each other's personnel for their respective component configuration information and other technical details about the environment. This information is absolutely critical to the component teams' ability to restore the environment.

The lack of a consolidated environment knowledge base delays environment restoration, which in turn has a far-reaching impact on development, testing and release schedules, resulting in unplanned costs to the business.
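
What might a consolidated record look like? Here is a minimal sketch (in Python) of a structured environment-component entry; the fields are hypothetical and would be shaped by whatever the component teams actually need in order to restore the environment.

# Minimal sketch of a structured record for a consolidated environment
# knowledge base. The fields and sample values are hypothetical.
from dataclasses import dataclass

@dataclass
class EnvironmentComponent:
    environment: str        # e.g. "SIT-2"
    component: str          # e.g. "Order API"
    owner_team: str
    baseline_version: str
    config_notes: str       # endpoints, dependencies, restore steps, ...

catalog = [
    EnvironmentComponent("SIT-2", "Order API", "Integration team",
                         "v2.4.1", "Restart via deploy job; depends on MQ broker"),
]

# Any team can now look up restore details without chasing another team's people.
print([c.component for c in catalog if c.environment == "SIT-2"])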

Given the criticality of knowledge management and the positive influence it can have on achieving overall business objectives, it is important for organizations to deploy a standardized, well-governed methodology to capture and reuse environment-related knowledge assets. In organizations with mature ITIL practices, the environment knowledge base can be incorporated as a subset of the enterprise service knowledge management system (SKMS). Some of the palpable benefits of a mature knowledge management system include:
• Accelerated application delivery, contributing to reduced time-to-market
• Reduced environment provisioning time due to readily available baseline configuration information
• Earlier resolution of environment incidents by leveraging past knowledge
• Improved staff productivity due to enhanced knowledge sharing

The title of this post says it all: knowledge written down and captured in a proper format will always be more accurate and valuable than relying on collective organizational memory.

September 3, 2013

Testing the test environment - Infrastructure testing for non-production environments

(Published on behalf of Divya Teja Bodhanapati)

In our previous post, we looked at the perils of ignoring non-production environments and focusing only on production environments.
To reduce the total cost of operations, an optimized, robust, reliable and "fit-for-purpose" non-production environment is essential.

The question is when can an environment be called "fit-for-purpose" or "reliable"?

The answer is: "when all the components (infrastructure, middleware and applications) involved in the environment perform as per the user-defined requirements."

When we look at the largest component, i.e. infrastructure, three elements stand out: storage, network and compute. Testing applications is a well-established function, but how do we ensure the underlying infrastructure is also working as required?

Not many organizations have given serious consideration to testing their infrastructure before putting it to use. Over the past few years, it has been observed that outages and downtime in environments are primarily due to infrastructure issues.
The Quorum Disaster Recovery Report for Q1 2013 says that "55% of the failures are at hardware level (network and storage components)". This is not surprising.

In July 2012, microblogging site Twitter had to post this message on its blog after a short downtime, blaming an "infrastructure double whammy". The outage affected millions of Twitter users across the globe, a day before the Olympics were to begin in London.

But what is the impact of downtime in non-production environments?

Any system downtime ends up shortening the development and testing cycle, as these processes have to wait until the environment is up and running again. With insufficient time, development and testing may not be conducted properly, leading to a vicious cycle of possible defects in the final product. This would ultimately result in outages in the production environment as well, and the consequences of such an outage can be even more devastating to the business, as seen in the Twitter case above.


Infrastructure testing essentially involves testing, verifying and validating that all components of the underlying infrastructure are operating as per the stipulated configurations. It also tests whether the environment is stable under heavy load and different combinations of integrations and configurations.
Infrastructure testing includes all stages of software testing, i.e. unit testing, system and integration testing, user acceptance testing and performance testing, applied to the infrastructure layer. By bringing the rigor of testing into the infrastructure space, it eliminates inconsistencies in infrastructure configurations, which are the main cause of outages and downtime in non-production environments.
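
To make this tangible, here is a small sketch (in Python) of one such check: validating a host's observed settings against its stipulated baseline. The baseline values and observations are hypothetical; in practice they would come from monitoring or configuration-management tooling.

# Illustrative sketch of infrastructure testing as configuration validation:
# compare the observed settings of a host against its stipulated baseline.
# The baseline and observed values are hypothetical.
BASELINE = {"cpu_cores": 8, "memory_gb": 32, "open_ports": {22, 443, 8080}}

def validate_host(observed: dict, baseline: dict = BASELINE) -> list:
    """Return a list of deviations from the stipulated configuration."""
    issues = []
    for key, expected in baseline.items():
        actual = observed.get(key)
        if actual != expected:
            issues.append(f"{key}: expected {expected}, found {actual}")
    return issues

if __name__ == "__main__":
    observed = {"cpu_cores": 8, "memory_gb": 16, "open_ports": {22, 443}}
    for issue in validate_host(observed):
        print("DEVIATION:", issue)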

A thorough testing process is crucial to the success of any rollout; by testing the underlying infrastructure of the non-production environment, defects arising from incomplete testing can be safely ruled out.

 

(Divya Teja is an Associate Consultant at Infosys with close to 4 years of experience in the IT industry. Her focus areas include Non-Production Environment Management and Infrastructure Automation.)