Infrastructure management is undergoing a transformation. Can ITIL help manage conflicting demands such as “low cost but high service quality” and “ubiquitous access but enhanced security”?


The Next Big ITSM Evolution (part 3) – Spot the difference

In my first post on this topic I highlighted a trend: customers who wish to enhance the maturity and efficiency of their IT capabilities are looking towards their development and test infrastructure. The next part looked at why such infrastructure tends to be built as an ad-hoc deployment rather than as a service.

In this post I want to highlight, by example, some of the differences between building a pre-production environment service and building a Production service. My example is the measurement of application/service availability, a critical measure of the quality of service to clients.

There are three core challenges with capturing this kind of information in a pre-production Environment Service context. The first and most obvious is separating platform availability from application availability. In a Production environment there is little differentiation between platform and application as far as the users are concerned; the focus is the availability of the end-to-end system. In a development and test scenario it is critical to separate the two, because it is quite possible that a given deployment of an application will not work at all, or will break during operation, especially in the early test phases. This means that one of the first activities in Incident Management is to determine whether the fault is a software code issue or a platform issue. A best practice that I have observed is for the Environment Service organisation to take accountability for determining the source of an incident and referring it accordingly, much like a traditional service desk, with the frequent outcome that an Incident is referred back to the client for a fix. The key point here is that once the issue has been identified, and confirmed by the client, as an application code defect, the resulting service downtime is not attributed to the environment. This differentiation also makes a classic tension point between software projects and infrastructure teams transparent, so that the real issues can be identified and fixed.
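The attribution rule can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Incident` record, its field names, the `"platform"`/`"application"` labels and the environment name `SIT-1` are all hypothetical, not taken from any particular ITSM tool. Downtime is excluded only when the incident is both diagnosed as an application defect and confirmed as such by the client.

```python
from dataclasses import dataclass


@dataclass
class Incident:
    """Hypothetical incident record; field names are assumptions."""
    environment: str
    downtime_hours: float
    root_cause: str                    # "platform" or "application"
    confirmed_by_client: bool = False  # client agreed it is a code defect


def environment_downtime(incidents):
    """Sum the downtime that is attributed to the environment service."""
    total = 0.0
    for inc in incidents:
        is_confirmed_app_defect = (
            inc.root_cause == "application" and inc.confirmed_by_client
        )
        # Only confirmed application defects are excluded; an unconfirmed
        # diagnosis still counts against the environment.
        if not is_confirmed_app_defect:
            total += inc.downtime_hours
    return total


incidents = [
    Incident("SIT-1", 4.0, "platform"),
    Incident("SIT-1", 6.0, "application", confirmed_by_client=True),
    Incident("SIT-1", 2.0, "application", confirmed_by_client=False),
]

print(environment_downtime(incidents))  # 6.0 (4.0 platform + 2.0 unconfirmed)
```

Keeping the unconfirmed incident in the total is a design choice in this sketch: it gives both sides an incentive to close the diagnosis quickly rather than leave it in dispute.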

Secondly, ITIL availability is often measured as the total user time of a given service minus the total user downtime. In a Production environment user numbers are relatively large, stable and well defined, so total user hours per month can be established to a reasonable level of accuracy. In a set of test environments the user base is not stable and can change from day to day. The location of test users can also change. This makes Key Performance Indicators (KPIs) of this nature volatile and hard to measure in a meaningful way. For example, an environment might be down for 5 days, but if no test users are working on it, does it actually matter? Is it useful to report that a given environment was down for 5 working days out of 20 (i.e. 25%) that month? What does that tell management? The key is the impact on users, which as described is difficult to track, so management need to recognise this when setting targets and be careful not to set targets that might skew the service offered to the users. One solution to this challenge might be to ask the clients themselves to fill in a simple web-based form that records user numbers in a given environment; this can then be used to calculate service availability.
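As a rough sketch of how such user numbers could feed the calculation, the Python below weights downtime by the users who were expected on the environment each day. The function name, the tuple layout and the figures (8-hour days, 10 testers) are assumptions made for illustration; only the "5 days out of 20" scenario comes from the example above.

```python
def user_weighted_availability(days):
    """Availability as (total user hours - lost user hours) / total user hours.

    `days` is a list of (users_expected, service_hours, hours_down) tuples,
    one per working day. Returns None when nobody was expected to use the
    environment, because the ratio is then undefined.
    """
    total_user_hours = sum(users * hours for users, hours, _ in days)
    lost_user_hours = sum(users * down for users, _, down in days)
    if total_user_hours == 0:
        return None
    return (total_user_hours - lost_user_hours) / total_user_hours


# Scenario A: 10 testers every day, environment down for 5 of 20 days.
busy_month = [(10, 8, 0)] * 15 + [(10, 8, 8)] * 5

# Scenario B: the same outage, but no testers were scheduled on those 5 days.
quiet_outage = [(10, 8, 0)] * 15 + [(0, 8, 8)] * 5

print(user_weighted_availability(busy_month))    # 0.75
print(user_weighted_availability(quiet_outage))  # 1.0
```

The same 5-day outage reports as 75% in one case and 100% in the other, which is exactly why a calendar-based figure on its own says little about the impact on users.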

Finally, there tends to be a much higher rate of scheduled downtime for code deployments and data refreshes in a development and test scenario. This ‘downtime’ should also be excluded, provided it is delivered within an agreed timeframe. If a deployment falls outside this schedule, it should be counted as service downtime. It is important, however, to measure deployment times and seek efficiencies, as deployments are often very manual and limit the useful time on an environment, which can cause delays to projects. A good target in this area would be to monitor ongoing deployment times per application and environment class and seek incremental improvements.
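A minimal Python sketch of both ideas follows. It makes one assumption the text leaves open: when a deployment overruns, only the hours beyond the agreed window are counted as service downtime, not the whole deployment. The application name, environment classes and durations are invented for illustration.

```python
from collections import defaultdict
from statistics import mean


def overrun_hours(actual_hours, agreed_hours):
    """Hours beyond the agreed window (assumption: only the overrun counts)."""
    return max(0.0, actual_hours - agreed_hours)


# (application, environment class, actual hours, agreed hours), hypothetical
deployments = [
    ("billing", "SIT", 3.0, 4.0),  # within the agreed window
    ("billing", "SIT", 5.5, 4.0),  # overran by 1.5 hours
    ("billing", "UAT", 2.0, 2.0),  # exactly on schedule
]

# Downtime charged to the service from deployment overruns.
service_downtime = sum(
    overrun_hours(actual, agreed) for _, _, actual, agreed in deployments
)

# Average deployment duration per application and environment class,
# the figure to watch for incremental improvement.
durations = defaultdict(list)
for app, env_class, actual, _ in deployments:
    durations[(app, env_class)].append(actual)
average_duration = {key: mean(values) for key, values in durations.items()}

print(service_downtime)                     # 1.5
print(average_duration[("billing", "SIT")])  # 4.25
```

If the agreed rule is instead that an overrunning deployment counts in full, `overrun_hours` is the only function that needs to change.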

A related topic is how to assess the priority of an incident: just as the user base changes, so does the priority of certain applications and environments, depending on project requirements. This is a topic for another day, but in the meantime I would invite readers to share their experiences of the specific challenges of a test environment service.


Comments

Absolutely right! I am working on building an ITSM process for a test environment and I could not agree more.

I recently started as the Service Desk/Incident Manager for the Pre-Production Environment organization of a large company; an environment of numerous apps, platforms and tools. While we are struggling with the idea of adapting V3 service management practices to the environment, one of the most visible issues under discussion is incident prioritization. You mentioned the idea of taking up that discussion. Have you given it more thought?

Thanks,
Paul

Hi Paul,

Thanks for the comments. I do have a number of thoughts on Incident Management for development and test environments, specifically around prioritisation. Hopefully I can post something on this topic over the next couple of weeks.

Regards,
Bruno
