Infrastructure management is undergoing a transformation. Can ITIL help manage conflicting demands such as “low cost but high service quality” and “ubiquitous access but enhanced security”?


Application of Lean Principles in IT Service Management

Posted by Subbarao Chaganty

Lean is a successful process improvement methodology, widely adopted in manufacturing, that reduces expenditure by eliminating waste. I was involved in a transformation initiative at a financial services firm where we applied Lean waste reduction principles to day-to-day IT Service Management operations.

Event Management received its deserved recognition in ITIL V3, where it became a well-defined process within the Service Operation phase of the lifecycle. A few of the critical areas of Event Management that we dealt with were Event Notification, Event Detection, Event Filtering, Event Correlation and Event Response.
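To make the Filtering, Correlation and Response steps concrete, here is a minimal Python sketch. The `Event` fields, the three severity levels and the two response actions are hypothetical choices for illustration only; they are not the configuration used in this engagement or the API of any monitoring product.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical severity levels, ranked lowest to highest.
SEVERITY_RANK = {"info": 0, "warning": 1, "critical": 2}


@dataclass(frozen=True)
class Event:
    source: str    # host or application that raised the event
    name: str      # e.g. "disk_full", "service_down"
    severity: str  # one of the keys of SEVERITY_RANK


def filter_events(events, min_severity="warning"):
    """Event Filtering: drop events below the chosen severity."""
    floor = SEVERITY_RANK[min_severity]
    return [e for e in events if SEVERITY_RANK[e.severity] >= floor]


def correlate(events):
    """Event Correlation: group events sharing a source and name,
    so repeated notifications become one actionable item."""
    groups = defaultdict(list)
    for e in events:
        groups[(e.source, e.name)].append(e)
    return groups


def respond(groups):
    """Event Response: choose a predetermined action per group,
    based on the highest severity seen in that group."""
    actions = {}
    for key, evs in groups.items():
        worst = max(evs, key=lambda e: SEVERITY_RANK[e.severity]).severity
        actions[key] = "page on-call" if worst == "critical" else "raise ticket"
    return actions


if __name__ == "__main__":
    raw = [
        Event("db01", "disk_full", "warning"),
        Event("db01", "disk_full", "critical"),
        Event("web02", "login_ok", "info"),
        Event("app03", "queue_depth", "warning"),
    ]
    for key, action in respond(correlate(filter_events(raw))).items():
        print(key, "->", action)
```

In this toy run the informational login event is filtered out, the two disk events collapse into one item that pages on-call, and the queue warning becomes a ticket.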

Alerts generated by applications and infrastructure are meant to grab the attention of service desk agents, who then take predetermined action after a quick analysis. When the same service desk agent has to handle customer incidents, service requests AND alerts, critical alerts are often missed, leading to a loss of revenue and credibility for the organization. The agent’s ability to multitask and concentrate becomes critical to managing demand from both customers and systems (events). I don’t think this is a sustainable approach for supporting mission-critical banking applications.

Another issue that I’ve seen prevalent in Event Management is the “crying wolf” syndrome, where insignificant alerts keep crowding the monitoring interface and undermine the service desk agents’ ability to prioritize and respond to critical events and incidents.
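One common way to quiet the “crying wolf” effect is to suppress repeats of the same alert within a time window. The sketch below assumes each alert is a hypothetical `(timestamp_seconds, key)` pair and uses an illustrative 300-second window; monitoring tools implement suppression in their own ways, so treat this as a sketch of the idea rather than any product’s behavior.

```python
def suppress_repeats(alerts, window_s=300):
    """Forward an alert only if the same key has not been forwarded
    in the last window_s seconds; repeats inside the window are dropped."""
    last_forwarded = {}  # key -> timestamp of the last forwarded alert
    forwarded = []
    for ts, key in sorted(alerts):  # process alerts in time order
        prev = last_forwarded.get(key)
        if prev is None or ts - prev >= window_s:
            forwarded.append((ts, key))
            last_forwarded[key] = ts
    return forwarded


if __name__ == "__main__":
    # Hypothetical alert stream in which a CPU alert fires repeatedly.
    stream = [
        (0, "cpu_high@web01"),
        (60, "cpu_high@web01"),
        (120, "cpu_high@web01"),
        (130, "db_down@db01"),
        (400, "cpu_high@web01"),
    ]
    print(suppress_repeats(stream))
```

Here five raw alerts become three forwarded ones, so the database outage is no longer buried among CPU repeats. The trade-off is that a window that is too long can hide a genuine recurrence.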

We embarked on a systematic transformation by driving focused “Cleanup” and “Automation” initiatives that reduced overall alert volumes by 44%. We categorized the alerts into waste categories and, based on that analysis, removed redundant or “Waste” alerts. We also recommended maintaining a continuous focus on the accuracy and “Leanness” of the Event Management process. For example, service desk agents should be empowered to “tag” spurious or suspicious alerts directly in the monitoring interface for additional scrutiny, analysis and action.
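The categorization step can be sketched as a classifier over an alert inventory. The post does not name the waste categories that were actually used, so the categories, the alert fields (`duplicate_of`, `target_decommissioned`, `confirmed_false_positive`, `action_taken`, `agent_tag`) and the rules below are all hypothetical placeholders. The reported percentage is only the share of alerts a given rule set would flag as waste, not a reproduction of the 44% result.

```python
from collections import Counter


def classify(alert):
    """Return a hypothetical waste category for an alert dict,
    or None if the alert appears to add value."""
    if alert.get("duplicate_of") is not None:
        return "duplicate"
    if alert.get("target_decommissioned"):
        return "obsolete_target"
    if alert.get("confirmed_false_positive"):
        return "false_positive"
    if not alert.get("action_taken"):
        return "no_action_needed"
    return None


def waste_report(alerts):
    """Count alerts per category and the share flagged as waste."""
    categories = Counter(classify(a) for a in alerts)
    waste = sum(n for cat, n in categories.items() if cat is not None)
    waste_pct = 100.0 * waste / len(alerts) if alerts else 0.0
    return categories, waste_pct


def tagged_for_review(alerts):
    """IDs of alerts that agents tagged as spurious or suspicious.
    They are queued for scrutiny, not removed automatically."""
    return [a["id"] for a in alerts if a.get("agent_tag") in {"spurious", "suspicious"}]


if __name__ == "__main__":
    inventory = [
        {"id": 1, "action_taken": True},
        {"id": 2, "duplicate_of": 1},
        {"id": 3, "action_taken": False},
        {"id": 4, "target_decommissioned": True},
        {"id": 5, "action_taken": True, "agent_tag": "spurious"},
    ]
    categories, pct = waste_report(inventory)
    print(dict(categories), f"{pct:.0f}% flagged as waste")
    print("for review:", tagged_for_review(inventory))
```

Keeping agent tags separate from the automatic rules mirrors the recommendation above: a tag triggers analysis, and only a confirmed finding moves an alert into a waste category.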

I’m sure that other process areas, such as Incident, Configuration, Change and Release Management, can also leverage Lean principles, and that the other service lifecycle phases have areas where Lean can play a significant role in improving the effectiveness and efficiency of IT Service Management.

Based on this transformation journey, Rohit Nand (Principal Consultant) from our group presented this case study at the esteemed iTSMF UK Conference, held in Birmingham, UK in November 2008 under the theme “Driving Real Value”.

I also want to highlight that this transformation not only enabled Lean Event Management operations but also opened the door to consolidating siloed, vertically organized monitoring teams into a “Shared Horizontal Services Model”.


Comments

Hi Subba,
Excellent insight into how ITIL and Lean concepts can be synchronized with each other.

On “Event Management”: I feel that a constant storm of events in a monitoring tool has two main causes (not to mention badly configured “event suppression”):
1. Organizations tend to enable every monitoring script that ships with the tool’s out-of-the-box installation, each of which generates events, without disabling the ones they do not need. For this reason, I like the “per monitoring parameter” licensing model of some vendors, like Mercury SiteScope, because it pushes us to monitor only what we really have to!
2. They do not differentiate between parameters used for trending and those used for real-time monitoring, so trending metrics are also configured to generate events. C’mon, I don’t need an event to tell me that server CPU utilization has crossed the warning threshold of 70% for 3 seconds! I would rather look at the performance trend reports for this than at an event; alerts should be more or less confined to “Availability monitoring”.
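The second point can be expressed as a routing rule: trending metrics feed reports only, while availability metrics raise an event only when a breach persists. This Python sketch assumes hypothetical `(timestamp_seconds, value)` samples and a hypothetical `metric_kind` label of `"trend"` or `"availability"`; real thresholds and durations would come from your own monitoring policy, not from this example.

```python
def sustained_breach(samples, threshold, min_duration_s):
    """True if values stay at or above threshold continuously for at
    least min_duration_s. samples: (ts_seconds, value) sorted by ts."""
    start = None  # timestamp where the current breach began
    for ts, value in samples:
        if value >= threshold:
            if start is None:
                start = ts
            if ts - start >= min_duration_s:
                return True
        else:
            start = None  # breach interrupted; reset
    return False


def route(metric_kind, samples, threshold, min_duration_s):
    """Trending metrics are only stored for reports; availability
    metrics raise an event only on a sustained breach."""
    if metric_kind == "trend":
        return "store_for_trend_report"
    if sustained_breach(samples, threshold, min_duration_s):
        return "raise_event"
    return "no_event"


if __name__ == "__main__":
    cpu = [(0, 72), (1, 75), (2, 71), (3, 65)]  # brief CPU spike (%)
    down = [(0, 1), (30, 1), (60, 1)]           # 1 = host unreachable
    print(route("trend", cpu, 70, 300))
    print(route("availability", down, 1, 60))
```

With these samples the CPU spike goes to trend storage and the one-minute outage raises an event, which is the split the comment argues for.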

In any monitoring tool implementation, there is always room for a cleanup of up to 70% :)
