Enterprise Fire Fighting Cell
This write-up is a continuation of the blog post titled “911”. When these different cases are dealt with, they are nicknamed “fire-fighting” even in the technology world, because they are fires threatening to engulf the IT infrastructure and, if not stopped, will burn up a lot of $$$. IT operations costs today range from 50% to 75% of the overall IT budget. To reduce this overhead and optimize IT resources, it is important that such sporadic issues are not masked with tactical fixes that allow the ‘fires’ to flare up again and again. The losses from such issues are not directly accounted for, but they surely have an impact.
In the world we live in, any fire that breaks out in a home or an industrial site is handled by the fire department, and firefighters don’t stop at putting out the fire. They investigate until they find the root cause and advise steps to ensure it doesn’t happen again.
In IT, though, there are no dedicated firefighters. Instead, the systems integration, development, and maintenance teams put in extra hours, sometimes days or even weeks, and finally arrive at a workaround that contains the spread and extinguishes the fire or, if they are lucky (which is rare), find the root cause. Technology is no different from non-technology; in fact, most of its principles are a reflection of the physical world (refer to Analogy).
The purpose of this write-up is to present an abstract concept: enterprises creating a dedicated team called the “Fire Fighting Cell”. The team should be composed of experts in handling such issues who stay focused on getting to the root cause, regardless of the severity of the issue. Unfortunately, there are no training programs to develop such experts, so enterprises would need to look at the experience and skills of their employees to form such a team.
For example, look at cases 1 and 4 again. Most probably, the support team would have found a way to proceed with the go-live plan anyway, either by restarting services and components or by reducing the overall number of components, to save face with management, intending to address the issue later. That is fine for now, but the issue remains and the risk is not mitigated in the long term. If, on the other hand, the fire fighting cell takes over in response to the 911 call, its job would be to analyze the problem in depth: looking at the original non-functional requirements, the configuration options, the capacity plan (utilization of system resources), and so on, and even reviewing the code to understand deviations from standard practices. The team is racing against time, just as in a real fire fight, where the fire must be contained and extinguished. Similarly, for cases 2 and 3, the fire fighting cell’s job is to find what went wrong over a period of time, from requirements elicitation through execution, and to report the complete A-to-Z results of the gap analysis.
Having a dedicated team is not about having a different set of members. It is about having a separate entity and structure, which can include the existing developers and support engineers, an architect, quality and testing experts, an effective manager, and so on. When in action, the team members should set aside their current responsibilities and act as fire busters.


