Next Gen Infra Monitoring
A dedicated Infra Monitoring group within an enterprise can act as the first line of defense against infrastructure outages. Ideally, the team is equipped with an intelligent infrastructure monitoring landscape (achieved through 3rd party tool automation). This landscape not only raises alerts at the element level (servers, networks, DBs) but also correlates those alerts to pinpoint where a failure originated. The team can set alert thresholds to detect issues proactively. For example, server alerts can be set at 65% utilization: when utilization crosses 65%, the Infra Monitoring group receives an alert. The team analyzes the alert and attempts to fix the issue (using SOPs and knowledge documents) before reaching out to the L2 and L3 Infra teams. This way, a potential issue gets a first fix attempt before it turns into an outage.
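To make the threshold-and-correlation idea concrete, here is a minimal sketch in Python. It assumes utilization readings arrive as a simple dict of element name to percent, and that element dependencies are known as a hypothetical `depends_on` map (element to the element it relies on). The 65% threshold comes from the example above. The correlation rule is an assumed, simplified topology-based approach, not the logic of any particular monitoring tool: an alerting element with an alerting upstream dependency is treated as a symptom, and the remaining alerting elements are reported as probable origins.

```python
from __future__ import annotations

from dataclasses import dataclass

# Threshold taken from the example in the text: alert when utilization crosses 65%.
UTILIZATION_THRESHOLD = 65.0


@dataclass(frozen=True)
class Alert:
    element: str        # hypothetical element name, e.g. "db-01"
    utilization: float  # percent


def raise_alerts(readings: dict[str, float],
                 threshold: float = UTILIZATION_THRESHOLD) -> list[Alert]:
    """Element-level check: one alert per element whose utilization exceeds the threshold."""
    return [Alert(name, value) for name, value in sorted(readings.items())
            if value > threshold]


def has_alerting_ancestor(element: str, depends_on: dict[str, str],
                          alerting: set[str]) -> bool:
    """Walk up the (assumed) dependency chain, guarding against cycles."""
    seen: set[str] = set()
    parent = depends_on.get(element)
    while parent is not None and parent not in seen:
        if parent in alerting:
            return True
        seen.add(parent)
        parent = depends_on.get(parent)
    return False


def probable_origins(alerts: list[Alert], depends_on: dict[str, str]) -> list[str]:
    """Simplified topology-based correlation (an assumption, not a specific tool's method).

    Alerting elements with an alerting upstream dependency are treated as symptoms;
    the rest are reported as probable failure origins.
    """
    alerting = {a.element for a in alerts}
    return sorted(e for e in alerting
                  if not has_alerting_ancestor(e, depends_on, alerting))


# Hypothetical readings and topology: web-01 -> app-01 -> db-01.
readings = {"web-01": 82.0, "app-01": 71.5, "db-01": 90.0, "backup-01": 40.0}
depends_on = {"web-01": "app-01", "app-01": "db-01"}
alerts = raise_alerts(readings)
print([a.element for a in alerts], probable_origins(alerts, depends_on))
```

With these sample values three elements breach the threshold, but only `db-01` is reported as the probable origin, which is the element the team would look at first.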
The Infra Monitoring group works closely with the Service Desk, and all major outages and maintenance windows are proactively communicated between them. Another important function the Infra Monitoring group can perform is that of a command center. Whenever there is a critical or major infra incident, the group can take end-to-end ownership of the ticket: opening conference lines, initiating the major incident process, communicating ticket status to all key stakeholders, and reaching out to L2 and L3 POCs, 3rd party vendors, etc. This way, tickets are not lost, and there is complete accountability and visibility in the system.
The Infra Monitoring group can be a 100% remote team. Its work combines "eyes on glass" monitoring with basic triage skills. Fixes are driven through SOPs, and anything more detailed or technical is passed on to the L2 and L3 teams. This structure ensures that expensive, skilled L2 and L3 resources are used optimally. The team can also be shared across several clients (subject to client permission), making it a cheap and scalable way of improving customer satisfaction.
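The SOP-first triage flow described above can be sketched as a small routing function. Everything here is hypothetical and for illustration only: the SOP catalog is modeled as a dict from alert type to a callable that returns `True` if the scripted fix worked, and the rule for sending some alert types straight to L3 (the `needs_deep_expertise` set) is an assumed policy, not something the model prescribes.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable

# Hypothetical SOP step: takes an element name, returns True if the fix resolved the issue.
SopStep = Callable[[str], bool]


@dataclass
class TriageResult:
    element: str
    handled_by: str                      # "monitoring", "L2" or "L3"
    notes: list[str] = field(default_factory=list)


def triage(element: str, alert_type: str, sops: dict[str, SopStep],
           needs_deep_expertise: set[str]) -> TriageResult:
    """Try the SOP first; escalate only when no SOP exists or it does not resolve the issue."""
    result = TriageResult(element, "monitoring")
    sop = sops.get(alert_type)
    if sop is None:
        result.notes.append(f"no SOP for {alert_type}")
    elif sop(element):
        result.notes.append(f"resolved via SOP for {alert_type}")
        return result
    else:
        result.notes.append(f"SOP for {alert_type} did not resolve the issue")
    # Assumed escalation rule: flagged alert types go straight to L3, everything else to L2.
    result.handled_by = "L3" if alert_type in needs_deep_expertise else "L2"
    return result


# Hypothetical catalog: the disk cleanup SOP succeeds, the CPU SOP does not.
sops: dict[str, SopStep] = {
    "disk_full": lambda element: True,
    "high_cpu": lambda element: False,
}
for element, alert_type in [("srv-01", "disk_full"), ("srv-02", "high_cpu"),
                            ("db-01", "db_corruption")]:
    outcome = triage(element, alert_type, sops, {"db_corruption"})
    print(element, outcome.handled_by, outcome.notes)
```

The design point is that L2 and L3 only see alerts the monitoring tier could not close with an SOP, which is how the structure keeps skilled resources focused on the harder problems.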
The success of this model depends heavily on the successful setup, configuration, and rollout of the monitoring tool stack. This has to be handled as a separate IT program and should be driven from the top (on the client side). The tool automation can also enable the rollout of business dashboards (mapping the health of IT infrastructure to business functions) and integration with other monitoring groups such as the NOC (Network Operations Center) and SOC (Security Operations Center).
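One way a business dashboard might roll infrastructure health up to business functions is a "worst status wins" aggregation, sketched below. The function-to-element mapping, the three health levels, and the choice to treat elements with no data as degraded are all assumptions made for illustration; a real tool stack would define its own service model.

```python
from __future__ import annotations

from enum import IntEnum


class Health(IntEnum):
    # Ordered so that max() picks the worst status.
    UP = 0
    DEGRADED = 1
    DOWN = 2


def business_view(element_health: dict[str, Health],
                  functions: dict[str, list[str]]) -> dict[str, Health]:
    """A business function is only as healthy as its worst supporting element."""
    view: dict[str, Health] = {}
    for function, elements in functions.items():
        # Assumption: an element with no reading is shown as DEGRADED rather than UP,
        # so missing data never makes a function look healthy.
        view[function] = max(
            (element_health.get(e, Health.DEGRADED) for e in elements),
            default=Health.UP,
        )
    return view


# Hypothetical element statuses and business-function mapping.
element_health = {"web-01": Health.UP, "app-01": Health.DEGRADED,
                  "db-01": Health.UP, "mq-01": Health.DOWN}
functions = {"Online ordering": ["web-01", "app-01", "db-01"],
             "Payroll": ["db-01"],
             "Order fulfilment": ["mq-01", "db-01"]}
for function, health in business_view(element_health, functions).items():
    print(f"{function}: {health.name}")
```

Here a single degraded application server is enough to flag "Online ordering" as degraded, which is the kind of business-level signal such a dashboard is meant to surface.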
Posted on behalf of Thomas Mathen