911
Let's look at some typical cases, encountered in the course of a software implementation, that are analogous to a “fire alarm”.
Case 1:
A large programme at a North American energy corporation is scheduled to go live in phases, deploying 70 services that connect 15 applications and functionally link 8 business lines as part of a business optimization exercise. On the 4th weekend of the 7-weekend schedule, while sanity testing an app server that hosts a new application connecting most of these interfaces, the team observes that the application fails to connect because of random errors (authentication failures, null pointer exceptions, unbounded memory use, etc.). None of these errors is reproducible in any of the pre-production environments. On top of that, senior management is hours away from a decision checkpoint meeting, and this issue affects the plan for the scheduled go-live. Dial 911.
Case 2:
A unit of a large bank in the UK envisages a business transformation programme to automate its trading platform, using BPM technology for the overall Trade-Accounting-Settlement flow. An RFP is floated, and system integration service providers submit their best estimates to win the deal. One vendor beats all its competitors on cost and wins the deal with a fixed-price quote. After the engagement starts, both the company and the vendor realize that they drastically underestimated the work and are at serious risk of missing the delivery schedule. They estimated for only one BPM flow: it is a single business process, but the estimate missed the deeper flows and complexity, as well as the integration with multiple applications and systems. Dial 999.
Case 3:
A financial institution embarks on improving its existing risk management process and predictability to support the main business processes. The leadership envisions better productivity from its business resources and lower-risk deals. This vision is laid out to the IT leadership and enterprise architects, who come up with an overall process for a certain cost and timeline. The details percolate down the organizational pyramid, and the ball starts rolling on the project. The project is delivered, with minor glitches, after following a requirements-design-development-test-deliver cycle. But the business users seldom find a use for this improvement; in fact, they see that it is slower and produces incorrect values, and so fails to provide the risk management cover it was meant for. They now want to roll back to their previous solution. Dial…
Case 4:
An application maintenance team put a major release into production a couple of months ago, upgrading the existing services, and so far there has been no cause for concern. But within a matter of days, unexpected issues appear, such as frequent service crashes and delays in bringing up the applications, causing serious disruption to the overall business. Under stress, since all dependent applications and senior management are tracking this closely, the support team tries to attack the symptoms: it tries failover to the disaster recovery centers and attempts a clean shutdown and restart, yet finds that the issues keep recurring on a daily basis. Putting in more hours of work is not bringing any peace of mind…
I'll pause here and come back with more on this…


