Exception handling, error handling and fault handling are synonyms for the same concept. Its relevance has never decreased with the increased usage of technology. It is probably the most necessary thing in any branch of IT, running as a parallel stream to the main functionality, and it can create problems if not conceptualized properly.
In another context, the branch of ‘electronics engineering’, noise is considered a factor in amplification calculations (it is a necessary evil!): if it became ‘0’, the signal-to-noise ratio would be infinite, which is out of control and undesirable.
Similarly, Exception Handling is something which only increases the stability of any implementation. There is no doubt that errors cannot be done away with. They just need to be handled properly.
To put it simply as a mathematical relation:
S ∝ E where S stands for measure of Stability and E for robustness in Exception Handling (As E increases, S improves substantially).
The robustness of ‘E’ unfortunately does not depend on a single line of thought. A multi-dimensional approach to handling exceptions comprehensively is the means to achieve the goal.
In simplistic terms the dimensions are:
E ∝ WHY the need for exceptions
E ∝ WHEN to do it.
E ∝ WHAT next after the exception is raised
E ∝ HOW to manage the exception comprehensively.
This grid can in turn take multiple forms, as in the examples below:
1) Most current IT implementations are based on SOA, with synchronous services and multiple tiers as the trend. One simple yet popular pitfall is the TimeOut attribute (the time that a service consumer should wait before it raises an exception). In a multi-tier process flow, isn’t it common sense that the initial consumer cannot have a lower timeout value than a consumer further down the chain that is also acting as a producer?
An example of a wrong chain looks like this: Service 1 (60 second timeout) > Service 2 (120 second timeout) > Service 3 (30 second timeout) > Service 4 (no timeout)
There is a need to unwind gracefully, with an agreement between the layers, to avoid ridiculous exceptions. Usually this happens when a change is made to one of the components in between, on the principle that the design is loosely coupled and does not affect other applications. Think again about the indirect coupling on the exception handling side.
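The timeout agreement between layers can be checked mechanically. The following is only a minimal sketch, assuming each service's timeout is the time it waits on its own downstream call; the function name `find_timeout_violations` and the list-of-tuples representation are hypothetical, chosen for illustration.

```python
from typing import List, Optional, Tuple

# Each hop is (service name, seconds it waits on its downstream call).
# None means no timeout is configured. This representation is an assumption.
Hop = Tuple[str, Optional[int]]


def find_timeout_violations(chain: List[Hop]) -> List[str]:
    """Describe every hop where a downstream service may outlast its caller."""
    problems = []
    for (caller, caller_t), (callee, callee_t) in zip(chain, chain[1:]):
        if caller_t is None:
            continue  # an unbounded caller is flagged at the previous hop, if any
        if callee_t is None:
            problems.append(
                f"{callee} has no timeout, but {caller} gives up after {caller_t}s"
            )
        elif callee_t >= caller_t:
            problems.append(
                f"{callee} waits {callee_t}s, but {caller} gives up after {caller_t}s"
            )
    return problems


# The wrong chain from the example above.
wrong_chain = [
    ("Service 1", 60),
    ("Service 2", 120),
    ("Service 3", 30),
    ("Service 4", None),
]
violations = find_timeout_violations(wrong_chain)
for line in violations:
    print(line)
```

Running a check like this whenever one component's configuration changes is one way to make the indirect coupling visible before it shows up as an exception in production.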
2) Revisiting the multi-tier SOA-based architecture, take a user-facing element such as a presentation layer that interacts with multiple applications through services. There is a need to present a user-friendly response to the user who has made a request and is waiting for a response. Sending stack traces and technical errors to the user would be frustrating. Hence the need for a defined Error Schema and intelligent orchestration of exception handling: aggregate the information if the handler is dealing with multiple errors from dependent services, transform the technical errors into an understandable message for the user, and log all the technical details in the error handling and exception store.
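As a rough illustration of such an Error Schema, the sketch below aggregates errors from several dependent services, logs the technical details and returns only a friendly message with a reference. The `ServiceError` fields, the `to_user_response` function and the logger name are assumptions made for this example, not a prescribed schema.

```python
import logging
import uuid
from dataclasses import dataclass
from typing import Dict, List

# Stand-in for the error handling and exception store (hypothetical name).
logger = logging.getLogger("error_store")


@dataclass
class ServiceError:
    """One technical error reported by a dependent service."""
    service: str
    code: str
    technical_detail: str


def to_user_response(errors: List[ServiceError]) -> Dict[str, str]:
    """Log every technical detail, return only a user-friendly summary."""
    reference = uuid.uuid4().hex[:8]
    for err in errors:
        logger.error(
            "ref=%s service=%s code=%s detail=%s",
            reference, err.service, err.code, err.technical_detail,
        )
    return {
        "reference": reference,
        "message": (
            f"We could not complete your request ({len(errors)} problem(s) found). "
            f"Please quote reference {reference} if you contact support."
        ),
    }


response = to_user_response([
    ServiceError("inventory", "INV-500", "Traceback: NullPointerException at line 42"),
    ServiceError("pricing", "PRC-504", "upstream timed out after 30s"),
])
print(response["message"])
```

The reference ties the friendly message back to the logged details, so support staff can find the stack trace without the user ever seeing it.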
3) Let’s say a customer on a retail website tries to buy a gaming system and 2 additional CDs for the games (the package carries a discount). The retail website sends requests to multiple vendors (one for the gaming system, another for the CDs); it gets a success for the gaming system but an error for the CDs. How does it respond to the user in a timely fashion? Does it say synchronously that the order failed, or does it respond with partial success and partial failure as multiple async responses? Does it have an alternate flow that handles the exception functionally, clears the exception and ultimately completes the whole transaction successfully? In a nutshell it is a simple “Partial vs Full” case, but the path is an exception handling flow instead of a sequential update flow.
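The “Partial vs Full” choice can be made explicit as a policy rather than left to chance. This is a simplified, synchronous sketch: `Policy`, `VendorError`, `place_order` and the fake vendor are all hypothetical names, and a real flow would likely be asynchronous.

```python
from enum import Enum
from typing import Callable, Dict, List


class Policy(Enum):
    ALL_OR_NOTHING = "all_or_nothing"
    ALLOW_PARTIAL = "allow_partial"


class VendorError(Exception):
    """Raised by the (hypothetical) vendor call when an item cannot be ordered."""


def place_order(
    items: List[str],
    order_item: Callable[[str], None],
    cancel_item: Callable[[str], None],
    policy: Policy,
) -> Dict[str, object]:
    succeeded, failed = [], []
    for item in items:
        try:
            order_item(item)
            succeeded.append(item)
        except VendorError:
            failed.append(item)
    if not failed:
        return {"status": "success", "ordered": succeeded, "failed": []}
    if policy is Policy.ALLOW_PARTIAL and succeeded:
        return {"status": "partial", "ordered": succeeded, "failed": failed}
    # Full failure: compensate by cancelling whatever did go through.
    for item in succeeded:
        cancel_item(item)
    return {"status": "failed", "ordered": [], "failed": failed}


def fake_vendor(item: str) -> None:
    """Stand-in vendor that can supply everything except CDs."""
    if item == "CD":
        raise VendorError("CDs out of stock")


cancelled: List[str] = []
basket = ["gaming system", "CD", "CD"]
partial = place_order(basket, fake_vendor, cancelled.append, Policy.ALLOW_PARTIAL)
full = place_order(basket, fake_vendor, cancelled.append, Policy.ALL_OR_NOTHING)
print(partial["status"], full["status"])
```

Which policy applies is a functional decision (for instance, whether the package discount still holds without the CDs), which is why it belongs in the requirements and not only in the code.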
4) Sometimes, errors that are not critical can still cause a problem if left unattended. Small errors can bubble up into massive errors and hold the infrastructure to ransom. For example, say a high volume of transactions asynchronously sends messages to a subscribing application, and this application has an optional feature, not turned off, to respond asynchronously with an acknowledgement. If the ack service fails and the application creates a huge number of errors due to the sheer volume, the situation becomes unstable: system resources get consumed supporting the volume of errors, affecting normal business, though it was a non-critical exception to start with.
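One common way to keep a non-critical failure from flooding the system is to stop attempting the optional call once it has failed too often in a recent window. The sketch below is an assumption-laden illustration of that idea (a very simple circuit breaker); `OptionalCallGuard` and `send_ack` are hypothetical names, not part of any particular product.

```python
import time
from collections import deque
from typing import Callable


class OptionalCallGuard:
    """Skips an optional call after too many recent failures."""

    def __init__(self, max_failures: int, window_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self.max_failures = max_failures
        self.window_seconds = window_seconds
        self.clock = clock
        self._failures = deque()  # timestamps of recent failures

    def allow(self) -> bool:
        now = self.clock()
        # Forget failures that have aged out of the window.
        while self._failures and now - self._failures[0] > self.window_seconds:
            self._failures.popleft()
        return len(self._failures) < self.max_failures

    def record_failure(self) -> None:
        self._failures.append(self.clock())


def send_ack(guard: OptionalCallGuard, ack: Callable[[], None]) -> str:
    """Return 'sent', 'failed' or 'skipped' for one optional acknowledgement."""
    if not guard.allow():
        return "skipped"  # no call, no new error record, no extra load
    try:
        ack()
        return "sent"
    except Exception:
        guard.record_failure()
        return "failed"
```

With a limit of 3 failures per 60 seconds, a broken ack service produces at most 3 error records per minute regardless of transaction volume, and the guard starts trying again once the old failures age out.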
5) Everyone knows that there need to be different levels of logging, exception handling and alerting. But do we need different strategies for exceptions, logging, alerting or even auditing? Since the information is the same, the same datastore can be used, with the levels of errors and logging clearly defined at each component, such as Info, Debug, Warn, Error, Fatal. It would even be easier to audit not just exceptions but successful messages as well. A reporting layer can then be created easily on such a data store, and Business Intelligence can be built on top of it.
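To make the single-datastore idea concrete, here is a small sketch using an in-memory SQLite table as the shared store. The table layout and the function names (`open_store`, `record`, `count_by_level`) are assumptions for illustration; a real store would carry timestamps, correlation identifiers and more.

```python
import sqlite3
from typing import Dict

LEVELS = ("INFO", "DEBUG", "WARN", "ERROR", "FATAL")


def open_store() -> sqlite3.Connection:
    """Create one table that serves logging, exceptions and auditing alike."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (component TEXT, level TEXT, message TEXT)")
    return conn


def record(conn: sqlite3.Connection, component: str, level: str, message: str) -> None:
    if level not in LEVELS:
        raise ValueError(f"unknown level: {level}")
    conn.execute("INSERT INTO events VALUES (?, ?, ?)", (component, level, message))


def count_by_level(conn: sqlite3.Connection) -> Dict[str, int]:
    """A tiny example of the reporting layer: event counts per level."""
    rows = conn.execute("SELECT level, COUNT(*) FROM events GROUP BY level")
    return dict(rows.fetchall())


store = open_store()
record(store, "order-service", "INFO", "order 1 accepted")    # audit of a success
record(store, "order-service", "INFO", "order 2 accepted")
record(store, "payment-service", "ERROR", "card declined")    # exception
print(count_by_level(store))
```

Because successes and failures share one shape, the same query serves both an audit report and an error dashboard.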
6) Which errors can be handled within a block of code, and which need to be propagated outside of it? For example, a simple try-catch-finally block executes a set of instructions. Should it just log the situational awareness information, IGNORE the error and go to the next step, or should it ABORT and return to the flow from where it started? These are common choices and need to be understood even when the initial happy flow analysis is being done.
Basically, exception handling cannot be separated from the functional requirements or even the non-functional aspects. Each step of the software lifecycle, from conceptualization through support, should develop the framework and improve its maturity.
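The IGNORE versus ABORT choice might look like the following in code. This is a sketch under the assumption that errors are classified into two hypothetical exception types, `RecoverableError` and `CriticalError`; Python spells the construct try/except/finally.

```python
import logging
from typing import Callable, List, Tuple

logger = logging.getLogger("flow")


class RecoverableError(Exception):
    """Handled locally: log it, IGNORE it and move to the next step."""


class CriticalError(Exception):
    """Not handled here: propagates to the caller, which ABORTs the flow."""


def run_steps(steps: List[Tuple[str, Callable[[], None]]]) -> List[str]:
    """Run named steps in order and return the names that completed."""
    completed: List[str] = []
    try:
        for name, step in steps:
            try:
                step()
                completed.append(name)
            except RecoverableError as exc:
                # IGNORE: record the situational information and carry on.
                logger.warning("step %s skipped: %s", name, exc)
    finally:
        # Runs on both paths, including when a CriticalError is propagating.
        logger.info("flow ended after %d completed step(s)", len(completed))
    return completed
```

Deciding which exception type each failure maps to is exactly the analysis the text asks for during happy flow design; the code only enforces the decision.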