Is your workflow becoming a bottleneck in addressing production issues on time?
In one of my conversations with a client manager, we discussed how the development team could respond faster when a performance problem occurs in a production environment.
As with any troubleshooting exercise, analyzing the root cause of a performance bottleneck begins with collecting all the relevant information that would help the development team re-create and troubleshoot the problem scenario in a development environment.
The most trustworthy, and at times the only, source for identifying the root cause of a production problem is the set of logs generated by servers across the different tiers. For instance, a web server log can yield valuable insights into the application's workload, helping the development team re-create the production workload in a test environment. Application server logs provide insights into transactions and any exceptions that triggered the problem. It is therefore imperative that these logs are made available to the development team.
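To make the web server example concrete, here is a minimal sketch of how a team might turn an access log into a workload profile: requests per minute and hits per endpoint. It assumes the log is in the Common Log Format; the function name `summarize_workload` and the sample entries are hypothetical, and a real server may well use a different log layout.

```python
import re
from collections import Counter
from datetime import datetime

# Assumed layout (Common Log Format), for example:
# 10.0.0.1 - - [10/Oct/2023:13:55:36 +0000] "GET /cart HTTP/1.1" 200 2326
LINE_RE = re.compile(
    r'^\S+ \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+'
)


def summarize_workload(lines):
    """Count requests per minute and per (method, path) pair.

    Lines that do not match the assumed format are skipped.
    """
    per_minute = Counter()
    per_path = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match is None:
            continue
        ts = datetime.strptime(match.group("ts"), "%d/%b/%Y:%H:%M:%S %z")
        per_minute[ts.strftime("%Y-%m-%d %H:%M")] += 1
        per_path[(match.group("method"), match.group("path"))] += 1
    return per_minute, per_path


if __name__ == "__main__":
    # Hypothetical sample entries, for illustration only.
    sample = [
        '10.0.0.1 - - [10/Oct/2023:13:55:36 +0000] "GET /cart HTTP/1.1" 200 2326',
        '10.0.0.2 - - [10/Oct/2023:13:55:50 +0000] "GET /cart HTTP/1.1" 200 2326',
        '10.0.0.3 - - [10/Oct/2023:13:56:01 +0000] "POST /checkout HTTP/1.1" 500 512',
    ]
    per_minute, per_path = summarize_workload(sample)
    print(per_minute.most_common(5))
    print(per_path.most_common(5))
```

The per-minute counts suggest how much load to generate in the test environment, and the per-endpoint counts suggest which transactions to mix into that load.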
Now, the real challenge seems to be this: how soon can the development team gain access to the logs? In most organizations, the production environment and its logs are managed by the infrastructure team. For any problem analysis, the development team needs to request the logs for the required duration from the infrastructure team. Existing log management practices, security and privacy regulations, and approval channels can introduce some unavoidable delays in delivering these logs. Nevertheless, the maturity of your processes is measured by how well these delays are minimized. How long does the development team have to wait before they get to see the logs: hours or days? Are your operational processes becoming a bottleneck here?