Performance engineering in Agile landscape and DevOps
Author: Aftab Alam, Senior Project Manager
Over the last couple of years, one of the key shifts in the software development process has been the move away from the traditional waterfall approach toward newer models like DevOps. One of the main goals of development and operations (DevOps) is to monetize investments as soon as possible. In the traditional waterfall model, UI mockups are all that business owners (investors) have access to before agreeing to invest.
In fact, it can take up to four months to create something tangible that they can see and feel. Sometimes the end results are not what they expected, or market scenarios have changed in the meantime. Teams then end up working late nights and extra hours to cope with the new requirements in order to deliver something that is useful and adds value to the business. The results are a frustrated team and disappointed investors.
An agile and DevOps software development process is the key to solving this problem.
Key objectives of DevOps are to:
• Reduce the time between feature concept / discovery and actual release in production (going from months to weeks / days)
• Minimize disruption in production (downtime / other issues need to be resolved in minutes)
Key enablers of DevOps are people, organizational culture, extreme automation, the right tools and frameworks, flexible application architecture, and knowledge-sharing. As teams change the development process and delivery model from big bang to small, incremental delivery / change, we cannot plan for production as if it were a lab setup and simulate production behavior for every release. Performance engineering must also use DevOps principles, adapt to the change, and be a part of the DevOps model. I would call this 'PerfOps' (Performance engineering in DevOps).
Transition from Waterfall SDLC to DevOps
A crucial change required to fit performance engineering into the agile model involves shifting performance engineering both left (toward development) and right (into production) in the software development life cycle (SDLC). This blog will discuss what PerfOps is, the challenges faced in PerfOps, the means to overcome these challenges, and the key PerfOps activities.
PerfOps starts at the very beginning of the product / feature discovery phase and continues in production. Here, everyone needs to work towards a common goal: delivering a high-performance application.
Where PerfOps lies in DevOps
Four key challenges that need to be overcome include:
- People and cultural barriers
  - Make performance everyone's responsibility in DevOps (performance awareness)
- Short release cycles
  - Make performance requirements explicit (they should be treated like functional requirements throughout the SDLC)
  - Developers need the means to validate performance before checking source code in for a build
  - Automatically mine API performance data from unit and integration tests
  - Performance environment setup, design, testing, execution, and validation have to be completely automated
- Identifying valuable tools and frameworks
  - It is essential to select application performance management (APM) tools that integrate seamlessly with business reporting tools, the IDE, the test automation framework, continuous integration / continuous delivery (CI/CD) tools, and the operation / ticket management system
  - Every team member should use the same tool throughout the product life cycle
  - Tools and frameworks, along with the monitoring dashboards and thresholds, need to be updated as part of the development cycle
- Fear of performance defects slipping into production
  - It is possible that some performance defects will slip into production. The PerfOps team should analyze real production data using the same APM tool, identify optimization opportunities, and decide whether changes need to be rolled back
  - Develop a continuous feedback mechanism from production
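One way to give developers a means to validate performance before check-in is to count downstream calls inside an ordinary unit test. The sketch below is only an illustration: `CallCounter`, `fetch_price`, and `cart_total` are hypothetical names, and `fetch_price` stands in for a real remote service call.

```python
class CallCounter:
    """Wraps a callable and records how many times it was invoked."""

    def __init__(self, func):
        self._func = func
        self.calls = 0

    def __call__(self, *args, **kwargs):
        self.calls += 1
        return self._func(*args, **kwargs)


def fetch_price(sku):
    """Hypothetical stand-in for a remote pricing service call."""
    return {"A": 10, "B": 20}[sku]


def cart_total(skus, fetch):
    """Sum the prices of all items, fetching each distinct SKU only once."""
    prices = {sku: fetch(sku) for sku in set(skus)}
    return sum(prices[sku] for sku in skus)


def test_cart_total_avoids_duplicate_calls():
    counter = CallCounter(fetch_price)
    assert cart_total(["A", "A", "B"], counter) == 40
    # Three items but only two distinct SKUs, so two downstream calls are expected.
    assert counter.calls == 2
```

A count-based check like this is deterministic, so it can run on a developer machine without a dedicated performance environment.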
Performance engineering activities in DevOps
- Make sure performance requirements (NFRs) are a part of the business requirement or user story. Performance requirements should be treated on par with functional requirements.
  - Examples: response time, page size, expected load / user traffic, memory usage per user, number of SQL / database calls, and number of service calls
- Make sure performance monitoring and reporting requirements are captured
- Consider the current production performance and infrastructure usage in the new feature design. For example, if the new feature is going to use an existing service or DB stored procedure, the design should take its current performance KPIs and SLA into consideration
- Design and implement to avoid duplicate or multiple remote service calls, unnecessary DB calls, inefficient caching strategy, and excessive logging
- Capture the unit test baseline for metrics like number of downstream calls, number of SQL statements executed, and payload size
- Utilize APM tools like Dynatrace that have the means to baseline these performance metrics and capture the deviations
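To make the idea of explicit performance requirements concrete, the requirements attached to a user story can be written down as data and checked against measured values. This is a minimal sketch under stated assumptions: the `PerformanceBudget` class, its field names, and the metric keys are hypothetical, and the measured values would in practice come from unit tests or an APM tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PerformanceBudget:
    """Hypothetical per-story performance requirements (limits are illustrative)."""
    max_response_ms: float
    max_sql_statements: int
    max_service_calls: int
    max_payload_bytes: int


def budget_violations(budget, measured):
    """Return a list of violations; an empty list means the budget is met."""
    limits = {
        "response_ms": budget.max_response_ms,
        "sql_statements": budget.max_sql_statements,
        "service_calls": budget.max_service_calls,
        "payload_bytes": budget.max_payload_bytes,
    }
    violations = []
    for name, limit in limits.items():
        value = measured.get(name)
        if value is None:
            # A metric that was never captured is treated as a failure.
            violations.append(f"{name}: not measured")
        elif value > limit:
            violations.append(f"{name}: {value} exceeds limit {limit}")
    return violations
```

For example, with a budget of five SQL statements, a run that executes nine produces one violation for `sql_statements`, which a build job could use to fail the check.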
Continuous performance engineering
- Automate performance engineering activities and integrate them into the CI/CD build and deployment process using tools like Jenkins
- Integrate performance testing tools like LoadRunner, JMeter, and CloudTest in Jenkins
- Build feature-specific performance test jobs that can be triggered for short durations once the code is deployed in the target environment, in order to validate the build's performance
- Automate the recording and analysis of performance metrics for each build. This helps in identifying possible performance issues earlier in development, thereby making the overall testing process more agile
- Automate and maintain APM tool and dashboard setup
- Capture the performance metrics from the load testing tools and from APM tools like Dynatrace and AppDynamics. Establish a trend for the captured performance metrics across builds. This helps identify whether a particular version of the deployed code has violated thresholds or degraded in performance
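The build-to-build trend described above can be sketched as a comparison of the current build against the average of recent builds. Everything here is an assumption for illustration: the function name, the 10 percent tolerance, the five-build window, and the convention that every metric is "lower is better". Real values would be exported from the load testing or APM tool.

```python
from statistics import mean


def detect_regressions(history, current, tolerance=0.10, window=5):
    """Compare the current build's metrics with the recent-build average.

    history: list of metric dicts, one per earlier build, oldest first.
    current: metric dict for the build under test.
    Returns {metric: (baseline, current_value)} for each metric that is
    more than `tolerance` worse than the baseline (lower is better).
    """
    recent = history[-window:]
    regressions = {}
    for metric, value in current.items():
        samples = [build[metric] for build in recent if metric in build]
        if not samples:
            continue  # no baseline yet for this metric
        baseline = mean(samples)
        if baseline > 0 and value > baseline * (1 + tolerance):
            regressions[metric] = (baseline, value)
    return regressions
```

A CI job could call this after each performance test run and mark the build unstable when the returned dictionary is not empty.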
Performance engineering in production
- No matter how hard we try to accurately design our performance testing, it is hard to completely replicate production behavior. The best performance metrics are captured from real user data
- Set up a sophisticated monitoring mechanism to capture performance data that can be used to analyze and provide actionable results to the developers
- Instead of going 100 percent live with a new feature, use A/B testing or multivariate testing to take a staggered approach to go-live. For example, assume there is an existing feature (F) in the production application and a new version of the feature (F') has been developed. Instead of switching all the traffic from F to F' in one shot, divert only 1 percent of traffic to the new feature (F') and keep the remaining 99 percent on the existing feature (F). Use the real user data on the new feature (F') to analyze and measure its performance. If the performance of the new feature is satisfactory, the traffic on F' can be increased to 5 percent, 10 percent, 25 percent, and so on, until all of the traffic is on F'. If there is a performance issue at any of these points, pull the new feature back, fix the issue, and then start the process again. This whole activity can span multiple days.
- Load and scalability testing in production is a common practice. It lets us run the load on the actual production environment, thereby eliminating issues caused by differences in environment size and configuration. The right frameworks and stubs are required to carry out load testing in production
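The staggered go-live from F to F' described above can be sketched as deterministic traffic bucketing plus a ramp schedule. This is a simplified illustration, not a production router: the function names are hypothetical, the 50 and 100 percent steps are assumed (the text only names 1, 5, 10, and 25 percent), and the health decision is assumed to come from production monitoring.

```python
import hashlib

# 1, 5, 10, and 25 come from the text; 50 and 100 are assumed steps.
RAMP_STEPS = [1, 5, 10, 25, 50, 100]


def bucket(user_id):
    """Map a user to a stable bucket in the range 0..99."""
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


def serves_new_feature(user_id, percent):
    """True if this user should see F' at the current rollout percentage."""
    return bucket(user_id) < percent


def next_percent(current, healthy):
    """Advance to the next ramp step, or pull back to 0 on a performance issue."""
    if not healthy:
        return 0
    for step in RAMP_STEPS:
        if step > current:
            return step
    return 100
```

Because the bucket depends only on the user ID, a user who sees F' at 5 percent keeps seeing it at 10 percent, which keeps the real user data comparable between ramp steps.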
Enabling teams with tools and knowledge
- Performance engineering best practices should be advocated to everyone on the team
- Performance engineers should participate in code and design reviews
- Teams should be given training on performance engineering tools and demos of performance test and issue analysis
Benefits and key takeaways
• People: Select team members who believe in automation. They will automate everything that they do
• Tools and framework: Select a PerfOps solution and not just a performance tool
• Communication: Everyone on the team should know why the DevOps transition is happening. It's all about bringing efficiency and applying engineering practices to everything we do (not about job insecurity)
• Key benefits of the transition from the traditional model to PerfOps are increased software quality, reduced cost, shorter release cycles, performant and scalable applications, reduced time to detect issues, and reduced time to recover from production issues. Of course, a happy team and satisfied investors will lead to more and better work coming our way.