Pragmatic Performance Engineering
Author: Sanjeeb Kumar Jena, Test Engineer
When you hear the term 'performance engineering,' what's the first thing you think of? Is it the famed battle of 'performance testing versus performance engineering'? Or, because we are engineers, do you immediately think of making performance testing a better quality assurance process?
The dictionary says that 'engineering' means to 'skillfully arrange for something to occur'; so we could settle for saying that performance testing (PT) done within an agile scrum or sprint is what we call performance engineering (PE).
However, this is just the tip of the iceberg for performance engineering as a domain. So, here's what I think (or rather, believe) about being a performance engineer:
"Response time, throughput, and resource utilization" are the major key performance indicators (KPIs) for a web system to be scalable and reliable. However, it is not enough to merely report this information in a visualization, nor to mathematically calculate the Work Load Model (WLM) for the next product release from these KPIs alone.
What we report with this process is information; it is not knowledge that can be applied directly.
Performance engineering is an end-to-end process -- from design (pre-SDLC), through the entire software development life cycle (SDLC), aimed at ensuring quality management.
Instead of looking at PE as a specific domain, we can think of it as part of the business model. Business is not just about delivering the end product to customers; it is a collaboration, an organization of people working to achieve a predictable goal. As performance engineers, we do business in terms of time: our resource is the budget (we can only spend what we have) and our goal is customer satisfaction in terms of application performance; that is, speed and reliability.
Thus, PE can't be one single process confined to one phase of product development. Instead, it's a process that starts on day one and continues forever, always making things better.
- PE in the design phase: In the design phase, the designer or architect shapes the interface and the system architecture, while the performance engineer establishes the performance budgets (which reflect the input of resources and the output of services for each unit of user interaction on the web app) based on the design of the system. Thus, instead of the developer jumping straight into a narrow development phase, collaboration between the designer, the performance engineer, and the developer can lay the path to a sustainable and scalable design.
- PE in development: This phase introduces continuous integration (CI) and version control into the usual 'load testing' scenario. When code is pushed to a version control system, it triggers a build on Jenkins (a CI server), which in turn runs a load-testing script with the required parameters (synthetic users, duration, etc.) in the development environment. Thus, developers get continuous, real-time reports on performance without worrying about how to use load-testing tools, and can make incremental, pragmatic changes to the code right from the early phases.
- PE in QA: The QA phase involves designing a test-case scenario and running it against the system at scale (synthetic-user load testing) to verify the system's readiness for critical situations in a production-like environment.
- PE in application monitoring: Here, intelligent bots/agents can be injected into the codebase or the application environment. These agents collect system performance data and continuously send it to an analysis server. This is the basic principle behind application performance management (APM) tools like Dynatrace, AppDynamics, New Relic, and open-source Java virtual machine (JVM) / non-JVM agents. With RESTful integration, the knowledge from these tools can provide deeper, continuous, code-functionality-level performance analysis across iterative development of the application.
- PE in operations management: Automation makes the process faster, and introducing extreme-automation solutions into application operations management provides insights for performance optimization. It is the performance engineer's responsibility to ensure that the gathering and analysis of most of the performance data is fully automated, and to make this engineered automation available as a service or API for faster integration with the system.
- PE in analysis and reporting: This phase works with logs that reveal patterns in the system when it is tested against a simulated real-time user load. These patterns help predict and optimize the resource-utilization model. Using log analysis tools like Splunk, the ELK stack (Elasticsearch-Logstash-Kibana), or basic UNIX/PowerShell utilities, trends or patterns can be extracted from historical data and presented as a continuous trending visualization. With the rapid growth of digitalization, user-generated data keeps growing in volume and is increasingly real-time.
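To make the development-phase CI gate concrete, here is a minimal Python sketch of the kind of quality gate a Jenkins job might run after a load test. The threshold values, result keys, and `gate`/`main` function names are all hypothetical illustrations, not part of any particular tool:

```python
import sys

# Hypothetical thresholds; in practice these come from the performance
# budget agreed upon in the design phase.
MAX_AVG_RESPONSE_MS = 500
MIN_THROUGHPUT_RPS = 100

def gate(results):
    """Return True if the load-test results pass the CI quality gate."""
    return (results["avg_response_ms"] <= MAX_AVG_RESPONSE_MS
            and results["throughput_rps"] >= MIN_THROUGHPUT_RPS)

def main(results):
    """Exit non-zero so the Jenkins build is marked as failed.

    In a real pipeline, `results` would be parsed from the load-testing
    tool's report (e.g. a JMeter JTL file or a Gatling simulation log).
    """
    sys.exit(0 if gate(results) else 1)
```

Because the script's exit code drives the build status, developers see a red build the moment a change pushes response time or throughput past the agreed limits.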
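The agent-based monitoring idea can be illustrated with a toy sketch. The `MiniAgent` class and its buffer-and-flush design are invented for illustration; production APM agents (Dynatrace, New Relic, etc.) are far more sophisticated:

```python
import time
import json
from collections import deque

class MiniAgent:
    """Toy sketch of an APM-style agent: it samples a metric source
    and buffers timestamped readings to ship to an analysis server."""

    def __init__(self, sample_fn, buffer_size=100):
        self.sample_fn = sample_fn          # callable returning a metrics dict
        self.buffer = deque(maxlen=buffer_size)

    def sample(self):
        """Take one reading and keep it in the local buffer."""
        reading = {"ts": time.time(), **self.sample_fn()}
        self.buffer.append(reading)
        return reading

    def flush(self):
        """Serialize buffered readings; a real agent would POST this
        payload to the APM server's REST endpoint."""
        payload = json.dumps(list(self.buffer))
        self.buffer.clear()
        return payload
```

The key idea mirrors the text above: collection happens continuously inside the application environment, while analysis happens elsewhere, connected by a simple (here, JSON-over-REST) integration.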
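As a tiny stand-in for what Splunk or Kibana would visualize in the analysis-and-reporting phase, the sketch below extracts a per-endpoint response-time trend from raw log lines. The log format is hypothetical; real deployments would adapt the regular expression to their own access-log layout:

```python
import re
from collections import defaultdict
from statistics import mean

# Matches hypothetical log lines such as "GET /search 240ms".
LOG_LINE = re.compile(r"(?P<method>GET|POST)\s+(?P<path>\S+)\s+(?P<ms>\d+)ms")

def response_time_trend(log_lines):
    """Group response times by endpoint and average them."""
    by_path = defaultdict(list)
    for line in log_lines:
        m = LOG_LINE.search(line)
        if m:
            by_path[m.group("path")].append(int(m.group("ms")))
    return {path: mean(times) for path, times in by_path.items()}
```

Run over historical logs at regular intervals, output like this becomes the "continuous trending visualization" described above.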
Real-time performance analysis and prediction can be performed by introducing machine intelligence (machine learning or deep learning models) in order to derive accurate knowledge.
This way, the Work Load Model (WLM) for the next feature release can be more accurate, as it captures the behavioral trend from performance data rather than calculating it from statistical formulae.
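As a deliberately tiny stand-in for the machine-learning models mentioned above, the sketch below fits a least-squares line to a series of observed load figures (say, peak concurrent users per release) and extrapolates one step ahead for the next WLM. The function names and the choice of a linear model are illustrative assumptions:

```python
def fit_trend(values):
    """Ordinary least-squares fit of y = a + b*x over evenly spaced samples."""
    n = len(values)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(values) / n
    b = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, values))
         / sum((x - mean_x) ** 2 for x in xs))
    a = mean_y - b * mean_x
    return a, b

def forecast(values, steps_ahead=1):
    """Extrapolate the fitted trend, e.g. expected peak users next release."""
    a, b = fit_trend(values)
    return a + b * (len(values) - 1 + steps_ahead)
```

A real deployment would swap in a proper time-series or deep-learning model, but the principle is the same: the forecast is learned from observed behavior, not assumed from a formula.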
All of the processes mentioned above can be combined into a single framework (not just a tool) to engineer performance throughout application development, building better performance for end-user engagement with the application and, in turn, a better consumer/people-driven business.
The real question however is: What do we need to do to make this pragmatic PE framework a reality in our team?
To me, if the performance engineer's target is to provide the best end user experience -- in terms of application performance -- then, from a design-thinking or human-centered design perspective, a cultural shift is required.
I'd say: think of PE as a culture or a mindset, rather than a set of principles or tools.
To bring innovation to PE, we need empathy and collaboration. Almost everything in business has become data-driven. There's a common saying: if you can't measure it, you can't manage it. Data certainly provides valuable insight, but it doesn't tell the whole story. Collaboration among all stakeholders creates greater insight from this performance data and greater value for the overall system. When every person in the organization understands the feelings of the end users, they build better systems using performance data and targeted user behavior. Empathy towards the end user plays a big role in performance engineering.
In a fast-changing world, speed matters; and to get speed, every part of the system must work in concert.
Think of a situation where:
- A performance test is conducted by simulating a heavy load scenario
- PT tools and APM provide a detailed analysis
- Developers immediately receive the performance feedback and make iterations as required
- With each change in the code, an automated performance suite, along with the unit tests, is triggered on a CI/CD server like Jenkins
- Visualization dashboards built from real-time streaming data show how real or synthetic users in the test environment are using the application, so business owners can make data-driven decisions on features without long debates in the war room
- If the analyzed performance metrics meet the established performance budget, the code moves into production.
- If not, the entire process goes through another iteration.
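The budget check in the last two steps can be sketched in Python. The interactions, thresholds, and the `within_budget` helper are hypothetical examples of what a team might agree on during the design phase:

```python
# Hypothetical per-interaction performance budgets agreed in the design phase:
# maximum 95th-percentile response time (ms) and maximum CPU utilization (%).
BUDGETS = {
    "login":    {"p95_ms": 800,  "cpu_pct": 60},
    "search":   {"p95_ms": 1200, "cpu_pct": 70},
    "checkout": {"p95_ms": 1500, "cpu_pct": 75},
}

def within_budget(interaction, p95_ms, cpu_pct):
    """Return True if the measured numbers fit the agreed budget,
    i.e. the code may move into production; False triggers another
    iteration of the loop described above."""
    budget = BUDGETS[interaction]
    return p95_ms <= budget["p95_ms"] and cpu_pct <= budget["cpu_pct"]
```

Encoding the budget as data rather than tribal knowledge is what lets the whole loop run without a human gatekeeper.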
Within this automated performance engineering framework, performance at scale can be analyzed and achieved in a faster, more pragmatic (practical) manner.
In the next blog, we will discuss 'performance as a culture': building a platform or test environment on the cloud (AWS/Azure) to rapidly test candidate solutions, find the best one available at the moment, and then integrate it or improve on top of it.