Why a 'guesstimate' while identifying the workload mix may invalidate the entire performance-testing findings
It is often believed that the main objective of assessing the workload for a system is to generate a manageable load, so that the system is neither overloaded nor underloaded. Contrary to this belief, the focus of accurately modeling the workload is on loading the system 'appropriately', be that overloaded or underloaded; that is, loading the system in accordance with what is expected in the production environment.
A typical performance-testing exercise undertakes workload modeling as a task to identify the load to be generated on the servers. A graph is plotted of the load generated on the web server, the entry point to the application, against time.
What follows is a 'guesstimation' to arrive at the load that has to be generated on the servers. Being a peak period, time t' is chosen for determining the workload, and an appropriate figure for the load, wload, is calculated. This wload comprises the total number of page hits the web server has served. The hits to the different URLs are then taken into consideration, and based on this URL analysis the 'workload mix' is finally defined.
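As a sketch of this URL analysis, the per-URL hit counts during the peak window t' can be tallied from an access log. The log format, timestamps, and URLs below are illustrative assumptions, not taken from any particular server:

```python
from collections import Counter
from datetime import datetime

# Hypothetical access-log lines in a simplified "timestamp url" format
# (real web-server logs carry more fields, but the tally works the same way).
log_lines = [
    "2023-05-01T10:00:01 /search",
    "2023-05-01T10:00:02 /search",
    "2023-05-01T10:00:03 /cart/add",
    "2023-05-01T10:00:04 /checkout",
    "2023-05-01T10:00:05 /search",
]

# The peak window t' identified from the load-vs-time graph (assumed here).
peak_start = datetime(2023, 5, 1, 10, 0, 0)
peak_end = datetime(2023, 5, 1, 10, 1, 0)

hits = Counter()
for line in log_lines:
    ts_str, url = line.split(" ", 1)
    ts = datetime.fromisoformat(ts_str)
    if peak_start <= ts < peak_end:
        hits[url] += 1

wload = sum(hits.values())  # total page hits attended to in the peak window
print(wload, dict(hits))
```

In practice the same tally would run over the real access logs for the peak period, with wload being the total across all URLs.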
However, what is often neglected is the accuracy of this workload mix, i.e. the distribution of the various tasks sent to the servers. This distribution is of utmost importance because it ultimately governs how the different servers are loaded.
To illustrate with an example, let's assume the 'wload' mentioned above can be broken into three different sets of URLs that carry out three separate tasks: task1, task2, and task3 (for example Search, Add to Cart, Checkout). In an application distributed across tiers, any task generates a sequence of events at different servers: task1 may be CPU-intensive at the Application tier, task2 may be memory-heavy at the Application tier, and task3 may be Database-intensive.
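Grouping the URLs into these three tasks and computing each task's share of the total hits yields the workload mix. The URL-to-task mapping and the hit counts below are assumed purely for illustration:

```python
# Hypothetical per-URL hit counts for the peak window.
url_hits = {"/search": 600, "/cart/add": 300, "/checkout": 100}

# Hypothetical mapping of URLs to business tasks (transactions).
task_of_url = {
    "/search": "task1",     # Search
    "/cart/add": "task2",   # Add to Cart
    "/checkout": "task3",   # Checkout
}

# Aggregate hits per task, then normalize to get the mix.
task_hits = {}
for url, count in url_hits.items():
    task = task_of_url[url]
    task_hits[task] = task_hits.get(task, 0) + count

total = sum(task_hits.values())
mix = {task: count / total for task, count in task_hits.items()}
print(mix)  # with these assumed counts: task1 60%, task2 30%, task3 10%
```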
So if, during the analyzed period t', the distribution of the workload among these tasks is not determined accurately, the very foundation of the performance-testing exercise is undermined. Say, for instance, that instead of more task1's, more task3's are loaded. This generates more load on the Database tier instead of the Application tier. That is certainly not a production-like scenario, and it invalidates the complete set of results generated by these performance tests.
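The effect of getting the mix wrong can be sketched with assumed per-transaction resource costs: if task1 is CPU-intensive at the Application tier and task3 is Database-intensive, swapping their shares swaps which tier carries the load. All numbers here are illustrative assumptions:

```python
# Assumed relative cost that one execution of each task places on each tier.
cost = {
    "task1": {"app_cpu": 5, "db": 1},  # CPU-intensive at the Application tier
    "task3": {"app_cpu": 1, "db": 5},  # Database-intensive
}

def tier_load(mix, total_requests=1000):
    """Aggregate load per tier for a given task mix."""
    load = {"app_cpu": 0.0, "db": 0.0}
    for task, share in mix.items():
        for tier, c in cost[task].items():
            load[tier] += share * total_requests * c
    return load

production_mix = {"task1": 0.7, "task3": 0.3}  # what production actually sees
guessed_mix = {"task1": 0.3, "task3": 0.7}     # a 'guesstimated' test mix

print(tier_load(production_mix))  # App tier dominates: app_cpu 3800 vs db 2200
print(tier_load(guessed_mix))     # reversed: app_cpu 2200 vs db 3800
```

With the guessed mix, the test hammers the Database tier while the Application tier, the real production bottleneck in this sketch, goes under-exercised.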
Hence it is critically important to accurately identify this distribution of the workload among the mix of different tasks, typically referred to as transactions. This calls for analyzing the workload by looking at sets of URLs together, not just individual URLs.