Scalability is a key concern in enterprise application development, and it’s critical for developers to carefully prepare and test their apps to handle the maximum load expected. For example, if you’re using Agentforce for Developers to build AI applications on Salesforce, you’ll want to make sure that your actions using flows, Apex, and prompt templates are fully performant, and that your code is scalable and optimized for multi-tenant environments. With the help of tools like Scale Center and Scale Test, along with some key best practices, you can prepare your Salesforce applications for seamless peak-season readiness and successful launches on the Salesforce Platform.
In this blog post, we’ll walk you through a high-level sequence of steps that will help you plan, prepare, conduct, and analyze a successful load test using Salesforce’s Scale Center and Scale Test. The seven steps covered here are:
- Prepare your test script and environment
- Define your testing goals
- Define a concurrency model and throughput requirements
- Define scalability criteria and determine the accuracy of your scale test before your full test
- Determine your ramp plan
- Prepare your test sub-systems
- Evaluate your test with Scale Center
Ok, let’s get started!
Step 1: Prepare your test script and environment
If you’re a Salesforce customer who’s onboarded to Scale Test and wants to book a load test slot, you should plan your test well before the allotted slot date. This planning not only helps you make effective use of your test slot, but it also ensures a seamless experience with Scale Test on the Salesforce Platform.
An important step is to refresh your Full Copy sandbox from your Production org to ensure complete metadata and data sync. This prevents false positives in the test results due to data size or codebase mismatches, thus safeguarding production stability during scale tests.
There are two aspects of load testing:
- Development of the test script: There are two main ways to develop test scripts: through an API-based approach using a tool like JMeter, or through browser-based UI tests. Your choice of tool depends on your application and business needs. For example, if your focus is on the performance of Lightning pages, including the load time of different components, it’s recommended to use UI-based script recorder tools like Selenium. If your goal is only to test the scale of the app, then you can use API-based test scripts (see the sketch after this list).
- Environment setup: You’ll need to set up an environment where these scripts can be executed using the tool of your choice. A pilot feature in Scale Test can provide an environment to run the scripts.
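While JMeter is a common choice for API-based scripts, the idea is tool-agnostic. Here’s a minimal, hypothetical Python sketch of an API-based virtual-user script that creates Opportunity records through the Salesforce REST API; the instance URL, access token, and field values are placeholders you’d replace with your own sandbox details.

```python
# A minimal, hypothetical API-based load script: each thread acts as one
# virtual user creating Opportunity records via the Salesforce REST API.
# INSTANCE_URL, ACCESS_TOKEN, and the field values are placeholders.
import threading

import requests

INSTANCE_URL = "https://yourdomain--fullcopy.sandbox.my.salesforce.com"
ACCESS_TOKEN = "REPLACE_WITH_SESSION_TOKEN"
ENDPOINT = f"{INSTANCE_URL}/services/data/v60.0/sobjects/Opportunity/"
HEADERS = {
    "Authorization": f"Bearer {ACCESS_TOKEN}",
    "Content-Type": "application/json",
}

def virtual_user(user_id: int, iterations: int) -> None:
    """Simulate one user creating opportunities in a loop."""
    for i in range(iterations):
        payload = {
            "Name": f"LoadTest-Opp-{user_id}-{i}",
            "StageName": "Prospecting",
            "CloseDate": "2025-12-31",
        }
        resp = requests.post(ENDPOINT, json=payload, headers=HEADERS, timeout=30)
        resp.raise_for_status()  # surface script problems during the small trial run

# Start small (e.g., five users) to validate the script before any larger run.
threads = [threading.Thread(target=virtual_user, args=(u, 10)) for u in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```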
Before booking a full-scale slot, it’s a best practice to validate your scripts with a smaller trial run, such as a 100-user test. Once your 100-user test is complete, you can then move forward with booking your load test slot.
To book a slot in Scale Test, navigate to the Test Scheduler page from the Setup menu and select the start date and end date for your slot. This slot allows Scale Test to upgrade your sandbox infrastructure, providing more compute capacity to support a full-blown load test.
You’ll want to perform a full load test and then analyze the results using Scale Center. During the course of the 100-user test, it is a general best practice to leverage Scale Center to identify performance hotspots. We’ll dive into this later in the post.
Step 2: Define your testing goals
Next, you’ll want to get approval from your business stakeholders on the testing use case and detailed business goals for the test. Pick a repeatable, business-critical use case for conducting your scalability tests.
Let’s work this out using an example. Say that we have a sales application that needs to be able to process 100,000 opportunities per hour across all five stages of the sales cycle: Prospecting, Qualification, Review, Negotiation, and Conversion.
First, we’ll use Scale Test’s Test Plan Creation feature to come up with the right set of use cases for our application, including the various tests and metrics needed. These data points come from our peak business hour traffic, which the platform determines by evaluating week-over-week data from the past 30 days to find when maximum throughput was generated. We can then use this data to create a test plan.
To access this peak hour data, we’ll first navigate to the Test Plan Creation page in Scale Test, and select the first tab: Select Peak Hour.
Now, once we’ve determined the peak hour data, we can identify throughput aspects by generating a report using any of the top four peak metrics. Once available, this asynchronous report has two sections.
Section 1: Test use cases for the application
In our example, the Identify Test Use Cases tab in our throughput report consists of the following use cases:
- Most Used Lightning Pages of Opportunity Lifecycle and Its EPT
- Most Used Lightning Components and Their Load Time
- Most Used REST API Calls in the Lifecycle
- Most Used SOAP API Calls in the Lifecycle
Section 2: Server-side metrics of the application
These metrics can cover any specific set of scenarios and/or the entire lifecycle of the business processes implemented on the Salesforce Platform. Using the set of data points below, we can evaluate the various sub-components that form the overall business process execution lifecycle. All of these cumulatively form a complete test suite to verify the required scale.
Step 3: Define a concurrency model and throughput requirements
Throughput in this context means the number of transactions that take place in a given interval of time per user. In Salesforce, these transactions could be saves or loads of particular entities, or an interaction between a user (or an API) and Salesforce. This data can either be derived by looking at Scale Center’s Request Volume Chart, or it can also be identified from the Test Plan Creation section, where exact request-per-second (RPS) data can be found.
To derive it from the chart, find the maximum value on the Requests/10 mins graph and divide it by 600 to convert it to requests per second.
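As a quick worked example, here’s that conversion in code; the peak value is hypothetical.

```python
def peak_rps(max_requests_per_10_min: int) -> float:
    """Convert the peak of the Requests/10 mins graph to requests per second."""
    return max_requests_per_10_min / 600  # 10 minutes = 600 seconds

# Hypothetical peak: 120,000 requests in the busiest 10-minute window.
print(peak_rps(120_000))  # 200.0 requests per second
```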
The concurrency model shows how many simultaneous users will drive the overall load on the system. It’s important to correctly measure both of these parameters on a smaller load of five to 10 users. Below is a sequential diagram showing how to evaluate and get both parameters right. Follow this flowchart to ensure complete accuracy in determining the right throughput and concurrency requirements.
Step 4: Define scalability criteria and determine the accuracy of your scale test before your full test
To define scalability criteria based on business requirements for a peak season or peak hour, you’ll first need to determine the current throughput that your application generates in a Salesforce org. You can then use this as a baseline for the scaled workload.
Consider, for example, an org that normally averages 5,000 cases per hour per application, but during peak season is expected to average 50,000 cases per hour per application. In this case, you’ll need to test with 10x the normal throughput. Thus, the success criterion for the scale test is that it’s able to generate 10x the normal throughput without any errors or degradation in response time.
Determining the test’s accuracy is a must-have before going into the full-blown mode of a scale test. You can use Scale Test’s Trial Accuracy feature to determine how close the scale test is to your production workload. This helps reduce any false positives that your test could surface if it isn’t accurate enough. Note that this feature is only helpful when the metadata of the Full Copy sandbox from which your tests are executed is in sync with production.
The following screenshot shows how the accuracy of a recently concluded test has been determined using the Trial Accuracy Checker in Scale Test.
Let’s use this feature in a real-world situation with our example application. Suppose that we’re creating a test suite that’s intended to run 100,000 opportunities per hour; this 100,000 number becomes the business metric against which we need to test.
Say that during our regular business peak hours, our org is executing 25,000 opportunities per hour. We can feed this number into the Production Baseline Performance field and 100,000 in the Sandbox Baseline Performance field, as shown in the screenshot above. By using this single production metric, the tool will automatically calculate our test accuracy measurements.
Please note that test accuracy is divided into four sections.
- UI Accuracy: This indicates how closely the total number of XHRs executed by the test aligns with what’s executed in production for the same throughput.
- API Accuracy: This measures the accuracy of the APIs that need to be present in the test compared to the actual production load.
- DB Read Accuracy: This refers to the total database reads that the test is performing, including the SOQL queries executed during the test, and the percentage of accuracy compared with production numbers.
- DB Write Accuracy: This refers to the total database writes that the test is performing, including all Insert, Update, and Delete operations that are being executed during this test time. It also indicates database write accuracy compared to production for the same business numbers.
The above screenshot shows the respective values for each of the trials conducted. The color coding of the columns indicates whether the accuracy is good enough to proceed with the actual full-blown load test.
Step 5: Determine your ramp plan
Gradually increasing the number of users is key to avoiding unexpected problems. First, you’ll want to determine the number of virtual users you’ll need to execute the full workload. You’ll also need to figure out how many concurrent users (or threads) are required to achieve the scaled business throughput.
You can calculate the number of required concurrent users to generate a specific transaction rate using a mathematical relationship from queuing theory called Little’s Law. The relationship can be defined in Salesforce terms as follows:
Total number of users = Total time spent per transaction * transactions per second.
When designing scale tests, the total time for a transaction is equal to the sum of the aggregate response time, think time, and wait time. Therefore, the total number of users required is given by: Total number of users = (aggregate response time + think time + wait time) * transactions per second.
To get the number of users you need, divide the target business throughput by the throughput that a single user can generate. For example, let’s say you have a case management application with 10 agents who each close 10 cases in an hour, for a team total of 100 cases per hour. If you want to scale test the application to process 10,000 cases per hour, you’ll need 1,000 users (10,000 cases per hour ÷ 10 cases per hour per user). This will mimic the real-time production volume.
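To make the formula concrete, here’s a small sketch applying Little’s Law to the case management example above; the response, think, and wait times are illustrative assumptions, not measured values.

```python
import math

def users_for_throughput(aggregate_response_s: float, think_s: float,
                         wait_s: float, tps: float) -> int:
    """Little's Law: users = total time per transaction * transactions per second."""
    total_time_per_txn_s = aggregate_response_s + think_s + wait_s
    return math.ceil(total_time_per_txn_s * tps)

# Target: 10,000 cases/hour, i.e., ~2.78 transactions per second.
# Assume each case occupies a user for 360s in total (response + think + wait).
tps = 10_000 / 3600
print(users_for_throughput(aggregate_response_s=30, think_s=300, wait_s=30, tps=tps))
# => 1000, matching the 10,000 cases/hour ÷ 10 cases/hour/user estimate
```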
Once your user base is defined, you can create your ramp-up plan. For example, if you have a 1,000-user base, you would need about five minutes of ramp-up time to activate all of these users in the system, as the sketch below shows.
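A one-line ramp calculation, using the same hypothetical numbers:

```python
def ramp_rate(total_users: int, ramp_seconds: int) -> float:
    """How many virtual users to activate per second during ramp-up."""
    return total_users / ramp_seconds

# 1,000 users activated over a five-minute (300-second) ramp.
print(ramp_rate(1_000, 300))  # ~3.3 users per second
```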
Step 6: Prepare your test sub-systems
Once you have defined your test plan, which consists of all the important business processes, the required throughput and scale, and the user base, it’s important to prepare your test sub-systems to handle that load.
Here are some important steps to prepare your middleware and other sub-systems:
- Identify your key integration touchpoints in the application. This might seem obvious, but in a complex enterprise application with potentially hundreds of integrations, it’s important to have a map of all the must-have integrations that move the application forward based on an API response. An important consideration: application navigation to the next step should stop if there is no response from the API layer. The Test Plan Creation step, which provided the required REST API, SOAP API, platform event, and workflow-related details, is a good starting point.
- Mock your APIs with the appropriate stubbed response. Once the critical list of integrations is identified, mock the external API source systems that Salesforce invokes. Mocking can be a tedious process, as it requires evaluating the payloads and different data structures involved, so it’s important to perform this exercise well in advance of booking your test slots.
The screenshot below shows a list of most-used REST APIs and SOAP APIs that are invoked from the platform, and which need to be mimicked in the test suite.
To speed up the process of mocking, here are some quick tips (a minimal stub-service sketch follows the list):
- Collect the actual response payload from production by navigating to the respective API source.
- Evaluate the parameters in the payload that can be mimicked by a mocked set of responses. An example could be user details such as name or address. These values don’t need to be actual production data and can instead be mocked with random values as applicable.
- If multiple relaying sub-systems are involved in an API integration, prepare the stubbed response at the top-most layer, which talks to Salesforce directly. For example, an integration might make a callout from the Salesforce Platform to a MuleSoft system, which then makes further requests to a third party, with the final data passed from MuleSoft back to Salesforce. In this chain, it’s best to deploy a mock service at the Mule layer directly with the appropriate data structure, so that multiple components aren’t invoked sequentially, which would add delay and complexity when recreating the chain in a lower environment.
- Ensure parity in the infrastructure provided to the middleware that is mocked. At times, enterprise customers deploy their production and sandbox integration layers on different gateways or proxies. In those cases, just as Salesforce increases sandbox capacity to mimic production-grade infrastructure, verify that the infrastructure allotted to the sandbox setup of the integration layer is sufficient. This is critical so that, on the actual testing days, users don’t encounter errors due to these scenarios.
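To illustrate the mocking tips above, here’s a minimal, hypothetical stub service written with Python’s standard library; the endpoint and response shape are assumptions. In practice, you’d deploy the stub at your top-most integration layer (e.g., MuleSoft) with payloads modeled on production.

```python
# A minimal, hypothetical stub service using only the Python standard library.
# The response shape is an assumption; model yours on payloads collected from
# production, and randomize non-essential fields like names and addresses.
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

STUBBED_RESPONSE = {
    "status": "SUCCESS",
    "customer": {"name": "Test User", "address": "123 Mock St"},
}

class MockApiHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Drain the request body so persistent connections behave correctly.
        self.rfile.read(int(self.headers.get("Content-Length", 0)))
        body = json.dumps(STUBBED_RESPONSE).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 8080), MockApiHandler).serve_forever()
```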
Step 7: Evaluate your test with Scale Center
Every test conducted must be evaluated to determine appropriate follow-up actions, and Scale Center is the tool to use for this analysis. Let’s walk through this using our running example of a test of 100,000 opportunities per hour.
Once our test concludes, we’ll go to Scale Center and navigate to the Org Performance page with the time duration of the just-concluded test. We’ll need to analyze all the tests that have been executed, using Scale Center to identify bottlenecks and action items for improving the overall scale of the application.
First, we’ll enter the same time window as the test on the Org Performance page, and click Submit.
The first thing that we can see is that there are rowlock errors that have occurred during the test, and zero concurrent Apex, connection pool, and concurrent UI errors. This usually means that from the Salesforce server side, things were fine since requests did not run into server-side errors.
Then, for the rowlocks, we can see a straight line for the first two hours and then a dip, which means that for the workload being executed, the errors were consistent over time, so nothing is alarmingly wrong here. The rowlock count shows 14K over four hours, or 3,500 rowlocks/hour.
In our example, the application was creating an average of roughly 100,000 opportunities per hour, while our scale test was generating an average of only 3,500 rowlocks per hour, which is not even one rowlock per opportunity. This clarifies that the number of rowlocks during the test was not abnormally high.
Next, the Request Volume Chart clearly shows a linear trend and no abnormal drop in Salesforce server requests, which means both systems were performing efficiently.
Then, by selecting the first 30 minutes of steady state within the testing window, the performance team running the test can create a Consolidated Report in Scale Center to look into different aspects of the application.
For the purpose of analyzing a recently concluded test, we choose the time frame when the maximum number of rowlocks was observed during our test. Once this Consolidated Report is generated, we can navigate to a tab named Rowlock Errors. Clicking on it reveals all the details of the rowlocks that were observed.
The screenshot below shows the Rowlocks Investigation Report tab in our Consolidated Report.
During scale tests, keep in mind that rowlocks can be concentrated on just one or two records. If this happens, you should check the script to make sure that the correlation and parametrization are accurate. It’s important to rule out the possibility that a single record ID is being used throughout the script, as this can also cause rowlock errors.
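For example, if a script hard-codes a single Opportunity ID, every virtual user competes for the same row lock. Below is a minimal sketch of the parametrization fix, assuming a pool of test record IDs exported before the test; the IDs are placeholders.

```python
import itertools
import threading

# Hypothetical pool of record IDs exported from the sandbox before the test.
RECORD_IDS = ["006xx0000001AAA", "006xx0000001AAB", "006xx0000001AAC"]

# Anti-pattern: reusing RECORD_IDS[0] in every thread serializes all virtual
# users on a single row lock and inflates rowlock errors.

_cursor = itertools.cycle(RECORD_IDS)
_cursor_lock = threading.Lock()

def next_record_id() -> str:
    """Give each virtual-user iteration a different record to update."""
    with _cursor_lock:
        return next(_cursor)
```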
Let’s look into some other aspects of the same Consolidated Report that will help you identify potential hotspots in your application. We recommend that you first check the Apex Summary Report.
The screenshot below shows the Apex Summary Report tab in our Consolidated Report.
Our example Apex Summary Report shows that, across all Apex transactions, the most time is spent on database time. The report also lists the entry points in critical, major, or minor patterns that, according to our algorithm, dominate database time. In this case, it’s the Solution_Trigger entry point that needs to be evaluated.
The same report also provides factors contributing to database time, as shown in the screenshot below.
The screenshot above mentions all the Apex errors that have occurred during this time window. In this case, Apex is experiencing errors when executing the CompUpdateGenericJob class.
In this report, we also see factors that contribute to database time, including the top five SOQL queries with the highest query execution time, along with their line numbers and invoking method names, and the top five expensive DML operations. Important data within each row is highlighted in red. If there is any scope for tuning, these are the items to evaluate and fix.
Please note: these are the same SOQL queries that internal teams see on back-end reports when tuning application-related SOQL.
Similarly, if you face concurrent Apex errors, you can look into the Concurrent Errors Analysis, an extension of the Apex Summary Report, which has many more details about what can affect concurrency. This analysis also examines requests that run for more than five seconds and contribute to concurrency issues; reducing the runtime of the highlighted entry points improves the overall scale of the application.
Conclusion
Following the in-depth approach to preparing for your load test as outlined in this post will ensure the efficient utilization of your allocated Scale Test slot. By planning meticulously, analyzing results effectively, and optimizing your applications accordingly, you can ensure that your Salesforce applications scale seamlessly on the Salesforce Platform to meet peak demands.
Resources
- Blog post: Your App Shouldn’t Panic in Rush Hour Traffic – Here’s How to Prepare
- Blog post: Analyze Performance & Scale Hotspots in Complex Salesforce Apps
- Blog post: How to Scale Test on Salesforce
About the author
Anand Vardhan is a Product Owner for the Scalability Products Team, helping customers develop scalable solutions on the Customer 360 platform. Anand designs features for Scale Center and Scale Test products. His background includes performance and scale engineering, server-side optimization, Lightning, application design, caching, and managing large data volumes. Follow Anand on LinkedIn.