Who Should Read This

This paper is for experienced technical architects who work with Salesforce deployments and want a better understanding of asynchronous processing. It will help architects build effective design patterns around asynchronous processing.

Asynchronous Overview

An asynchronous process is a process or function that does not require immediate interaction with a user. An asynchronous process can execute a task "in the background" without the user having to wait for the task to finish. Salesforce features such as asynchronous Apex, Bulk API, and Reports and Dashboards use asynchronous processing to efficiently process requests.

Asynchronous processing provides a number of benefits including:

  • User Efficiency – By separating functionality that needs an immediate user response from functionality that can be completed at a later time, we can ensure that the Salesforce user experience is always as responsive as possible, and that users are never blocked waiting for a process that could be completed in the background.
  • Resource Efficiency – Each Salesforce instance has a finite set of resources. Under normal load patterns, these resources can be efficiently managed by optimizing latency-sensitive jobs and using asynchronous processing for less latency-sensitive jobs.
  • Scalability – By allowing some features of the platform to execute asynchronously, resources can be managed and scaled quickly. This allows Salesforce instances to handle more customer jobs using parallel processing.

There are many different types of asynchronous requests on the platform. Some are user initiated and some are internal housekeeping functions. Asynchronous requests that users will be familiar with include:

  • Asynchronous Apex (@future Apex, batch Apex, queueable Apex, scheduled Apex)
  • Bulk API jobs
  • Scheduled Reports
  • Dashboard refreshes

Salesforce asynchronous processing makes a best effort to complete requests as quickly as possible; however, there are no guarantees on wait or processing time.

How Salesforce Asynchronous Processing Works

Asynchronous processing, in a multi-tenant environment, presents some challenges:

  • Ensure fairness of processing – Make sure every customer gets a fair chance at processing resources in a multi-tenant architecture.
  • Ensure fault tolerance – Make sure no asynchronous requests are lost due to equipment or software failures.

The following diagram provides a high-level overview of Salesforce's asynchronous processing technology:

[Diagram: AsynchPaperOverview.png]

Salesforce uses a queue-based asynchronous processing framework. This framework is used to manage asynchronous requests for multiple organizations within each instance. The request lifecycle is made up of three parts:

  1. Enqueue – The request gets put into the queue. This could be an Apex batch request, @future Apex request or one of many others. The Salesforce application will enqueue requests along with the appropriate data to process that request.
  2. Persistence – The enqueued request gets persisted. Requests are stored in persistent storage for failure recovery and to provide transactional capabilities.
  3. Dequeue – The enqueued request is removed from the queue and processed. Transaction management occurs in this step to ensure messages are not lost if there is a processing failure.
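The three lifecycle steps can be illustrated with a small sketch. This is not Salesforce's implementation: the `PersistentQueue` class, its table layout, and the use of SQLite as the persistent store are all assumptions made for illustration. The point it shows is that a request is removed from storage only if its handler finishes without error.

```python
import json
import sqlite3


class PersistentQueue:
    """Toy request queue backed by SQLite (a stand-in for persistent storage)."""

    def __init__(self, path=":memory:"):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS requests ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, "
            "org_id TEXT, request_type TEXT, payload TEXT)"
        )

    def enqueue(self, org_id, request_type, payload):
        # Enqueue and persist: the request and its data are committed together.
        with self.db:
            cursor = self.db.execute(
                "INSERT INTO requests (org_id, request_type, payload) "
                "VALUES (?, ?, ?)",
                (org_id, request_type, json.dumps(payload)),
            )
        return cursor.lastrowid

    def dequeue_and_process(self, handler):
        # Dequeue: pick the oldest request and run its handler in a transaction.
        row = self.db.execute(
            "SELECT id, org_id, request_type, payload "
            "FROM requests ORDER BY id LIMIT 1"
        ).fetchone()
        if row is None:
            return "empty"
        request_id, org_id, request_type, payload = row
        try:
            with self.db:  # commits on success, rolls back if the handler raises
                self.db.execute("DELETE FROM requests WHERE id = ?", (request_id,))
                handler(org_id, request_type, json.loads(payload))
        except Exception:
            return "failed"  # the delete was rolled back; the request remains
        return "done"

    def pending(self):
        # Number of requests still waiting in persistent storage.
        return self.db.execute("SELECT COUNT(*) FROM requests").fetchone()[0]
```

In this sketch, a handler that raises leaves `pending()` unchanged, so the same request can be retried later.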

Each request is processed by a handler. The handler is the code that performs functions for a specific request type.

Handlers are executed by worker threads on each of the application servers that make up an instance. Each application server supports a finite number of threads. Each thread can execute any type of handler, and Salesforce determines how many threads are available to process requests for a given request type. The threads request work from the queuing framework and, when they receive it, start a specific handler to do the work. The following diagram shows the asynchronous processing in action:


Notice that the queue contains requests from multiple organizations. Each request can be associated with a job that could vary in complexity and running time. In the diagram, longer running jobs are represented with larger boxes.

As one request is completed, another request is removed from the queue and processed. Error handling and failure recovery are built in (via request persistence), so requests are not lost if a queue failure or handler failure occurs.
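As a rough illustration of generic worker threads pulling requests from a shared queue and starting the handler for each request type, consider the sketch below. The handler names (`future`, `bulk_api_batch`), the `HANDLERS` registry, and the thread count are hypothetical; the real framework's internals are not described in this paper.

```python
import queue
import threading

# Hypothetical handler registry: one function per request type.
HANDLERS = {
    "future": lambda payload: f"ran @future method for {payload}",
    "bulk_api_batch": lambda payload: f"loaded batch {payload}",
}


def worker(requests, results, lock):
    # Any worker can run any handler; it simply takes the next request.
    while True:
        try:
            org_id, request_type, payload = requests.get_nowait()
        except queue.Empty:
            return  # no work left for this thread
        outcome = HANDLERS[request_type](payload)
        with lock:
            results.append((org_id, outcome))


def run_workers(pending, thread_count=3):
    requests = queue.Queue()
    for item in pending:
        requests.put(item)
    results, lock = [], threading.Lock()
    threads = [
        threading.Thread(target=worker, args=(requests, results, lock))
        for _ in range(thread_count)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results
```

Because the threads run concurrently, the order of the results is not guaranteed; only the set of completed requests is.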

Fair Request Handling

An organization can have many requests outstanding. For example, a single organization could queue 250,000 @future Apex requests in a 24-hour period, depending on Salesforce license type. If one organization adds a large number of requests to the queue, it could prevent other customers from getting access to the worker threads. To avoid this, the queuing framework implements flow control, which prevents a single customer from using all of the available threads.

When a worker thread is available to process a request, the queuing framework determines whether a single organization is already using the maximum number of worker threads (as determined by the handler). If so, the framework will "peek" into the queue to see if other organizations have requests waiting. The set of requests it examines is called the peek set and is limited to a fixed number of requests at the front of the queue (currently set at 2,000 requests). The framework will look for requests from a different organization and process those (as long as that organization isn’t currently consuming all of its allocated threads for a given handler).

For example, assume organization 1 creates 13 @future requests that are at the head and adjacent in the queue as shown in the diagram below:


Organization 2 adds two @future requests to the queue:


Then two more organization 1 @future requests are enqueued. At this point, the queue looks like this:


For this example, assume that a maximum of 12 threads can process requests from a single organization, and that our peek set size is 15 requests. If 13 total threads are available and no other requests are being processed for organization 1 or organization 2, the processing will be as follows (see diagram above):

  1. 12 threads will take the first 12 requests from organization 1.
  2. The 13th thread will not process a request from organization 1, although it is the next one in the queue. This is because organization 1 has taken its allotted number of threads. This request will remain in the queue at its current position until one of the 12 threads becomes available. This request is delayed.
  3. The framework will scan for requests from other organizations within the peek set of 15 requests. It will find the first @future request from organization 2 and begin processing this request, skipping the 13th request for organization 1.
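The example above can be reproduced with a short simulation. This is a simplified model, not the actual framework: the `assign_threads` function is hypothetical, and it assumes the peek set is taken from the front of whatever remains in the queue each time a thread looks for work.

```python
from collections import Counter


def assign_threads(queue, free_threads, max_per_org, peek_size, running=None):
    """Give each free thread the first eligible request inside the peek set.

    queue is a list of (org, sequence_number) tuples, head of the queue first.
    Returns (assigned_requests, remaining_queue).
    """
    queue = list(queue)
    running = Counter(running or {})
    assigned = []
    for _ in range(free_threads):
        # Find the first request in the peek set whose org is under its cap.
        pick = next(
            (
                index
                for index, (org, _) in enumerate(queue[:peek_size])
                if running[org] < max_per_org
            ),
            None,
        )
        if pick is None:
            break  # every request in the peek set belongs to a capped org
        request = queue.pop(pick)
        running[request[0]] += 1
        assigned.append(request)
    return assigned, queue


# 13 requests from organization 1, two from organization 2, two more from 1.
example_queue = (
    [("org1", n) for n in range(1, 14)]
    + [("org2", 1), ("org2", 2)]
    + [("org1", 14), ("org1", 15)]
)
assigned, remaining = assign_threads(
    example_queue, free_threads=13, max_per_org=12, peek_size=15
)
```

In this model, the 13th thread receives the first organization 2 request, and the 13th organization 1 request stays at the head of the remaining queue.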

What happens when requests for a particular organization occupy the entire peek set when the queue is scanned in step 3 above?

Again, assume 12 threads are processing requests from organization 1. This time, organization 1 has 15 requests remaining in the queue and organization 2 has two requests in the queue as shown in this diagram:


Since all of the requests in the peek set are from a single organization (organization 1), those 15 requests will be moved to the back of the queue with a specific delay. This is called an extended delay.

The delay is different for each request type. For example, for @future requests, the delay is 5 minutes. That means a minimum of 5 minutes must elapse before those requests are eligible for processing.

When delayed requests become eligible for processing, it's possible for flow control to act on them again, moving them to the back of the queue with another delay. Therefore, requests can be delayed several times before they're completed. Additionally, when those requests are moved, they are put back into the queue in the same logical order, but other requests may be intermingled with them.
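The extended delay can be sketched in the same simplified style. The `apply_extended_delay` function and the request dictionaries below are hypothetical; only the 5-minute delay for @future requests and the rule that a peek set filled by one capped organization is moved to the back come from the text above.

```python
FUTURE_DELAY_SECONDS = 5 * 60  # extended delay for @future requests


def apply_extended_delay(queue, running, max_per_org, peek_size, now,
                         delay=FUTURE_DELAY_SECONDS):
    """Move the peek set to the back of the queue if one capped org fills it.

    queue is a list of dicts with 'org', 'id', and 'eligible_at' keys.
    Returns a new list; the input list is not modified.
    """
    peek = queue[:peek_size]
    orgs = {request["org"] for request in peek}
    if len(orgs) != 1:
        return list(queue)  # empty queue, or other orgs are within reach
    (org,) = orgs
    if running.get(org, 0) < max_per_org:
        return list(queue)  # the org still has thread capacity
    # Same logical order, but not eligible again until the delay has elapsed.
    delayed = [dict(request, eligible_at=now + delay) for request in peek]
    return queue[peek_size:] + delayed


# Organization 1 holds all 12 threads and fills the 15-request peek set.
example_queue = (
    [{"org": "org1", "id": n, "eligible_at": 0} for n in range(1, 16)]
    + [{"org": "org2", "id": n, "eligible_at": 0} for n in (1, 2)]
)
reordered = apply_extended_delay(
    example_queue, running={"org1": 12}, max_per_org=12, peek_size=15, now=0
)
```

After the move, the two organization 2 requests are at the head of the queue, followed by the 15 delayed organization 1 requests in their original order.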

Resource Conservation

Asynchronous processing in Salesforce is very important but has lower priority than real-time interaction via the browser and API. Message handlers run on the same application servers that process interactive requests, and it's possible that asynchronous processing or increased interactive usage can cause a sudden increase in usage of computing resources.

To ensure there are sufficient resources to handle a sudden increase, the queuing framework monitors system resources such as server memory and CPU usage and reduces asynchronous processing when thresholds are exceeded. If necessary, under heavy load, Salesforce will delay long-running jobs in the queue to give resource priority to synchronous requests. Once resource usage falls below the thresholds, normal asynchronous processing continues.
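The throttling idea can be expressed as a simple rule. The sketch below is purely illustrative: the function name, the threshold values, and the reduced fraction are assumptions, since the paper does not state the actual thresholds or how far processing is reduced.

```python
def allowed_async_threads(max_threads, cpu_pct, memory_pct,
                          cpu_threshold=80.0, memory_threshold=85.0,
                          reduced_fraction=0.25):
    """Return how many threads may run asynchronous handlers right now.

    All thresholds and the reduced fraction are hypothetical values.
    """
    over_threshold = cpu_pct > cpu_threshold or memory_pct > memory_threshold
    if over_threshold:
        # Keep at least one thread so asynchronous work still makes progress.
        return max(1, int(max_threads * reduced_fraction))
    return max_threads
```

Under these assumed values, a server at 95% CPU would run asynchronous handlers on a quarter of its usual threads until usage drops back below the threshold.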

Best Practices

Best Practices for Asynchronous Apex

Apex supports batch Apex and @future Apex methods. Both of these features add requests to the asynchronous queue. Keep the following best practices in mind when planning out development work that will use asynchronous Apex.

Future Apex

Every @future invocation adds one request to the asynchronous queue. Design patterns that would add large numbers of @future requests over a short period of time should be avoided unless absolutely needed. Best practices include:

  • Avoid adding large numbers of @future methods to the asynchronous queue, if possible. If more than 2,000 unprocessed requests from a single organization are in the queue, any additional requests from the same organization will be delayed while the queue handles requests from other organizations.
  • Ensure that @future requests execute as fast as possible. To ensure fast execution, minimize Web service call out times and tune the queries used in your @future methods. The longer an @future method executes, the more likely other queued requests are to be delayed when there are a large number of requests in the queue.
  • Test your @future methods at scale. Where possible, test using an environment that generates the maximum number of @future methods you’d expect to handle. This will help determine if delays will occur.
  • Consider using batch Apex instead of @future methods to process large numbers of records asynchronously. This will be more efficient than creating an @future request for each record.

Extended delay time is 5 minutes.

Some example scenarios:

  • A consumer Web page is created using Sites. Each person registering at the site requires three Web service call outs to validate the consumer. These validations need to occur asynchronously after the consumer submits their information. The current design uses one @future call for each Web service call thereby creating large volumes of @future calls. A better design is to create a single @future request for each consumer that will handle the three call outs. This solution creates a much lower volume of requests.
  • For each new lead, outside data validation is required via a Web services call out. The current design creates one @future call for every record, which calls the Web service call out. This creates a large volume of @future requests, each request doing a single call out. A better design pattern is to use bulkification. Create a call out that could accept multiple records and then use @future to process multiple records in one call out. This solution creates a much lower volume of requests.
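The effect of both redesigns on request volume can be checked with simple arithmetic. The figures used here (10,000 consumers, 10,000 leads, 100 records per call out) are hypothetical, as are the function names; the sketch only counts queued requests and does not call any Salesforce API.

```python
def requests_one_per_callout(consumers, callouts_per_consumer=3):
    # Original Sites design: one @future request for every call out.
    return consumers * callouts_per_consumer


def requests_one_per_consumer(consumers):
    # Better design: one @future request performs all three call outs.
    return consumers


def chunk(records, size):
    # Bulkification: group records so that one request handles many records.
    return [records[i:i + size] for i in range(0, len(records), size)]


consumers = 10_000  # hypothetical registration volume
before = requests_one_per_callout(consumers)
after = requests_one_per_consumer(consumers)

leads = list(range(10_000))  # hypothetical new lead IDs
bulk_requests = chunk(leads, 100)  # assumes the service accepts 100 records
```

With these assumed volumes, the Sites design drops from 30,000 queued requests to 10,000, and the lead validation design drops from 10,000 requests to 100.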

Batch Apex

Ensure that the batch Apex process executes as efficiently as possible, and minimize the number of batches submitted at one time. Like @future requests, batch Apex needs to execute as fast as possible. Best practices include:

  • Avoid adding large numbers of batch Apex requests to the asynchronous queue, if possible. If more than 2,000 unprocessed requests from a single organization are in the queue, any additional requests from the same organization will be delayed while the queue handles requests from other organizations.
  • Tune any SOQL query used to gather the records so that it executes as quickly as possible.
  • Minimize Web service call out times if utilized.

Extended delay is not applicable to batch Apex.

For more best practices that ensure your asynchronous Apex requests are handled properly, see the Apex Code Developer's Guide.

Best Practices for Bulk API

The Bulk API lets you create jobs and batches to load large volumes of data asynchronously. Bulk API batches are added to the asynchronous queue. If too many batches are submitted at one time, they may be subject to flow control, so minimize the number of batches where possible.

If more than 2,000 unprocessed requests from a single organization are in the queue, any additional requests from the same organization will be delayed while the queue handles requests from other organizations. Minimize the number of batches submitted at one time to ensure that your batches are not delayed in the queue.
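A quick calculation shows how batch size drives the number of queued requests. The record count and batch sizes below are hypothetical examples, and the function names are illustrative; consult the Bulk API Developer's Guide for the actual per-batch limits. The 2,000-request figure is the threshold described above.

```python
import math

FLOW_CONTROL_THRESHOLD = 2_000  # unprocessed requests per organization


def batches_needed(record_count, records_per_batch):
    # Each batch becomes one request in the asynchronous queue.
    return math.ceil(record_count / records_per_batch)


def risks_delay(record_count, records_per_batch):
    # True when the load would queue more requests than the threshold.
    return batches_needed(record_count, records_per_batch) > FLOW_CONTROL_THRESHOLD


small_batches = batches_needed(1_000_000, 200)     # many small batches
large_batches = batches_needed(1_000_000, 10_000)  # fewer, fuller batches
```

In this example, loading one million records in batches of 200 would queue 5,000 requests, well over the threshold, while batches of 10,000 would queue only 100.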

For more best practices that ensure your asynchronous Bulk API batches are handled properly, see the Bulk API Developer's Guide.