Processing Large Amounts of Data with APIs (Part 2 of 2)

When working in an enterprise environment, you may need to process large amounts of Salesforce records using the Platform APIs. This post is the second part of a two-post series focusing on processing data at scale with APIs. In the first part of the series, we focused on read operations with the REST API and the Bulk APIs. In this second part, we will focus on write operations. We’ll compare two APIs that are a great fit for the job: the Composite API and the Bulk APIs.

Write records with the Composite API

While the REST API is great for reading multiple records, its resources are too granular for write operations at scale (one request per write operation). The Composite API overcomes this limitation. It inherits from the REST API in the sense that it’s a synchronous API that has access to resources from the REST API, but it supports multiple write operations per request, thus reducing the overhead of HTTP requests/responses.

There are three main resources in the Composite API that let you write data: composite batch, composite, and composite graph. Before we get into the details of these resources, here’s a general overview of the request structure and limits of these resources:

A diagram that illustrates the Composite API resources

Composite batch resource

The composite batch resource is the simplest resource (the least verbose) of the Composite API, but it does come with some limitations. This resource lets you run up to 25 REST API subrequests sequentially. The key difference between this resource and the others is that batch subrequests are independent, and you can’t pass information between them.

Another important consideration for the batch resource is that subrequests are transactional, but the parent batch request is not. For example, if the third subrequest of a batch fails, only this third subrequest is rolled back (the two previous requests aren’t rolled back). Furthermore, the request will continue to execute the remaining subrequests after there’s been an error unless you set the haltOnError flag to true.

Note: Use the composite batch resource with care, so that you don’t introduce data integrity issues in case of errors.

To execute a composite batch request, run a POST request on INSTANCE_URL/services/data/v56.0/composite/batch with a body like this:
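Here’s a minimal sketch of such a body, built as a Python dictionary and printed as JSON. The account ID is a placeholder, and the optional haltOnError flag is included only to show where it goes; treat the block as an illustration rather than a reference payload.

```python
import json

API_VERSION = "v56.0"
# Placeholder record ID (assumption): replace it with an account ID from your org.
ACCOUNT_ID = "001xx000003DGb2AAG"

batch_body = {
    # Optional: stop running the remaining subrequests after the first failure.
    "haltOnError": True,
    "batchRequests": [
        {
            # Subrequest 1: rename the account.
            "method": "PATCH",
            "url": f"{API_VERSION}/sobjects/Account/{ACCOUNT_ID}",
            "richInput": {"Name": "New name"},
        },
        {
            # Subrequest 2: read back the name and the billing postal code.
            "method": "GET",
            "url": f"{API_VERSION}/sobjects/Account/{ACCOUNT_ID}?fields=Name,BillingPostalCode",
        },
    ],
}

# This JSON document is what you would send as the POST body.
print(json.dumps(batch_body, indent=2))
```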

This example renames an account to “New name” and retrieves the updated account name, as well as its billing postal code. The output of this request looks like this:

Response highlights:

The hasErrors flag indicates whether one or more of the subrequests failed.
The results property holds the list of responses for the subrequests. Each subrequest response includes two properties: statusCode and result. The value and shape of result depend on the operation that was performed.
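To make these highlights concrete, here’s a small Python sketch that inspects a batch response and lists the subrequests that failed. The sample response is made up for illustration (it is not the output of the request above), and failed_subrequests is a hypothetical helper name.

```python
# Illustrative response (assumption): values are invented to show the shape only.
sample_response = {
    "hasErrors": True,
    "results": [
        {"statusCode": 204, "result": None},
        {
            "statusCode": 404,
            "result": [
                {"errorCode": "NOT_FOUND", "message": "The requested resource does not exist"}
            ],
        },
    ],
}


def failed_subrequests(response):
    """Return (index, statusCode, result) for each subrequest with a 4xx or 5xx status."""
    if not response.get("hasErrors"):
        return []
    return [
        (index, item["statusCode"], item["result"])
        for index, item in enumerate(response["results"])
        if item["statusCode"] >= 400
    ]


for index, status, result in failed_subrequests(sample_response):
    print(f"Subrequest {index} failed with HTTP {status}: {result}")
```

Because the parent batch request is not transactional, a check like this is where you would decide whether the successful subrequests need to be compensated.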

Subrequests may include multiple sObject queries. However, there are important limitations on the results that are retrieved. Once you fetch a total of 2000 records across multiple subrequests, the remaining queries in the request fetch only the first records of their result sets and provide query locator URLs that let you fetch additional results (see Part 1 of this series for more information).

Note: When retrieving data with the composite batch resource, be mindful of the record retrieval limits.

For example, assuming that your org has more than 2000 records for each sObject, if you run a batch request that queries accounts, contacts, and opportunities in that order, you would only retrieve up to 1998 accounts, up to two contacts (this gets you to a total of 2000 records), and up to one opportunity.
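A batch request of this kind could be sketched as follows in Python. The SOQL queries and the query_subrequest helper are assumptions made for illustration; only the order of the objects (accounts, contacts, opportunities) comes from the example above.

```python
import json
from urllib.parse import urlencode

API_VERSION = "v56.0"


def query_subrequest(soql):
    """Build a GET subrequest that runs a SOQL query through the REST query resource."""
    query_string = urlencode({"q": soql})
    return {"method": "GET", "url": f"{API_VERSION}/query/?{query_string}"}


batch_body = {
    "batchRequests": [
        query_subrequest("SELECT Id FROM Account"),
        query_subrequest("SELECT Id FROM Contact"),
        query_subrequest("SELECT Id FROM Opportunity"),
    ]
}

print(json.dumps(batch_body, indent=2))
```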

Composite resource

The composite resource takes the concept of the batch resource one step further with two major improvements. First, instead of executing the 25 subrequests independently, you can now share reference IDs across subrequests. Second, you can enforce a transaction at the request level with an allOrNone flag. Enabling the flag rolls back the entire request if any of the subrequests fails.

To execute a composite request, run a POST request on INSTANCE_URL/services/data/v56.0/composite with a body like this:
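The sketch below builds such a body in Python and prints it as JSON. The refContact reference ID and the contact’s last name are assumptions added for illustration.

```python
import json

BASE = "/services/data/v56.0"

composite_body = {
    # Roll back every subrequest if any one of them fails.
    "allOrNone": True,
    "compositeRequest": [
        {
            "method": "POST",
            "url": f"{BASE}/sobjects/Account",
            "referenceId": "refAccount",
            "body": {"Name": "Sample Account"},
        },
        {
            "method": "POST",
            "url": f"{BASE}/sobjects/Contact",
            # Hypothetical reference ID and contact name.
            "referenceId": "refContact",
            "body": {
                "LastName": "Sample Contact",
                # Resolved at run time to the ID of the account created above.
                "AccountId": "@{refAccount.id}",
            },
        },
    ],
}

print(json.dumps(composite_body, indent=2))
```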

This example contains two subrequests. The first subrequest creates an account named “Sample Account” and the second subrequest attaches a contact to the account, using a refAccount reference to pass the parent account ID as @{refAccount.id}. The output of this request looks like this:
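As a rough illustration, the Python sketch below uses a hand-written sample response (the record IDs are placeholders, not real output) and a hypothetical created_ids helper that maps each reference ID to the ID of the record it created.

```python
# Illustrative response (assumption): IDs are placeholders, not real output.
sample_response = {
    "compositeResponse": [
        {
            "body": {"id": "001xx000003DGb2AAG", "success": True, "errors": []},
            "httpHeaders": {
                "Location": "/services/data/v56.0/sobjects/Account/001xx000003DGb2AAG"
            },
            "httpStatusCode": 201,
            "referenceId": "refAccount",
        },
        {
            "body": {"id": "003xx000004TmiQAAS", "success": True, "errors": []},
            "httpHeaders": {
                "Location": "/services/data/v56.0/sobjects/Contact/003xx000004TmiQAAS"
            },
            "httpStatusCode": 201,
            "referenceId": "refContact",
        },
    ]
}


def created_ids(response):
    """Map each reference ID to the ID of the record created by that subrequest."""
    return {
        item["referenceId"]: item["body"]["id"]
        for item in response["compositeResponse"]
        if item["httpStatusCode"] == 201
    }


print(created_ids(sample_response))
```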

Composite graph resource

The composite graph resource builds on the advantages of the other composite resources and extends them with the ability to support multiple graphs instead of a single cascade of subrequests. The parent composite graph request is not transactional as a whole, but each graph is transactional.
The composite graph resource takes operations to another scale with up to 500 subrequests.

To execute a composite graph request, run a POST request on INSTANCE_URL/services/data/v56.0/composite/graph. Here’s an example request body with two graphs (graph1 and graph2) that each contain two subrequests:
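A body with this shape can be sketched in Python as shown below. The existing account ID, the opportunity’s stage and close date, and every reference ID other than refAccount are assumptions made for illustration.

```python
import json

BASE = "/services/data/v56.0"
# Placeholder ID (assumption) of an account that already exists in the org.
EXISTING_ACCOUNT_ID = "001xx000003DGb2AAG"

graph_body = {
    "graphs": [
        {
            "graphId": "graph1",
            "compositeRequest": [
                {
                    "method": "POST",
                    "url": f"{BASE}/sobjects/Account",
                    "referenceId": "refNewAccount",
                    "body": {"Name": "ACME Inc."},
                },
                {
                    # Read the new record back with all of its fields.
                    "method": "GET",
                    "url": f"{BASE}/sobjects/Account/@{{refNewAccount.id}}",
                    "referenceId": "refNewAccountDetails",
                },
            ],
        },
        {
            "graphId": "graph2",
            "compositeRequest": [
                {
                    "method": "GET",
                    "url": f"{BASE}/sobjects/Account/{EXISTING_ACCOUNT_ID}",
                    "referenceId": "refAccount",
                },
                {
                    "method": "POST",
                    "url": f"{BASE}/sobjects/Opportunity",
                    "referenceId": "refOpportunity",
                    "body": {
                        "Name": "Opportunity for @{refAccount.Name}",
                        "AccountId": "@{refAccount.Id}",
                        # Illustrative values for the required opportunity fields.
                        "StageName": "Prospecting",
                        "CloseDate": "2030-01-01",
                    },
                },
            ],
        },
    ]
}

print(json.dumps(graph_body, indent=2))
```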

This request does the following:

graph1 creates an “ACME Inc.” account and returns the record with all of its fields (some may be auto-filled by formulas or triggers).
graph2 retrieves an account and attaches an opportunity to it. We use a reference to the parent account’s name in the opportunity name: "Opportunity for @{refAccount.Name}".

This request produces this kind of response:
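Since each graph succeeds or fails on its own, it helps to check the outcome graph by graph. The Python sketch below does that on a hand-written, trimmed-down sample response (not real output); failed_graphs is a hypothetical helper name.

```python
# Illustrative, trimmed response (assumption): one graph succeeded, one failed.
sample_response = {
    "graphs": [
        {
            "graphId": "graph1",
            "isSuccessful": True,
            "graphResponse": {
                "compositeResponse": [
                    {"httpStatusCode": 201, "referenceId": "refNewAccount"},
                    {"httpStatusCode": 200, "referenceId": "refNewAccountDetails"},
                ]
            },
        },
        {
            "graphId": "graph2",
            "isSuccessful": False,
            "graphResponse": {
                "compositeResponse": [
                    {"httpStatusCode": 404, "referenceId": "refAccount"},
                    {"httpStatusCode": 400, "referenceId": "refOpportunity"},
                ]
            },
        },
    ]
}


def failed_graphs(response):
    """Map each unsuccessful graph ID to the (referenceId, status) pairs that failed."""
    failures = {}
    for graph in response["graphs"]:
        if graph["isSuccessful"]:
            continue
        failures[graph["graphId"]] = [
            (item["referenceId"], item["httpStatusCode"])
            for item in graph["graphResponse"]["compositeResponse"]
            if item["httpStatusCode"] >= 400
        ]
    return failures


print(failed_graphs(sample_response))
```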

While its responses might be a bit verbose compared to the other composite resources, the graph resource is really the go-to solution for performing up to 500 write operations in a synchronous and transactional manner.

If you need to work on more records, from several thousand up to millions, then you need to consider asynchronous processing with the Bulk APIs.

Write records with the Bulk APIs

We won’t repeat the content of the first part of this blog series as the structure of the Bulk API requests is very similar for write and read operations. Instead, we will highlight some important guidelines for working with mass data updates at scale.

When working on large amounts of data with Bulk APIs, the processing time is the main constraint, so you want to optimize your operations to avoid timeouts.

Note: As a best practice, simulate large write operations in a sandbox before applying them to production to assess and optimize processing time.

Use compression

Regardless of which Bulk API type you use (original or 2.0), make sure that you enable response compression to reduce the size of the responses and the amount of network traffic. All it takes is adding an Accept-Encoding: gzip header to your requests when retrieving result sets.
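Here’s a minimal sketch of this with Python’s standard library, which does not decompress responses on its own. The URL and token are placeholders, and build_results_request and decode_body are hypothetical helper names; the request is prepared but never sent.

```python
import gzip
import urllib.request


def build_results_request(url, access_token):
    """Prepare (without sending) a GET request that asks for a gzip-compressed response."""
    return urllib.request.Request(
        url,
        headers={
            "Authorization": f"Bearer {access_token}",
            "Accept-Encoding": "gzip",
        },
        method="GET",
    )


def decode_body(raw, content_encoding):
    """Decompress the body only if the server reports that it used gzip."""
    if content_encoding and content_encoding.lower() == "gzip":
        raw = gzip.decompress(raw)
    return raw.decode("utf-8")


# Placeholder URL and token (assumptions).
request = build_results_request(
    "https://example.my.salesforce.com/services/data/v56.0/jobs/ingest/JOB_ID/successfulResults",
    "ACCESS_TOKEN",
)

# Simulate a compressed result set instead of calling a real org.
compressed = gzip.compress("Id,Name\n001,ACME\n".encode("utf-8"))
print(decode_body(compressed, "gzip"))
```

With a real call, you would pass the value of the Content-Encoding response header as the second argument of decode_body.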

Minimize fields and object dependencies

A number of factors have a high impact on processing time, including:

The number of objects, fields, and relationships that you’re modifying
The number of automations, such as workflow rules, processes, flows, and triggers that are running on the objects that you’re modifying

Try to minimize those dependencies to reduce processing time. Temporarily disabling certain triggers or automations when ingesting large amounts of data is a good strategy to speed up write operations. Breaking up large batches into smaller batches also helps to reduce processing time.
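Breaking up a data set can be as simple as the Python sketch below, which splits rows into fixed-size chunks and renders each chunk as its own CSV payload. The batch size of 2 is only for demonstration, and chunked and to_csv are hypothetical helper names; the right size depends on your org and is worth measuring in a sandbox.

```python
import csv
import io
from itertools import islice


def chunked(rows, size):
    """Yield successive lists of at most `size` rows."""
    iterator = iter(rows)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


def to_csv(header, rows):
    """Render a header and rows as CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


header = ["Name", "BillingPostalCode"]
records = [[f"Account {i}", f"9410{i}"] for i in range(5)]

# Every payload repeats the header so that each batch can be uploaded on its own.
payloads = [to_csv(header, chunk) for chunk in chunked(records, 2)]
for number, payload in enumerate(payloads, start=1):
    print(f"Batch {number}:\n{payload}")
```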

Avoid overlapping jobs and lock contention

Running multiple jobs and write operations in parallel increases the risk of locks and congestion. Locks happen when a record is being modified concurrently in several operations. Write operations on certain objects, such as users or roles, are more likely to create locks than other objects.

Locks are generally automatically resolved by the Salesforce Platform with a retry mechanism, but this causes some extra delays and may lead to operations timing out when working at scale.

In large orgs, you’ll want to monitor your job queues and schedules, so that you can time your operations accordingly to avoid overlapping jobs.
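If you keep a list of planned job windows, spotting overlaps is straightforward. The Python sketch below works on a made-up schedule; the job names and times are assumptions, and overlapping_jobs is a hypothetical helper that does not query Salesforce.

```python
from datetime import datetime

# Hypothetical schedule: (job name, planned start, planned end).
schedule = [
    ("Account load", datetime(2030, 1, 1, 1, 0), datetime(2030, 1, 1, 2, 0)),
    ("Contact load", datetime(2030, 1, 1, 1, 30), datetime(2030, 1, 1, 3, 0)),
    ("Opportunity load", datetime(2030, 1, 1, 3, 0), datetime(2030, 1, 1, 4, 0)),
]


def overlapping_jobs(jobs):
    """Return the pairs of job names whose planned time windows overlap."""
    ordered = sorted(jobs, key=lambda job: job[1])
    overlaps = []
    for index, (name, _start, end) in enumerate(ordered):
        for other_name, other_start, _other_end in ordered[index + 1:]:
            # Jobs are sorted by start time, so later jobs cannot overlap either.
            if other_start >= end:
                break
            overlaps.append((name, other_name))
    return overlaps


print(overlapping_jobs(schedule))
```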

Closing words

This concludes our tour of the Composite and Bulk APIs for write operations. We covered how the Composite API lets you chain a number of read/write operations by sharing context between subrequests. You also had an overview of tips and tricks for making the most of Bulk API ingest jobs. You can easily experiment with those APIs thanks to the Salesforce Platform APIs Postman collection.

We’ll leave you with the following table that provides a good summary of the key differences between these APIs for write operations.

	Composite batch resource	Composite resource	Composite graph resource	Bulk API	Bulk API 2.0
Operation type	Non-transactional	Optionally transactional	Transactional at graph level	Transactional	Transactional
Maximum number of operations per request	25	25, including up to 5 queries or sObject collections	500	From several thousand to millions of records (see documentation for details)	From several thousand to millions of records (see documentation for details)
Process type	Synchronous	Synchronous	Synchronous	Asynchronous	Asynchronous
Minimum number of request types to perform operation and get results	1	1	1	6	3
Supported formats	JSON or XML	JSON	JSON	CSV, JSON, or XML + binary (ingest only)	CSV

Resources

Composite Resources in the REST API Developer Guide
Bulk API 2.0 and Bulk API Developer Guide
- Bulk API
- Bulk API 2.0
Salesforce Developer Limits and Allocations Quick Reference
Salesforce Platform APIs Postman collection

About the author

Philippe Ozil is a Principal Developer Advocate at Salesforce where he focuses on the Salesforce Platform. He writes technical content and speaks frequently at conferences. He is a full stack developer and enjoys working on DevOps, robotics, and VR projects. Follow him on Twitter @PhilippeOzil or check his GitHub projects @pozil.