Slim Down with the New Bulk API v2

Learn how the new Bulk API v2 makes developing data integration apps easier, and discover some of its new features.

Working with Bulk API and finding that your code is a little bulky itself? The new Bulk API v2 might be just the diet you’re looking for.

Now generally available in Winter ’18 (API version 41.0), Bulk API v2 brings the power of bulk transactions from Bulk API v1 into a simplified, easier-to-use API. Bulk API v2 lets you create, update, or delete millions of records asynchronously, just like Bulk API v1, but offers the following core improvements:

  • Bulk API v2 uses the same REST API framework as other Salesforce REST APIs. You can use OAuth authentication just like any other Salesforce REST API and take advantage of features like CORS (cross-origin resource sharing) support.
  • Bulk API v2 does away with the need to manually break up data into batches. Simply submit jobs with the full set of records, and Salesforce automatically determines the most efficient way to batch the data.
  • Bulk API v2 simplifies the basic daily limits. Instead of having limits based on the number of Bulk jobs and batches, you’re simply limited to a maximum number of records (100 million) per 24-hour period.

Bulk API v2 also has a number of new features that aren’t available in Bulk API v1, but more on that later.

Comparing Bulk API v1 and v2

First, to get a sense of how Bulk API v2 is going to make things easier for you, let’s compare the process of creating a set of records using Bulk API v1 versus Bulk API v2.

With Bulk API v1, you’d need to write code to do the following steps:

  1. Get an authenticated session ID, likely via a completely different API, such as the SOAP API.
  2. Create a Bulk API v1 job.
  3. Break up job data into batches. This can be a complex task that, in many scenarios, requires you to:
    1. Break up data to fit within the Bulk API v1 batch size limit for records, and batch size limit for total size for a batch.
    2. Decide if you need to use special processing headers, like compression or PK Chunking.
    3. Analyze data chunks for potential locking issues due to data skew, which could result in very slow or failed bulk processing. Re-organize batches as needed.
    4. Minimize the amount of data post-processing actions (like triggers and Workflow rules) that might result in batch processing timeouts.
    5. After all of this, if it turns out your batches take too long to process, go through the process of determining the best way to organize your batches all over again.
  4. Upload data in batches.
  5. Verify the batches uploaded properly.
  6. Close the job, which tells Salesforce to start processing the records.
  7. Check the status of the job.
  8. If the job completes with no errors, we’re done.
  9. If the job completes but encountered errors during processing, iterate over each batch in the job, collect results of successful and failed records for the batch, determine why the records failed, and re-assemble the data to submit a new job as needed.

So how does the same process look in Bulk API v2?

  1. Authenticate using OAuth.
  2. Create a Bulk API v2 job.
  3. Upload all your data.
  4. Close the job, which tells Salesforce to start processing the data.
  5. Check the status of the job.
  6. If the job completes with no errors, we’re done.
  7. If the job completes but encountered errors during processing, request the complete list of failed records with one API call, determine why the records failed, and submit a new job as needed.

Notice how much easier and more convenient the Bulk API v2 process is. Everything from initial authentication to getting failed records is easier. You no longer have to write a lot of extra error-prone code to handle tasks like assembling a list of failed records from a set of failed batches. Also notice how the basic overall process of creating and submitting asynchronous jobs is still the same — you don’t have to learn and code a whole new paradigm to upload your bulk data.

Quick Walkthrough of Bulk API v2

The whole process for Bulk API v2 is so simple that we can walk through the actual steps in detail in this blog post. Let’s walk through creating some Contact records.

Authenticate using OAuth

First, you need to authenticate. Using your preferred OAuth flow, start by issuing a request to https://login.salesforce.com/services/oauth2/authorize and complete the flow to obtain an access token.
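As a rough illustration, the first leg of an OAuth flow usually means sending the user to the authorize endpoint with a few query parameters. The sketch below only builds that URL. The parameter names are the standard OAuth 2.0 ones, and the consumer key and callback URL are placeholders, not values from this post.

```python
from urllib.parse import urlencode

AUTHORIZE_URL = "https://login.salesforce.com/services/oauth2/authorize"


def build_authorize_url(client_id, redirect_uri, response_type="code"):
    """Return the URL a user would visit to start an OAuth flow.

    client_id and redirect_uri are placeholders for the values
    configured on your own connected app (an assumption here).
    """
    params = {
        "response_type": response_type,
        "client_id": client_id,
        "redirect_uri": redirect_uri,
    }
    return AUTHORIZE_URL + "?" + urlencode(params)


url = build_authorize_url("YOUR_CONSUMER_KEY", "https://example.com/callback")
print(url)
```

Which response_type you use, and how you exchange the result for an access token, depends on the OAuth flow you pick.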

Create a job

Next, create a new Bulk API v2 job by issuing a POST request to /services/data/v41.0/jobs/ingest/ with the following request body:

{
  "object" : "Contact",
  "contentType" : "CSV",
  "operation" : "insert"
}

This creates a job that will insert new Contact records.
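If you're scripting this, the request might be assembled along these lines. This is a minimal sketch using only the Python standard library. The instance URL and access token are placeholders, and the Bearer authorization header is the usual REST API convention rather than something spelled out in this post.

```python
import json
import urllib.request


def build_create_job_request(instance_url, access_token,
                             sobject="Contact", operation="insert"):
    """Build (but do not send) the POST request that creates an ingest job."""
    body = {
        "object": sobject,
        "contentType": "CSV",
        "operation": operation,
    }
    return urllib.request.Request(
        url=f"{instance_url}/services/data/v41.0/jobs/ingest/",
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
    )


# Placeholder values; substitute your own instance URL and token.
req = build_create_job_request("https://yourInstance.salesforce.com", "ACCESS_TOKEN")
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

Passing the request to urllib.request.urlopen would actually send it. The JSON response should include the job ID you need for the next steps.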

Upload the data

Using the JOB ID returned from the previous request, issue a PUT request to the following URI:

/services/data/v41.0/jobs/ingest/JOB ID/batches/

The request body will be CSV data of all the records you want to upload (with the Content-Type request header set to text/csv).
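Here is one way the upload could look in code. The sketch turns a list of records into CSV text and builds the PUT request without sending it. The sample contacts, instance URL, token, and job ID are all made up for illustration.

```python
import csv
import io
import urllib.request


def records_to_csv(records, fieldnames):
    """Serialize a list of dicts to CSV text with a header row and LF line endings."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()


def build_upload_request(instance_url, access_token, job_id, csv_text):
    """Build (but do not send) the PUT request that uploads the job data."""
    return urllib.request.Request(
        url=f"{instance_url}/services/data/v41.0/jobs/ingest/{job_id}/batches/",
        data=csv_text.encode("utf-8"),
        method="PUT",
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "text/csv",
        },
    )


# Hypothetical sample data.
contacts = [
    {"FirstName": "Ada", "LastName": "Lovelace"},
    {"FirstName": "Alan", "LastName": "Turing"},
]
csv_text = records_to_csv(contacts, ["FirstName", "LastName"])
req = build_upload_request(
    "https://yourInstance.salesforce.com", "ACCESS_TOKEN", "JOB_ID", csv_text
)
print(csv_text)
```

Note that there is no batching logic here: the full CSV goes up in one request, and Salesforce decides how to split it.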

Close the job

Issue a PATCH request, again using the JOB ID, to the following URI:

/services/data/v41.0/jobs/ingest/JOB ID/

With the following request body:

{
  "state" : "UploadComplete"
}

This tells Salesforce we’re done uploading data for the job, and Salesforce will start inserting the records.
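In code, closing the job is a small PATCH request. As before, this is a sketch that builds the request without sending it, and the instance URL, token, and job ID are placeholders.

```python
import json
import urllib.request


def build_close_job_request(instance_url, access_token, job_id):
    """Build (but do not send) the PATCH request that marks the upload as complete."""
    body = {"state": "UploadComplete"}
    return urllib.request.Request(
        url=f"{instance_url}/services/data/v41.0/jobs/ingest/{job_id}/",
        data=json.dumps(body).encode("utf-8"),
        method="PATCH",
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
    )


req = build_close_job_request(
    "https://yourInstance.salesforce.com", "ACCESS_TOKEN", "JOB_ID"
)
print(req.get_method(), req.full_url)
```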

Check the status of the job

Issue a GET request to:

/services/data/v41.0/jobs/ingest/JOB ID/

Look for a job state of JobComplete to know Salesforce is done processing the job.
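Since processing is asynchronous, you will probably want to poll. The sketch below keeps the HTTP call out of the loop: fetch_state is a hypothetical callable you supply that performs the GET and returns the job's state string. Treating Failed and Aborted as terminal states alongside JobComplete is an assumption based on the Bulk API 2.0 documentation, not something covered in this post.

```python
import time

# JobComplete comes from this walkthrough; Failed and Aborted are assumed
# to be the other states in which a job stops changing.
TERMINAL_STATES = {"JobComplete", "Failed", "Aborted"}


def wait_for_job(fetch_state, poll_seconds=5.0, max_polls=60, sleep=time.sleep):
    """Call fetch_state() until it returns a terminal state, then return that state."""
    for _ in range(max_polls):
        state = fetch_state()
        if state in TERMINAL_STATES:
            return state
        sleep(poll_seconds)
    raise TimeoutError(f"job still running after {max_polls} polls")


# Demo with canned states instead of real HTTP calls, and no real sleeping.
fake_states = iter(["UploadComplete", "InProgress", "JobComplete"])
result = wait_for_job(lambda: next(fake_states), sleep=lambda seconds: None)
print(result)
```

In real use, fetch_state would issue the GET request above and read the state field from the JSON response.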

Get errors for any failed records

If the job status indicates that some records encountered errors during processing, issue a GET request to the following URI to get a full list of the failed records:

/services/data/v41.0/jobs/ingest/JOB ID/failedResults/
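The failed-results response comes back as CSV, so sorting out what went wrong can be a few lines of parsing. The sketch below assumes the response carries sf__Id and sf__Error columns ahead of the original fields, as described in the Bulk API 2.0 documentation. The sample row is invented for illustration.

```python
import csv
import io
from collections import defaultdict


def parse_failed_results(csv_text):
    """Parse the failed-results CSV into a list of dicts, one per failed record."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def group_by_error(rows, error_column="sf__Error"):
    """Group failed rows by error message (the column name is an assumption)."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row[error_column]].append(row)
    return dict(grouped)


# Invented sample response body, not real API output.
sample = (
    '"sf__Id","sf__Error","FirstName","LastName"\n'
    '"","REQUIRED_FIELD_MISSING:Required fields are missing: [LastName]","Ada",""\n'
)
rows = parse_failed_results(sample)
errors = group_by_error(rows)
for message, failed in errors.items():
    print(len(failed), "record(s):", message)
```

From there you can fix the affected rows and submit them as a new job.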

And that’s it, really.

Additional v2 Features

Bulk API v2 goes beyond Bulk API v1 and offers some additional features to make your life easier. These include:

  • When creating a new job, you can also include the job data in the same request, using a multipart request. This is limited to smaller sets of records (up to 20,000 characters).
  • You can specify different column delimiters and line endings for your CSV data, including:
    • backquotes, carets, pipes, semicolons, and tabs for delimiters (instead of commas)
    • carriage return plus linefeed (CRLF) line endings (instead of just linefeeds)
  • You can get a list of all Bulk API jobs in your org (active and completed) and use query parameters to filter this list. For example, a GET request to /services/data/vXX.X/jobs/ingest?concurrencyMode=parallel will return a list of all jobs in your org using parallel concurrency mode for processing.
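To make the delimiter and line-ending options concrete, here is a sketch of a job body that declares pipe-delimited, CRLF-terminated data, plus matching CSV output. The columnDelimiter and lineEnding field names and their PIPE and CRLF values are taken from the Bulk API 2.0 documentation rather than from this post, so treat them as assumptions and check them against the developer guide.

```python
import csv
import io
import json

# Job creation body; columnDelimiter and lineEnding are assumed field names.
job_body = {
    "object": "Contact",
    "contentType": "CSV",
    "operation": "insert",
    "columnDelimiter": "PIPE",
    "lineEnding": "CRLF",
}


def rows_to_delimited(rows, delimiter="|", lineterminator="\r\n"):
    """Write rows (lists of strings) using the given delimiter and line ending."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter=delimiter, lineterminator=lineterminator)
    writer.writerows(rows)
    return buf.getvalue()


# Hypothetical sample data; the comma in the second row needs no quoting
# because the delimiter is a pipe.
data = rows_to_delimited([
    ["FirstName", "LastName"],
    ["Ada", "Lovelace, Countess"],
])
print(json.dumps(job_body, indent=2))
print(repr(data))
```

The data you upload has to match whatever delimiter and line ending the job declares.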

Note that you can’t do Bulk queries in Bulk API v2 yet.

Bulk API v2 reduces the amount of code you have to write and gives you more options for processing your data. Plus, it simplifies data limits, so you can spend less time worrying about how much data you can work with and more time actually running your integration jobs. If you’re using v1, consider taking the time to switch over to Bulk API v2, and your code will be slim and trim in no time!

Further resources

For more information on Bulk API v2, see:

Bulk API 2.0 Developer Guide
Bulk API unit of the API Basics Trailhead module
