Walk Through a Bulk Query Sample Using PK Chunking

This code sample uses cURL to perform a bulk query with PK chunking enabled on several account records.

Before you begin building an integration or other client application:

  • Install your development platform according to its product documentation.
  • Read through all the steps before creating the test client application. Also review the rest of this document to familiarize yourself with terms and concepts.

Note

Create a Job with PK Chunking Enabled

  1. Create a file called create-job.xml containing this text.
    <?xml version="1.0" encoding="UTF-8"?>
    <jobInfo
        xmlns="http://www.force.com/2009/06/asyncapi/dataload">
      <operation>query</operation>
      <object>Account</object>
      <concurrencyMode>Parallel</concurrencyMode>
      <contentType>CSV</contentType>
    </jobInfo>
  2. Using a command-line window, execute this cURL command to create a job with PK chunking enabled.

    curl -H "X-SFDC-Session: sessionId" -H "Content-Type: application/xml; charset=UTF-8" -H "Sforce-Enable-PKChunking: true" -d @create-job.xml https://instance.salesforce.com/services/async/64.0/job

    instance is the portion of the <serverUrl> element, and sessionId is the <sessionId> element that you noted in the login response.

    Salesforce recommends that you enable PK chunking when querying tables with more than 10 million records or when a bulk query consistently times out. For the purposes of this example, if you’re querying significantly fewer records, set chunkSize to a number smaller than the number of records you’re querying. For example, Sforce-Enable-PKChunking: chunkSize=1000. This way, you get to see PK chunking in action, and the query is split into multiple batches.

    Note

    Salesforce returns an XML response with data such as this.

    <?xml version="1.0" encoding="UTF-8"?>
    <jobInfo
        xmlns="http://www.force.com/2009/06/asyncapi/dataload">
      <id>750x000000009tvAAA</id>
      <operation>query</operation>
      <object>Account</object>
      <createdById>005x0000001WR0lAAG</createdById>
      <createdDate>2016-01-10T00:53:19.000Z</createdDate>
      <systemModstamp>2016-01-10T00:53:19.000Z</systemModstamp>
      <state>Open</state>
      <concurrencyMode>Parallel</concurrencyMode>
      <contentType>CSV</contentType>
      <numberBatchesQueued>0</numberBatchesQueued>
      <numberBatchesInProgress>0</numberBatchesInProgress>
      <numberBatchesCompleted>0</numberBatchesCompleted>
      <numberBatchesFailed>0</numberBatchesFailed>
      <numberBatchesTotal>0</numberBatchesTotal>
      <numberRecordsProcessed>0</numberRecordsProcessed>
      <numberRetries>0</numberRetries>
      <apiVersion>36.0</apiVersion>
      <numberRecordsFailed>0</numberRecordsFailed>
      <totalProcessingTime>0</totalProcessingTime>
      <apiActiveProcessingTime>0</apiActiveProcessingTime>
      <apexProcessingTime>0</apexProcessingTime>
    </jobInfo>

Add a Batch to the Job

  1. Create a file called query.txt to contain the SOQL query statement.
    SELECT Id, Name FROM Account
  2. Using a command-line window, execute this cURL command to add a batch to the job.

    curl -d @query.txt -H "X-SFDC-Session: sessionId" -H "Content-Type: text/csv; charset=UTF-8" https://instance.salesforce.com/services/async/64.0/job/jobId/batch

    jobId is the job ID returned in the response to the job creation.

    Salesforce returns an XML response with data such as this.

    <?xml version="1.0" encoding="UTF-8"?>
    <batchInfo
        xmlns="http://www.force.com/2009/06/asyncapi/dataload">
      <id>751x000000009vwAAA</id>
      <jobId>750x000000009tvAAA</jobId>
      <state>Queued</state>
      <createdDate>2016-01-10T00:59:47.000Z</createdDate>
      <systemModstamp>2016-01-10T00:59:47.000Z</systemModstamp>
      <numberRecordsProcessed>0</numberRecordsProcessed>
      <numberRecordsFailed>0</numberRecordsFailed>
      <totalProcessingTime>0</totalProcessingTime>
      <apiActiveProcessingTime>0</apiActiveProcessingTime>
      <apexProcessingTime>0</apexProcessingTime>
    </batchInfo>

    When you add a batch to a bulk query job, the Content-Type in the header for the request must be text/csv, application/xml, or application/json, depending on the content type specified when the job was created. The actual SOQL statement supplied for the batch is in plain text format.

    Note

Check the Status of the Job and Batch

  1. Using a command-line window, execute this cURL command to check the job status.

    curl -H "X-SFDC-Session: sessionId" https://instance.salesforce.com/services/async/64.0/job/jobId

    Salesforce returns an XML response with data such as this.

    <?xml version="1.0" encoding="UTF-8"?>
    <jobInfo
        xmlns="http://www.force.com/2009/06/asyncapi/dataload">
      <id>750x000000009tvAAA</id>
      <operation>query</operation>
      <object>Account</object>
      <createdById>005x0000001WR0lAAG</createdById>
      <createdDate>2016-01-10T00:53:19.000Z</createdDate>
      <systemModstamp>2016-01-10T00:53:19.000Z</systemModstamp>
      <state>Open</state>
      <concurrencyMode>Parallel</concurrencyMode>
      <contentType>CSV</contentType>
      <numberBatchesQueued>0</numberBatchesQueued>
      <numberBatchesInProgress>0</numberBatchesInProgress>
      <numberBatchesCompleted>4</numberBatchesCompleted>
      <numberBatchesFailed>0</numberBatchesFailed>
      <numberBatchesTotal>4</numberBatchesTotal>
      <numberRecordsProcessed>350000</numberRecordsProcessed>
      <numberRetries>0</numberRetries>
      <apiVersion>36.0</apiVersion>
      <numberRecordsFailed>0</numberRecordsFailed>
      <totalProcessingTime>3500</totalProcessingTime>
      <apiActiveProcessingTime>3500</apiActiveProcessingTime>
      <apexProcessingTime>0</apexProcessingTime>
    </jobInfo>
    Because PK chunking is enabled, extra batches are automatically created to process the entire query.
  2. Using a command-line window, execute this cURL command to check the status of the original batch.

    curl -H "X-SFDC-Session: sessionId" https://instance.salesforce.com/services/async/64.0/job/jobId/batch/batchId

    batchId is the batch ID in the response to the batch creation.

    Salesforce returns an XML response with data such as this.

    <?xml version="1.0" encoding="UTF-8"?>
    <batchInfo
        xmlns="http://www.force.com/2009/06/asyncapi/dataload">
      <id>751x000000009vwAAA</id>
      <jobId>750x000000009tvAAA</jobId>
      <state>Not Processed</state>
      <createdDate>2016-01-10T00:59:47.000Z</createdDate>
      <systemModstamp>2016-01-10T01:00:19.000Z</systemModstamp>
      <numberRecordsProcessed>0</numberRecordsProcessed>
      <numberRecordsFailed>0</numberRecordsFailed>
      <totalProcessingTime>0</totalProcessingTime>
      <apiActiveProcessingTime>0</apiActiveProcessingTime>
      <apexProcessingTime>0</apexProcessingTime>
    </batchInfo>
    Because PK chunking is enabled, the original batch is given a state of Not Processed. The query is processed in the remaining batches.

Get the IDs of the Remaining Batches

Using the command-line window, execute this cURL command to retrieve the remaining batches.

curl -H "X-SFDC-Session: sessionId" https://instance.salesforce.com/services/async/64.0/job/jobId/batch

Salesforce returns an XML response with data such as this.

<?xml version="1.0" encoding="UTF-8"?><batchInfoList
   xmlns="http://www.force.com/2009/06/asyncapi/dataload">
 <batchInfo>
  <id>751D00000004YjwIAE</id>
  <jobId>750D00000004T5OIAU</jobId>
  <state>NotProcessed</state>
  <createdDate>2016-01-10T00:59:47.000Z</createdDate>
  <systemModstamp>2016-01-10T01:00:19.000Z</systemModstamp>
  <numberRecordsProcessed>0</numberRecordsProcessed>
  <numberRecordsFailed>0</numberRecordsFailed>
  <totalProcessingTime>0</totalProcessingTime>
  <apiActiveProcessingTime>0</apiActiveProcessingTime>
  <apexProcessingTime>0</apexProcessingTime>
 </batchInfo>
 <batchInfo>
  <id>751D00000004Yk1IAE</id>
  <jobId>750D00000004T5OIAU</jobId>
  <state>Completed</state>
  <createdDate>2016-01-10T00:59:47.000Z</createdDate>
  <systemModstamp>2016-01-10T01:00:19.000Z</systemModstamp>
  <numberRecordsProcessed>100000</numberRecordsProcessed>
  <numberRecordsFailed>0</numberRecordsFailed>
  <totalProcessingTime>1000</totalProcessingTime>
  <apiActiveProcessingTime>1000</apiActiveProcessingTime>
  <apexProcessingTime>0</apexProcessingTime>
 </batchInfo>
 <batchInfo>
  <id>751D00000004Yk2IAE</id>
  <jobId>750D00000004T5OIAU</jobId>
  <state>Completed</state>
  <createdDate>2016-01-10T00:59:47.000Z</createdDate>
  <systemModstamp>2016-01-10T01:00:19.000Z</systemModstamp>
  <numberRecordsProcessed>100000</numberRecordsProcessed>
  <numberRecordsFailed>0</numberRecordsFailed>
  <totalProcessingTime>1000</totalProcessingTime>
  <apiActiveProcessingTime>1000</apiActiveProcessingTime>
  <apexProcessingTime>0</apexProcessingTime>
 </batchInfo>
 <batchInfo>
  <id>751D00000004Yk6IAE</id>
  <jobId>750D00000004T5OIAU</jobId>
  <state>Completed</state>
  <createdDate>2016-01-10T00:59:47.000Z</createdDate>
  <systemModstamp>2016-01-10T01:00:19.000Z</systemModstamp>
  <numberRecordsProcessed>100000</numberRecordsProcessed>
  <numberRecordsFailed>0</numberRecordsFailed>
  <totalProcessingTime>1000</totalProcessingTime>
  <apiActiveProcessingTime>1000</apiActiveProcessingTime>
  <apexProcessingTime>0</apexProcessingTime>
 </batchInfo>
 <batchInfo>
  <id>751D00000004Yk7IAE</id>
  <jobId>750D00000004T5OIAU</jobId>
  <state>Completed</state>
  <createdDate>2016-01-10T00:59:47.000Z</createdDate>
  <systemModstamp>2016-01-10T01:00:19.000Z</systemModstamp>
  <numberRecordsProcessed>50000</numberRecordsProcessed>
  <numberRecordsFailed>0</numberRecordsFailed>
  <totalProcessingTime>500</totalProcessingTime>
  <apiActiveProcessingTime>500</apiActiveProcessingTime>
  <apexProcessingTime>0</apexProcessingTime>
 </batchInfo>
</batchInfoList>

Retrieve the Results

Perform these steps for each remaining batch.
  1. Using the command-line window, execute this cURL command to retrieve the batch result list.

    curl -H "X-SFDC-Session: sessionId" https://instance.salesforce.com/services/async/64.0/job/jobId/batch/batchId/result

    Salesforce returns an XML response with data such as this.

    <?xml version="1.0" encoding="UTF-8"?>
    <result-list xmlns="http://www.force.com/2009/06/asyncapi/dataload">
      <result>752x00000004CJE</result>
    </result-list>

    If the batch required retries, there will be more than one <result> element in the output.

    Note

  2. Using the command-line window, execute this cURL command to retrieve the results of the query.

    curl -H "X-SFDC-Session: sessionId" https://instance.salesforce.com/services/async/64.0/job/jobId/batch/batchId/result/resultId

    resultId is the result ID in the response to the batch result list request.

    Salesforce returns a CSV response with data such as this.

    "Id","Name"
    "001x000xxx4TU4JAAW","name161268--1296595660659"
    "001x000xxx4TU4KAAW","name161269--1296595660659"
    "001x000xxx4TU4LAAW","name161270--1296595660659"
    "001x000xxx4TU4MAAW","name161271--1296595660659"
    "001x000xxx4TU4NAAW","name161272--1296595660659"
    "001x000xxx4TU4OAAW","name161273--1296595660659"
    "001x000xxx4TU4PAAW","name161274--1296595660659"
    "001x000xxx4TU4QAAW","name161275--1296595660659"
    "001x000xxx4TU4RAAW","name161276--1296595660659"
    "001x000xxx4TU4SAAW","name161277--1296595660659"
    ...

Close the Job

  1. Create a file called close-job.xml containing this text.
    <?xml version="1.0" encoding="UTF-8"?>
    <jobInfo xmlns="http://www.force.com/2009/06/asyncapi/dataload">
      <state>Closed</state>
    </jobInfo>
  2. Using a command-line window, execute this cURL command to close the job.

    curl -H "X-SFDC-Session: sessionId" -H "Content-Type: text/csv; charset=UTF-8" -d @close-job.xml https://instance.salesforce.com/services/async/64.0/job/jobId

    Salesforce returns an XML response with data such as this.

    <?xml version="1.0" encoding="UTF-8"?>
    <jobInfo
        xmlns="http://www.force.com/2009/06/asyncapi/dataload">
      <id>750x000000009tvAAA</id>
      <operation>query</operation>
      <object>Account</object>
      <createdById>005x0000001WR0lAAG</createdById>
      <createdDate>2016-01-10T00:53:19.000Z</createdDate>
      <systemModstamp>2016-01-10T00:53:19.000Z</systemModstamp>
      <state>Closed</state>
      <concurrencyMode>Parallel</concurrencyMode>
      <contentType>CSV</contentType>
      <numberBatchesQueued>0</numberBatchesQueued>
      <numberBatchesInProgress>0</numberBatchesInProgress>
      <numberBatchesCompleted>4</numberBatchesCompleted>
      <numberBatchesFailed>0</numberBatchesFailed>
      <numberBatchesTotal>4</numberBatchesTotal>
      <numberRecordsProcessed>350000</numberRecordsProcessed>
      <numberRetries>0</numberRetries>
      <apiVersion>36.0</apiVersion>
      <numberRecordsFailed>0</numberRecordsFailed>
      <totalProcessingTime>3500</totalProcessingTime>
      <apiActiveProcessingTime>3500</apiActiveProcessingTime>
      <apexProcessingTime>0</apexProcessingTime>
    </jobInfo>