PK Chunking Header

Use the PK Chunking request header to enable automatic primary key (PK) chunking for a bulk query job. PK chunking splits bulk queries on very large tables into chunks based on the record IDs, or primary keys, of the queried records.

Each chunk is processed as a separate batch that counts toward your daily batch limit, and you must download each batch’s results separately. PK chunking works only with queries that don’t include nested SELECT clauses (subqueries) or conditions other than WHERE.

PK chunking is supported for the following objects: Account, Asset, Campaign, CampaignMember, Case, CaseArticle, CaseHistory, Contact, Event, EventRelation, Lead, LoginHistory, Opportunity, Task, User, WorkOrder, WorkOrderLineItem, and custom objects.

PK chunking works by adding record ID boundaries to the query with a WHERE clause, limiting the query results to a smaller chunk of the total results. The remaining results are fetched with extra queries that contain successive boundaries. The number of records within the ID boundaries of each chunk is referred to as the chunk size. The first query retrieves records between a specified starting ID and the starting ID plus the chunk size. The next query retrieves the next chunk of records, and so on.

For example, let’s say you enable PK chunking for the following query on an Account table with 10,000,000 records.

SELECT Name FROM Account

Assuming a chunk size of 250,000 and a starting record ID of 001300000000000, the query is split into the following 40 queries. Each query is submitted as a separate batch.

SELECT Name FROM Account WHERE Id >= 001300000000000 AND Id < 00130000000132G
SELECT Name FROM Account WHERE Id >= 00130000000132G AND Id < 00130000000264W
SELECT Name FROM Account WHERE Id >= 00130000000264W AND Id < 00130000000396m
...
SELECT Name FROM Account WHERE Id >= 00130000000euQ4 AND Id < 00130000000fxSK

Each query executes on a chunk of 250,000 records specified by the base-62 ID boundaries.
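The boundary arithmetic in this example can be reproduced with a short Python sketch. It is an illustration only, not the service's implementation: it assumes the base-62 alphabet is ordered digits, then uppercase, then lowercase, that a 15-character ID can be treated as a single base-62 number, and that the IDs in the table are contiguous. The function names are hypothetical.

```python
import string

# Assumed base-62 alphabet ordering: 0-9, A-Z, a-z.
ALPHABET = string.digits + string.ascii_uppercase + string.ascii_lowercase


def decode(record_id: str) -> int:
    """Interpret a 15-character record ID as a base-62 integer."""
    n = 0
    for ch in record_id:
        n = n * 62 + ALPHABET.index(ch)
    return n


def encode(n: int, width: int = 15) -> str:
    """Convert an integer back to a zero-padded base-62 string."""
    chars = []
    while n > 0:
        n, rem = divmod(n, 62)
        chars.append(ALPHABET[rem])
    return "".join(reversed(chars)).rjust(width, "0")


def chunk_queries(base_query: str, start_id: str, total_ids: int, chunk_size: int) -> list[str]:
    """Build one bounded query per chunk of chunk_size IDs."""
    start = decode(start_id)
    queries = []
    for offset in range(0, total_ids, chunk_size):
        lower = encode(start + offset, len(start_id))
        upper = encode(start + offset + chunk_size, len(start_id))
        queries.append(f"{base_query} WHERE Id >= {lower} AND Id < {upper}")
    return queries


queries = chunk_queries("SELECT Name FROM Account", "001300000000000", 10_000_000, 250_000)
print(len(queries))
print(queries[0])
print(queries[-1])
```

Under these assumptions, the sketch produces 40 queries whose first and last boundaries match the ones listed above.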

PK chunking is designed for extracting data from entire tables, but you can also use it for filtered queries. Because records could be filtered from each query’s results, the number of returned results for each chunk can be less than the chunk size. Also, the IDs of soft-deleted records are counted when the query is split into chunks, but the records are omitted from the results. Therefore, if soft-deleted records fall within a given chunk’s ID boundaries, the number of returned results is less than the chunk size.

The default chunk size is 100,000, and the maximum size is 250,000. The default starting ID is the first record in the table. However, you can specify a different starting ID to restart a job that failed between chunked batches.

When a query is successfully chunked, the original batch’s status shows as NOT_PROCESSED. If the chunking fails, the original batch’s status shows as FAILED, but any chunked batches that were successfully queued during the chunking attempt are processed as normal. When the original batch’s status is changed to NOT_PROCESSED, monitor the subsequent batches. You can retrieve the results from each subsequent batch after it’s completed. Then you can safely close the job.
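The monitoring steps described here can be sketched as a small Python helper. The data shape is an assumption made for illustration: each batch is a dictionary with hypothetical "id" and "state" keys, and the state strings NOT_PROCESSED and FAILED are taken from the text above, while COMPLETED is an assumed label for a finished batch. Adapt the names to whatever your client library actually returns.

```python
def review_chunked_job(original_batch_id: str, batches: list[dict]) -> dict:
    """Summarize a PK-chunked job from a list of batch records.

    Each batch is assumed to look like {"id": "...", "state": "..."}.
    """
    states = {b["id"]: b["state"] for b in batches}
    original_state = states[original_batch_id]

    # Every batch other than the original is a chunked batch.
    chunks = {bid: s for bid, s in states.items() if bid != original_batch_id}

    ready = sorted(bid for bid, s in chunks.items() if s == "COMPLETED")
    failed = sorted(bid for bid, s in chunks.items() if s == "FAILED")
    pending = sorted(bid for bid, s in chunks.items() if s not in ("COMPLETED", "FAILED"))

    return {
        # NOT_PROCESSED on the original batch means chunking succeeded.
        "chunked": original_state == "NOT_PROCESSED",
        # Chunked batches are still processed even if the original batch FAILED.
        "ready_for_download": ready,
        "failed": failed,
        "pending": pending,
        # Close the job only after every chunk has finished and been downloaded.
        "all_chunks_finished": not pending,
    }


summary = review_chunked_job(
    "b0",
    [
        {"id": "b0", "state": "NOT_PROCESSED"},
        {"id": "b1", "state": "COMPLETED"},
        {"id": "b2", "state": "QUEUED"},
    ],
)
print(summary)
```

A caller would poll until all_chunks_finished is true, download the results for every ID in ready_for_download, and only then close the job.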

Salesforce recommends that you enable PK chunking when querying tables with more than 10 million records or when a bulk query consistently times out. However, the effectiveness of PK chunking depends on the specifics of the query and the queried data.

Header Field Name and Values

Field name

Sforce-Enable-PKChunking

Field values

TRUE: Enables PK chunking with the default chunk size, starting from the first record ID in the queried table.
FALSE: Disables PK chunking. If the header isn’t provided in the request, the default is FALSE.
chunkSize: Specifies the number of records within the ID boundaries for each chunk. The default is 100,000, and the maximum size is 250,000. If the query contains filters or soft-deleted records, the number of returned results for each chunk could be less than the chunk size.
parent: Specifies the parent object when you’re enabling PK chunking for queries on sharing objects. The chunks are based on the parent object’s records rather than the sharing object’s records. For example, when querying on AccountShare, specify Account as the parent object. Similarly, for CaseHistory, specify Case as the parent object. PK chunking is supported for sharing objects as long as the parent object is supported.
startRow: Specifies the 15-character or 18-character record ID to be used as the lower boundary for the first chunk. Use this parameter to specify a starting ID when restarting a job that failed between batches.

Example

Sforce-Enable-PKChunking: chunkSize=50000; startRow=00130000000xEftMGH
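A header value like the one in the example can be assembled programmatically. The following Python sketch is one possible approach, not an official client: the function name and its keyword arguments are hypothetical, and the checks only mirror the limits stated on this page (a maximum chunk size of 250,000 and a 15-character or 18-character starting ID).

```python
HEADER_NAME = "Sforce-Enable-PKChunking"
MAX_CHUNK_SIZE = 250_000  # maximum chunk size stated in this section


def pk_chunking_header(chunk_size=None, parent=None, start_row=None) -> dict:
    """Return a one-entry dict holding the PK chunking request header."""
    if chunk_size is None and parent is None and start_row is None:
        # No parameters: enable chunking with the defaults.
        return {HEADER_NAME: "TRUE"}

    parts = []
    if chunk_size is not None:
        if not 1 <= chunk_size <= MAX_CHUNK_SIZE:
            raise ValueError(f"chunk_size must be between 1 and {MAX_CHUNK_SIZE}")
        parts.append(f"chunkSize={chunk_size}")
    if parent is not None:
        parts.append(f"parent={parent}")
    if start_row is not None:
        if len(start_row) not in (15, 18):
            raise ValueError("start_row must be a 15- or 18-character record ID")
        parts.append(f"startRow={start_row}")

    # Parameters are separated by "; " as in the example above.
    return {HEADER_NAME: "; ".join(parts)}


headers = pk_chunking_header(chunk_size=50000, start_row="00130000000xEftMGH")
print(headers)
```

To restart a job that failed between batches, pass the lower boundary of the first unprocessed chunk as start_row and merge the returned dictionary into the request headers used when creating the job.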

Bulk API Developer Guide
