High-Level Replication Steps
Types of Events That Change Data Capture Can Generate: Change Events, Gap Events, and Overflow Events
Generally, Salesforce captures record changes by sending change events, which the subscriber receives to synchronize data in an external system. Sometimes, gap events or overflow events are generated.
Gap events are generated when change events can't be generated. They inform subscribers about errors or about operations done outside of Salesforce application servers. Gap events don't contain record data, but they contain the record ID, which enables you to retrieve the record from Salesforce. Ensure that the subscriber expects to receive gap events and handles them properly, as outlined in How to Handle a Gap Event. The changeType field in the gap event header identifies the gap event and the associated operation, and can take one of these values:
- GAP_CREATE
- GAP_UPDATE
- GAP_DELETE
- GAP_UNDELETE
For more information about gap events, see Gap Events.
Overflow events are generated when a single transaction involves more than 100,000 changes. The first 100,000 changes generate change events. The changes beyond that amount generate one overflow event for each entity type included in that set. Overflow events include header fields but no record data, and they carry only a dummy record ID instead of the IDs of the changed records. Ensure that the subscriber handles overflow events. The value of the changeType header field is GAP_OVERFLOW instead of the specific type of change.
For more information about overflow events, see Overflow Events.
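Because a subscriber must treat the three kinds of events differently, its first decision for every incoming event is which kind it received. The following Java sketch makes that decision from the changeType value alone. It assumes the header has already been deserialized into a String; the EventKind enum and EventClassifier class are hypothetical names used only for illustration.

```java
import java.util.Objects;
import java.util.Set;

/** Hypothetical enum for the three kinds of events described above. */
enum EventKind { CHANGE, GAP, OVERFLOW }

/** Hypothetical helper that classifies an event by its changeType header value. */
final class EventClassifier {
    // The four gap changeType values listed above.
    private static final Set<String> GAP_TYPES =
            Set.of("GAP_CREATE", "GAP_UPDATE", "GAP_DELETE", "GAP_UNDELETE");

    private EventClassifier() {}

    static EventKind classify(String changeType) {
        // Set.of(...).contains(null) would throw, so reject null explicitly.
        Objects.requireNonNull(changeType, "changeType");
        if (changeType.equals("GAP_OVERFLOW")) {
            return EventKind.OVERFLOW;
        }
        if (GAP_TYPES.contains(changeType)) {
            return EventKind.GAP;
        }
        // Any other value (such as CREATE or UPDATE) belongs to a regular change event.
        return EventKind.CHANGE;
    }
}
```

Checking GAP_OVERFLOW separately keeps overflow events from being handled like ordinary gap events, even though both changeType values share the GAP_ prefix.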
Transaction-Based Replication Approach
Each change event contains a transaction key (the transactionKey header field) that uniquely identifies the transaction that the change is part of. Each change event also contains a sequence number (the sequenceNumber header field) that identifies the order of the change within the transaction. The sequence number is useful for operations that include multiple steps, such as lead conversion. If not all objects involved in a transaction are enabled for Change Data Capture, the sequence numbers have gaps. We recommend that you replicate all the changes in one transaction as a single commit in your system. One approach is to buffer all changes related to a transaction and commit them all at once.
If you choose not to use a transaction-based replication process, your replicated data can be incomplete if your subscription stops. For example, if your subscription stops in the middle of an event stream for one transaction, only part of the transaction’s changes are replicated in your system.
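If you buffer a transaction's changes before committing them, the sequence number lets you apply them in order and see where numbers are missing. The Java sketch below is a minimal illustration under two assumptions: each event's header has been reduced to a hypothetical HeaderInfo record, and you choose to sort buffered changes by sequenceNumber. As the text notes, missing numbers are expected when some objects in the transaction aren't enabled for Change Data Capture, so the gap report is informational rather than an error.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical holder for the two header fields this sketch uses. */
record HeaderInfo(String transactionKey, int sequenceNumber) {}

final class TransactionOrdering {
    private TransactionOrdering() {}

    /** Returns the buffered events of one transaction sorted by sequenceNumber. */
    static List<HeaderInfo> inSequence(List<HeaderInfo> events) {
        List<HeaderInfo> sorted = new ArrayList<>(events);
        sorted.sort(Comparator.comparingInt(HeaderInfo::sequenceNumber));
        return sorted;
    }

    /** Lists the sequence numbers missing between the lowest and highest received. */
    static List<Integer> missingSequenceNumbers(List<HeaderInfo> events) {
        List<HeaderInfo> sorted = inSequence(events);
        List<Integer> missing = new ArrayList<>();
        for (int i = 1; i < sorted.size(); i++) {
            int previous = sorted.get(i - 1).sequenceNumber();
            int current = sorted.get(i).sequenceNumber();
            for (int n = previous + 1; n < current; n++) {
                missing.add(n);
            }
        }
        return missing;
    }
}
```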
A transaction-based replication process involves these high-level steps.
1. In your subscribed client, allocate a transaction buffer for each transaction key. For example, create a map (Map<String, List<ChangeEvent>>) where the key is the transactionKey value.
2. Open a CometD subscription to the general /data/ChangeEvents channel that captures all enabled events.
3. For each event received over the channel, check the changeType field.
   - If the changeType field is GAP_CREATE, GAP_UPDATE, GAP_DELETE, or GAP_UNDELETE, the event is a gap event. Follow the recommended steps in How to Handle a Gap Event.
   - If the changeType field is GAP_OVERFLOW, the event is an overflow event.
     - Process the change events that you previously stored in the map. Commit the changes, and then purge the corresponding map entries.
     - For the overflow event, follow the recommended steps in How to Handle an Overflow Event.
4. If the event isn't a gap or overflow event, it's a change event. Deserialize the change event, and add it to the map entry for its transaction key.
5. When the transactionKey value changes in the next change event, commit the changes in the map entry for the previous transaction key, and then purge that map entry.
6. Repeat steps 3 through 5 for each new event received.
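The replication steps above can be sketched in Java as a small event router. This is a minimal sketch, not a Salesforce API: it assumes events have already been received and deserialized into a hypothetical ChangeEvent record, it omits the CometD subscription itself, and the commit, gap, and overflow handlers are hypothetical callbacks that you would replace with your own logic.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.function.Consumer;

/** Hypothetical, already-deserialized event carrying only the header fields used here. */
record ChangeEvent(String transactionKey, String changeType, String recordId) {}

final class TransactionReplicator {
    private static final Set<String> GAP_TYPES =
            Set.of("GAP_CREATE", "GAP_UPDATE", "GAP_DELETE", "GAP_UNDELETE");

    // One buffer per transactionKey, as in the map described above.
    private final Map<String, List<ChangeEvent>> buffers = new LinkedHashMap<>();
    private final Consumer<List<ChangeEvent>> commitTransaction; // commits one transaction as a unit
    private final Consumer<ChangeEvent> onGap;                   // gap-event handling
    private final Consumer<ChangeEvent> onOverflow;              // overflow-event handling
    private String currentKey;

    TransactionReplicator(Consumer<List<ChangeEvent>> commitTransaction,
                          Consumer<ChangeEvent> onGap,
                          Consumer<ChangeEvent> onOverflow) {
        this.commitTransaction = commitTransaction;
        this.onGap = onGap;
        this.onOverflow = onOverflow;
    }

    /** Routes each received event by its changeType. */
    void onEvent(ChangeEvent event) {
        String type = Objects.requireNonNull(event.changeType(), "changeType");
        if (type.equals("GAP_OVERFLOW")) {
            // Commit and purge everything buffered so far, then hand off the overflow event.
            for (String key : new ArrayList<>(buffers.keySet())) {
                commit(key);
            }
            currentKey = null;
            onOverflow.accept(event);
        } else if (GAP_TYPES.contains(type)) {
            onGap.accept(event);
        } else {
            // A different transactionKey means the previous transaction is complete.
            if (currentKey != null && !currentKey.equals(event.transactionKey())) {
                commit(currentKey);
            }
            currentKey = event.transactionKey();
            buffers.computeIfAbsent(currentKey, k -> new ArrayList<>()).add(event);
        }
    }

    /** Commits one buffered transaction, then purges its map entry. */
    private void commit(String key) {
        List<ChangeEvent> events = buffers.remove(key);
        if (events != null && !events.isEmpty()) {
            commitTransaction.accept(events);
        }
    }
}
```

Because a transaction is committed only when an event with a different transactionKey arrives, the most recent transaction stays buffered until the next event (or an overflow event) is received.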
How to Handle a Gap Event
If the subscriber receives a gap event, get the latest data from Salesforce. The gap event includes the ID of the affected record, which enables you to retrieve the record. After receiving the gap event, one approach is to mark the corresponding record as dirty and not process any change events for that record until it has been reconciled.
Consider an example of the steps a subscriber can take to handle a gap event while it also receives change events. Records A and B are modified in a transaction, which generates two change events. Then a change to record C generates a gap event. The subscriber receives three events: two change events for records A and B and one gap event for record C. The subscriber takes these steps:
- Handle the change events according to the transaction-based replication process.
- For the gap event, mark the corresponding record as dirty as of the date of the gap event. Use the commitTimestamp header field value of this gap event as the date to compare with the commitTimestamp values of other events received.
- If you receive newer change events for the same record, don't process them. For example, if record C is modified again and a change event is received, ignore it because the corresponding record is marked as dirty.
- Reconcile the data for record C. Make a Salesforce API call, such as a REST API call, to retrieve the full data for record C, and save it in your system. Then clear the dirty flag on that record.
- Record C is modified again and a new change event is received. Process this change event according to the replication process because the record is no longer dirty.
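The dirty-record approach in the example above can be sketched in Java as a small tracker keyed by record ID. This sketch assumes commitTimestamp values are epoch milliseconds held in a long, and DirtyRecordTracker is a hypothetical helper name. It skips change events whose commitTimestamp is at or after the gap event's commitTimestamp, which is one way to read "newer change events"; retrieving the record from Salesforce is left out.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper that tracks records marked dirty by gap events. */
final class DirtyRecordTracker {
    // recordId -> commitTimestamp of the gap event that marked the record dirty
    private final Map<String, Long> dirtySince = new HashMap<>();

    /** Marks the record dirty as of the gap event's commitTimestamp (keeps the earliest). */
    void markDirty(String recordId, long gapCommitTimestamp) {
        dirtySince.merge(recordId, gapCommitTimestamp, Long::min);
    }

    /** Returns false for change events at or after the gap while the record is dirty. */
    boolean shouldProcess(String recordId, long changeCommitTimestamp) {
        Long since = dirtySince.get(recordId);
        return since == null || changeCommitTimestamp < since;
    }

    /** Call after saving the full record retrieved from Salesforce, for example through the REST API. */
    void clearDirty(String recordId) {
        dirtySince.remove(recordId);
    }

    boolean isDirty(String recordId) {
        return dirtySince.containsKey(recordId);
    }
}
```

In the record C example, markDirty runs when the gap event arrives, the later change event is skipped because shouldProcess returns false, and clearDirty runs once the full record has been saved, so the next change event is processed normally.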
How to Handle an Overflow Event
If a single transaction generates more than 100,000 changes, you receive overflow events for the changes beyond the first 100,000, with one overflow event for each entity type. Mass changes aren't frequent. They can result from creating or modifying many records, such as changing a recurring calendar event series with many occurrences and invitees. A large change can also result from a cascade delete of records that have many related records.
An overflow event contains only a dummy record ID, not the IDs of the changed records, so one approach for data replication is to retrieve all records of the corresponding entity after an overflow event is received. Then you can update or delete those records in the external system. This approach can be the most process-intensive because it resyncs all the records for an entity. However, it's the simplest approach because it doesn't require figuring out which records changed in a particular timeframe and filtering out the records whose changes already generated change events. These steps outline how to reconcile data when an overflow event is received.
1. After you receive an overflow event in your subscriber, unsubscribe from the channel, and stop processing further events. This step prepares for a full data synchronization for the entity.
2. Store the Replay ID of the overflow event. This ID is the starting point for the data reconciliation.
3. Reconcile the data for new, updated, and undeleted records.
   - Retrieve all records for the entity. Depending on the volume of records stored, this process can take some time.
   - Synchronize the data in your system by overwriting it with the data retrieved from Salesforce.
4. Reconcile the data for deleted records by using one of the following approaches.
   - Get the non-deleted records from Salesforce, and synchronize.
     - Identify all records for that entity in your system that weren't updated through the synchronization that you performed in step 3. These records are the deleted ones.
     - Delete the identified records from your system.
   - Or get the deleted records from Salesforce, and synchronize.
     - Query all records for the entity with isDeleted=true. You get all the soft-deleted records for that entity that are in the Recycle Bin.
     - Identify the records in your system that match the records returned in the previous step.
     - Delete the identified records from your system.
5. Resubscribe to the channel, starting from the Replay ID that you stored earlier.
   - We recommend that you process all change events after that Replay ID. This way, you catch up on any data changes that happened during the synchronization and weren't saved in your system.
6. If you encounter an overflow event for another entity (entityName field value), repeat this process for that entity.
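The reconciliation part of the overflow procedure above can be sketched in Java as set operations over record IDs. This is a minimal sketch under stated assumptions: local records are modeled as a map from record ID to record data (a String here), OverflowReconciler is a hypothetical name, and retrieving records from Salesforce, unsubscribing, and resubscribing from the stored Replay ID are out of scope because they depend on your client library.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper for reconciling one entity after an overflow event. */
final class OverflowReconciler {
    private OverflowReconciler() {}

    /**
     * Overwrites local data with every record retrieved for the entity, then deletes
     * the local records that the sync didn't touch. Returns the deleted IDs.
     */
    static Set<String> syncAndRemoveUntouched(Map<String, String> local,
                                              Map<String, String> retrieved) {
        // New, updated, and undeleted records: overwrite with the Salesforce copy.
        local.putAll(retrieved);
        // Records present locally but absent from the retrieved set are the deleted ones.
        Set<String> deleted = new HashSet<>(local.keySet());
        deleted.removeAll(retrieved.keySet());
        local.keySet().removeAll(deleted);
        return deleted;
    }

    /**
     * Alternative: deletes the local records whose IDs were returned by a query for
     * soft-deleted records (isDeleted=true). Returns the IDs that matched.
     */
    static Set<String> removeSoftDeleted(Map<String, String> local, Set<String> softDeletedIds) {
        Set<String> matched = new HashSet<>(softDeletedIds);
        matched.retainAll(local.keySet());
        local.keySet().removeAll(matched);
        return matched;
    }
}
```

The first method corresponds to getting the non-deleted records and synchronizing; the second corresponds to getting the deleted records. As the steps note, the isDeleted=true query returns only soft-deleted records that are in the Recycle Bin.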