You are planning a Force.com implementation with large volumes of data. Your data model is in place, all your code is written and has been tested, and now it’s time to load the objects, some of which have tens of millions of records.
What is the most efficient way to get all those records into the system?
The Force.com Extreme Data Loading Series
This is the third entry in a six-part series of blog posts covering many aspects of data loading for very large enterprise deployments.
Here are the topics planned for this series.
- Designing the data model for performance
- Loading data into a lean configuration
- Suspending events that fire on insert
- Sequencing load operations
- Loading and extracting data
- Taking advantage of deferred sharing calculations
This post explains how to speed up large data loads by temporarily disabling data operations that customers frequently perform during record inserts.
Why Suspend Data Validation and Enrichment Events on Record Insert?
The Force.com platform includes powerful tools for making sure data entered by the users of your applications is clean and includes appropriate relationships between records.
- Validation rules ensure that the data users enter for new and existing records meets the standards specified by your business.
- Workflow rules allow you to automate field updates, email alerts, outbound messages, and tasks associated with workflow, approvals, and milestones.
- Assignment rules distribute leads and cases to appropriate teams, and accounts to appropriate territories.
- Triggers are Apex code that allow you to manipulate data and perform other actions on record insert.
While these tools help you preserve data integrity during normal operations, they can also slow inserts to a crawl if you leave them enabled during massive data loads. As we mentioned in our last post on loading lean, when you reach the cutover point while replacing a legacy system, you might want to load your data as quickly as possible to minimize the effort required to synchronize it.
But if you turn off validation, workflow, and triggers, how can you ensure that, once you’ve finished loading, you have accurate data and the right relationships established between objects? There are three key phases to this effort.
- Analyzing and preparing data
- Disabling events for loading
- Post-processing data
Let’s look at each of these phases in more detail.
Analyzing and Preparing Data
To load safely without triggers, validation rules, and workflow rules running, examine the business requirements that these operations would ordinarily meet, then ask the following questions to find suitable alternatives.
Which of your requirements can you meet by data cleansing before data loading, or by sequencing load operations where there are critical dependencies between objects?
For example, if you normally use a validation rule to ensure that user entries are within valid ranges, you can query the data set before loading to find and fix records that don’t conform to the rules.
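For example, a hypothetical validation rule that rejects out-of-range opportunity amounts might use an error condition formula like this one (the field and the limits are purely illustrative):

```
/* The record is rejected when this error condition evaluates to true */
OR(
    Amount < 0,
    Amount > 10000000
)
```

Before loading, you would apply the same range check to the source data set and correct any rows that would fail it, rather than having the rule evaluate on every insert.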
Where you have parent-child relationships in your data model, you need to have the IDs of the parent records before you can insert their children. In these cases, you can sequence operations to load the parents, extract the IDs, and update the source data to include the parent IDs before loading the child records.
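As a sketch of the extract-and-map step, assuming a hypothetical Legacy_Id__c field on Account that holds the primary key from the legacy system, anonymous Apex like the following can build the parent-ID mapping you would join back into your child-record source file:

```apex
// Hypothetical: Account records carry Legacy_Id__c, the primary key
// from the legacy system, populated during the parent load.
Map<String, Id> legacyToSfId = new Map<String, Id>();
for (Account acct : [SELECT Id, Legacy_Id__c
                     FROM Account
                     WHERE Legacy_Id__c != null]) {
    legacyToSfId.put(acct.Legacy_Id__c, acct.Id);
}
// Export this mapping (for example, with a Bulk API query) and join it to
// the child-record file so each child row carries its parent's Salesforce ID.
```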
With multiple dependencies, this sequencing can be tricky, so make sure you check Architect Core Resources for “Sequencing Load Operations,” the next installment in this series.
Which of your requirements can you meet by post-processing records after data loading?
One typical set of use cases in this category relates to data enrichment, which could involve adding lookup relationships between objects, roll-up summary fields to parent records, and other data relationships between records. Another set relates to triggering the workflows and other business actions that allow you to process and take advantage of the new data. We’ll come back to these in the post-processing discussion below.
Disabling Events for Loading
Once you have analyzed all your data validation and enrichment requirements, and planned actions to manage them either before or after data loading, you can temporarily disable your rules and triggers to speed up loading. For workflow rules, simply edit each rule and deactivate it.
You can deactivate validation rules, lead and case assignment rules, and territory assignment rules in the same way.
Temporarily disabling triggers is a bit more complex and requires some preparation. First, create a custom setting with a corresponding checkbox field to control when a trigger should fire. Then add a guard statement to the start of your trigger code that checks the checkbox and exits before doing any work. Once this is done, disabling or enabling your trigger is as simple as editing the checkbox field.
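The guard might look like the following sketch, assuming a hypothetical hierarchy custom setting Load_Settings__c with a checkbox field Disable_Account_Trigger__c:

```apex
trigger AccountTrigger on Account (before insert, before update) {
    // Load_Settings__c is a hypothetical hierarchy custom setting;
    // Disable_Account_Trigger__c is its checkbox field. When checked,
    // the trigger exits before doing any work.
    if (Load_Settings__c.getInstance().Disable_Account_Trigger__c) {
        return;
    }
    // ... normal validation and enrichment logic runs here ...
}
```

Because hierarchy custom settings are cached, the getInstance() check adds no query overhead on each trigger invocation.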
Note: In practice, a more robust architecture would use a single, top-level trigger per object, with all the work done in helper classes called from that trigger. Each helper class would have its own checkbox field on the custom setting to turn it on or off. You can then configure exactly what runs and what doesn’t simply by selecting or deselecting the appropriate fields.
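That architecture might be sketched as follows, with hypothetical helper classes and one checkbox field per unit of work:

```apex
trigger AccountTrigger on Account (before insert, before update) {
    // Hypothetical custom setting with one checkbox per helper class,
    // so each unit of work can be suspended independently during loads.
    Load_Settings__c settings = Load_Settings__c.getInstance();
    if (!settings.Disable_Dedupe_Helper__c) {
        AccountDedupeHelper.run(Trigger.new);
    }
    if (!settings.Disable_Enrichment_Helper__c) {
        AccountEnrichmentHelper.run(Trigger.new);
    }
}
```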
Post-Processing Data
Once you have finished loading your data, it is time to complete the data enrichment and configuration tasks you have deferred until this point:
- Add lookup relationships between objects, roll-up summary fields to parent records, and other data relationships between records.
- Enhance records in Salesforce with foreign keys or other data to facilitate integration with your other systems.
- Batch Apex and the Force.com Bulk API are both efficient methods for performing these updates to a very large number of records.
- Reset the checkbox fields on the custom settings you created for triggers, so that your triggers fire appropriately on record creation and updates.
- Turn validation, workflow, and assignment rules back on so they will trigger the appropriate actions as users enter and edit records.
Note: When you re-enable these rules, they will not automatically process the data you have just loaded. You might have to use the Data Loader or additional Apex code to trigger an update on these records.
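As a sketch of the Batch Apex option, assume a hypothetical Primary_Contact__c lookup field on Account whose population was deferred during loading. A batch job to back-fill it might look like this:

```apex
// Hypothetical batch job that back-fills a lookup field after loading.
global class PopulatePrimaryContactBatch implements Database.Batchable<sObject> {
    global Database.QueryLocator start(Database.BatchableContext bc) {
        // Process only the accounts that still need the lookup populated.
        return Database.getQueryLocator(
            'SELECT Id FROM Account WHERE Primary_Contact__c = null');
    }
    global void execute(Database.BatchableContext bc, List<Account> scope) {
        // Find one contact per account in this batch.
        Map<Id, Id> firstContactByAccount = new Map<Id, Id>();
        for (Contact c : [SELECT Id, AccountId FROM Contact
                          WHERE AccountId IN :scope]) {
            if (!firstContactByAccount.containsKey(c.AccountId)) {
                firstContactByAccount.put(c.AccountId, c.Id);
            }
        }
        for (Account a : scope) {
            a.Primary_Contact__c = firstContactByAccount.get(a.Id);
        }
        update scope;
    }
    global void finish(Database.BatchableContext bc) {}
}
// Run with: Database.executeBatch(new PopulatePrimaryContactBatch(), 2000);
```

Running the update in batches keeps each transaction within governor limits while still covering tens of millions of records.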
Summary
When you need to load a very large amount of data quickly, you want to ensure that each insert is as efficient as possible. With appropriate preparation and post-processing, you can disable data validation and enrichment operations while loading, without compromising your data integrity or business rules.
Related Resources
- Extreme Force.com Data Loading, Part 1: Tune Your Data Model
- Extreme Force.com Data Loading, Part 2: Loading into a Lean Salesforce Configuration
- Best Practices for Deployments with Large Data Volumes
- Designing Record Access for Enterprise Scale
- About Validation Rules
- Creating Workflow Rules
- Managing Apex Triggers
- Architect Core Resources
About the Author
Bud Vieira is an Architect Evangelist within the Technical Enablement team of the salesforce.com Customer-Centric Engineering group. The team’s mission is to help customers understand how to implement technically sound salesforce.com solutions. Check out all of the resources that this team maintains on the Architect Core Resources page of Developer Force.