You are planning a Force.com implementation with large volumes of data. Your data model is in place, all your code is written and has been tested, and now it’s time to load the objects, some of which have tens of millions of records.
What is the most efficient way to get all those records into the system?
The Force.com Extreme Data Loading Series
This is the second entry in a six-part series of blog posts covering many aspects of data loading for very large enterprise deployments.
Here are the topics planned for this series.
- Designing the data model for performance
- Loading data into a lean configuration
- Suspending events that fire on insert
- Sequencing load operations
- Loading and extracting data
- Taking advantage of deferred sharing calculations
This post outlines the steps that you can take to maximize loading efficiency when you need to get your application data into Force.com as quickly as possible.
What Do We Mean by “Loading Lean”?
Whenever you are replacing a legacy system with an application you are building on Force.com, you want to minimize the impact on business-critical operations. A typical strategy for accomplishing this goal is loading lean: including only the data and configuration required to support your business-critical operations.
To load lean:
- Identify the business-critical operations before moving users to Salesforce.
- Identify the minimal data set and configuration required to implement those operations.
- Define a data and configuration strategy based on the requirements you’ve identified.
- Load the data as quickly as possible to shrink the window during which the legacy system and Salesforce must be kept in sync.
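Loading as quickly as possible usually means submitting records in large batches rather than one at a time. As a rough illustration (plain Python, not Salesforce-specific code; the 10,000-record batch size is an assumption, not a platform limit check), here is a minimal sketch of splitting a large record set into fixed-size batches of the kind you would hand to a bulk-loading API:

```python
# Illustrative sketch: split a large record set into fixed-size batches,
# as you would when submitting batches to a bulk-loading API.
# The 10,000-record default is an assumed batch size, not a Salesforce limit.

def make_batches(records, batch_size=10_000):
    """Yield successive lists of at most batch_size records."""
    for start in range(0, len(records), batch_size):
        yield records[start:start + batch_size]

# Example: 25,000 records -> three batches of 10,000, 10,000, and 5,000
batches = list(make_batches(list(range(25_000))))
```

Fewer, larger batches reduce per-request overhead, which is one reason bulk-style loading outperforms record-at-a-time inserts for very large volumes.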
When defining your data loading and configuration strategy, consider using the following setup options to defer non-critical processes and speed up loading.
- Organization-wide sharing defaults – When you load data with a Private sharing model, the system calculates sharing as the records are being added. If you load with a Public Read/Write sharing model, you can defer this processing until after cutover.
- Complex object relationships – The more lookups you have defined on an object, the more checks the system has to perform during data loading. If you can establish some of these relationships in a later phase, loading will be quicker.
- Sharing rules – If you have ownership-based sharing rules configured before loading data, each record you insert requires sharing calculations if the owner of the record belongs to a role or group that defines the data to be shared. If you have criteria-based sharing rules configured before loading data, each record with fields that match the rule selection criteria also requires sharing calculations.
- Workflow rules, validation rules, and triggers – These are powerful tools for making sure data entered during daily operations is clean and includes appropriate relationships between records. Unfortunately, they can also slow down processing if they are enabled during massive data loads. We will cover these rules and triggers in greater detail in the upcoming post about suspending events that fire on insert.
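To see why deferring these per-record processes pays off, consider a toy cost model (plain Python, purely illustrative; `per_record_hooks` and `batch_finalizer` are hypothetical names, not Force.com APIs). With rules and triggers enabled, every insert fires every hook; deferring them replaces that with a single pass after the load, which mirrors deferring sharing calculations until after cutover:

```python
# Toy model of per-record event processing cost during a bulk load.
# Not Force.com code: the hooks stand in for validation rules, workflow
# rules, triggers, and sharing calculations that fire on insert.

def load(records, per_record_hooks, batch_finalizer=None, defer=False):
    """Insert records; return how many hook invocations the load incurred."""
    invocations = 0
    for rec in records:
        if not defer:
            for hook in per_record_hooks:
                hook(rec)               # fires on every single insert
                invocations += 1
    if defer and batch_finalizer is not None:
        batch_finalizer(records)        # one deferred pass (e.g. a recalculation)
        invocations += 1
    return invocations

records = list(range(1_000))
hooks = [lambda r: None, lambda r: None]   # e.g. a validation rule + a workflow rule
eager = load(records, hooks)               # per-record: 1,000 records x 2 hooks
lean = load(records, hooks, batch_finalizer=lambda rs: None, defer=True)
```

The eager load incurs 2,000 hook invocations for 1,000 records, while the lean load incurs one deferred pass; in a real org the per-record work is far more expensive than this stub, so the gap is correspondingly larger.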
As Lean as Possible, But No Leaner
While you want to remove barriers to faster data loading, it’s also important to remember that a few pieces of your configuration are essential or highly desired during any data load.
- Parent records with master-detail children – You won’t be able to load child records if the parents don’t already exist. We will cover this topic in detail in the upcoming post about sequencing load operations.
- Record owners (users) – In most cases, your records will be owned by individual users, and the owners need to exist in the system before you can load the data.
- Role hierarchy – You might think that loading would be faster if the owners of your records were not yet members of the role hierarchy. In almost all cases, though, the performance would be the same, and if you were loading portal accounts, it would be considerably slower without the hierarchy in place. So there is no benefit to deferring this aspect of configuration.
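The parent-before-child requirement generalizes to ordering your entire load by object dependencies: every object must be loaded after the objects its records reference. A short sketch of that sequencing (plain Python using the standard library's topological sort; the object names and the master-detail map are hypothetical examples, and the upcoming post on sequencing load operations covers the Force.com specifics):

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical dependency map: each child object lists the parent objects
# whose records must exist before the child's records can be inserted.
dependencies = {
    "Contact":     {"Account"},
    "Opportunity": {"Account"},
    "OrderItem":   {"Opportunity", "Product"},
}

# static_order() emits parents before any object that depends on them,
# yielding a valid sequence for the load jobs.
load_order = list(TopologicalSorter(dependencies).static_order())
```

Deriving the sequence from the data model this way keeps the load plan correct even as lookups and master-detail relationships are added in later phases.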
When preparing for a very large Force.com implementation, you want to transfer legacy data onto the platform efficiently. By stripping down your initial configuration to only those items required for data integrity between objects, you can greatly increase the speed of a massive initial data load. As always, you should test these recommendations in a sandbox organization to identify how these lean methods can best benefit your business needs.
Related Resources
- Extreme Force.com Data Loading, Part 1: Tune Your Data Model
- Best Practices for Deployments with Large Data Volumes
- Designing Record Access for Enterprise Scale
- Architect Core Resources
About the Author
Bud Vieira is an Architect Evangelist within the Technical Enablement team of the salesforce.com Customer-Centric Engineering group. The team’s mission is to help customers understand how to implement technically sound salesforce.com solutions. Check out all of the resources that this team maintains on the Architect Core Resources page of Developer Force.