Managing Task Locks for Data Loads
The Salesforce platform uses locks to ensure referential data integrity. When dealing with large data volumes, record locks and contention can impact performance. This post covers one of the more common scenarios: contention on Task record locks.
Managing Task Locks
Recently we wrote an article and provided a webinar on maximizing parallelism when loading large data volumes. We demonstrated that by optimizing your data loads to take advantage of parallelism, you can gain several orders of magnitude in performance improvements. Knowing when to use parallel batches in the Bulk API, how to detect and remove data lock contention, and how to give Salesforce enough batches to keep available processes busy will save you time and let you scale your Salesforce organization for maximum growth.
We got a lot of great feedback from customers that used the information in the article and webinar to improve their data loads. So many customers decided to take advantage of the benefits of optimizing for parallelism that we were able to go one step further, and do deep analysis on large data loads to find the most common difficulties encountered when optimizing for parallelism.
What we noticed is that the trickiest object to deal with when optimizing loads is the task object. The task object has particularly complex record-locking rules that can cause lock contention. The result is jobs that can’t run concurrently, which slows the entire load down and sometimes causes failed DML operations.
First, let’s get a quick refresher on locks and lock contention. Like any other system built on a relational database, Salesforce locks records to ensure data integrity. One example where Salesforce needs to lock records is during bulk data loads that include records with master-detail relationships. When an object is in a master-detail relationship with another object, and you insert any detail records, Salesforce locks the related master records. If detail records that reference the same master record are inserted simultaneously in separate batches, there’s a high risk that those inserts will cause lock exceptions.
There are many potential sources of lock contention. To help, we created a record locking cheat sheet that lists some of the most common sources.
So what are the lock contention risks with tasks? As the record locking cheat sheet shows, when a task is inserted or updated, the associated Who, What, and Account records will get locked. However, on insert, the locks only occur if the task status is Completed and the task activity date is not null. On update and delete, the locks occur regardless of the task status or activity date values. Keep these conditions in mind as we talk about common strategies when loading tasks.
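The conditions above can be written down as a small predicate. The following is a minimal sketch that only mirrors the rules as stated in this post; the Task class, its field names, and the locked_record_ids function are hypothetical helpers for planning a load, not part of any Salesforce API.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional, Set


@dataclass
class Task:
    """Hypothetical, simplified view of a task row in a load file."""
    status: str
    activity_date: Optional[date]
    who_id: Optional[str] = None
    what_id: Optional[str] = None
    account_id: Optional[str] = None


def locked_record_ids(task: Task, operation: str) -> Set[str]:
    """Return the related record IDs expected to be locked, per the rules above."""
    related = {rid for rid in (task.who_id, task.what_id, task.account_id) if rid}
    if operation == "insert":
        # On insert, locks occur only for Completed tasks with an activity date.
        if task.status == "Completed" and task.activity_date is not None:
            return related
        return set()
    if operation in ("update", "delete"):
        # On update and delete, locks occur regardless of status or date.
        return related
    raise ValueError(f"unsupported operation: {operation}")
```

Running a check like this over a load file before submitting it gives a rough idea of which inserts can lock parent records at all. For example, a file of open tasks with no activity date should not lock them on insert.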
Ordering Task Loads by Account
In many scenarios, you can reduce lock contention from tasks by organizing your batches by the account a task is associated with. A task record can reference an account in several different ways via the What or Who fields. The What field can reference an account, or reference other objects that in turn reference an account, such as an opportunity, or a custom object that’s a child of account. The Who field can also reference an account by referencing a related contact.
A fairly common scenario is when multiple tasks are related to the same account (and nothing else). In these situations, you should be able to order your batches by the associated account (or the related Who/What object that’s referencing an account), load these batches in a job in parallel mode, and avoid lock contention.
For example, TaskA.Who might be related to a contact that’s related to the Acme account, and TaskB.What might be related to Acme as well. If you create a batch with all tasks associated only with Acme, this batch can be loaded in a job in parallel with other batches that aren’t associated with Acme, and you can be certain that you won’t incur any account lock contention between batches.
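One way to build such batches is to group the task rows by account and then pack whole account groups into batches. The sketch below assumes each row is a dictionary that already carries a hypothetical account_key value, meaning the account reached through Who or What was resolved before the load. The function name and row shape are illustrative only.

```python
from collections import defaultdict
from typing import Dict, List

BATCH_LIMIT = 10_000  # Bulk API batch size limit cited in this post


def batches_by_account(rows: List[Dict], batch_limit: int = BATCH_LIMIT) -> List[List[Dict]]:
    """Pack rows into batches so that no account spans two batches."""
    by_account = defaultdict(list)
    for row in rows:
        by_account[row["account_key"]].append(row)

    batches: List[List[Dict]] = []
    current: List[Dict] = []
    for account in sorted(by_account):
        group = by_account[account]
        if len(group) > batch_limit:
            # An account this large cannot be isolated in a single batch;
            # this sketch leaves that case to the caller.
            raise ValueError(f"{account} has more than {batch_limit} tasks")
        if current and len(current) + len(group) > batch_limit:
            batches.append(current)  # close the batch before it overflows
            current = []
        current.extend(group)
    if current:
        batches.append(current)
    return batches
```

Because an account never spans two batches, the resulting batches can be submitted in a parallel-mode job without two batches competing for the same account lock.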
In some cases you might end up with more than 10,000 tasks, all related to the same account. Since the Bulk API batch size limit is 10,000 records, you might not be able to isolate all tasks related to a specific account in a single batch. If the number of tasks related to a single account is not significantly more than 10,000 records, so that you can still create a set of batches with minimal overlap in related accounts, you should still be able to load these batches in a parallel job. The Bulk API will handle jobs with minimal overlap correctly.
However, if the number of tasks related to a single account is significantly more than 10,000 records, you’ll end up with too much overlap across multiple batches, and lock contention will cause load problems. In this case, you’ll want to take that set of tasks and load them in a separate serial job using a controlled feed load.
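The split between the parallel job and the serial job can be sketched as below. This post does not define how many tasks count as "significantly more than 10,000", so the serial_threshold parameter and its default of 20,000 are assumptions to tune for your own data. The account_key field is likewise a hypothetical, pre-resolved value on each row.

```python
from collections import Counter
from typing import Dict, List, Tuple


def split_parallel_and_serial(
    rows: List[Dict], serial_threshold: int = 20_000
) -> Tuple[List[Dict], List[Dict]]:
    """Separate rows for a parallel job from rows for a serial, controlled job.

    serial_threshold is an assumed cutoff, not a documented Salesforce limit.
    """
    counts = Counter(row["account_key"] for row in rows)
    parallel: List[Dict] = []
    serial: List[Dict] = []
    for row in rows:
        if counts[row["account_key"]] > serial_threshold:
            serial.append(row)  # too many tasks on one account: load serially
        else:
            parallel.append(row)
    return parallel, serial
```

Rows in the first list can then be batched by account for a parallel job, while rows in the second list go into a separate serial job.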
Using Controlled Feed Loads
If you have large sets of tasks associated with accounts, or tasks with What and Who referencing different accounts, or even tasks with Who or What referencing objects other than Account, the risk of lock contention goes up. For these scenarios, we recommend setting up a controlled feed load for those tasks, where you directly control how and when the serial jobs run.
When dealing with tasks that are related to multiple objects, within each controlled job you might still need to segregate batches to minimize lock contention. This requires a careful understanding of the relationships each of your tasks has. For example, if TaskA.Who references a contact (that is related to an account), and TaskA.What references a custom object (that is not related to an account), when TaskA is loaded, it will lock the contact (which will lock the related account) and the custom object. The job that loads TaskA will need to segregate batches so that all tasks related to the contact or the account are batched together.
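Working out which tasks must travel together is a connected-components problem: two tasks belong in the same group if they share any record that would be locked. The sketch below uses a simple union-find for this. The input mapping of task IDs to locked record IDs is hypothetical and has to be built from your own relationship data. Treating every locked record as a grouping key, including a custom object, is a conservative assumption that goes slightly beyond the contact and account case described above.

```python
from typing import Dict, Hashable, List, Set


def group_by_shared_locks(task_locks: Dict[str, Set[str]]) -> List[Set[str]]:
    """Group task IDs so that tasks sharing any locked record end up together."""
    parent: Dict[Hashable, Hashable] = {}

    def find(node: Hashable) -> Hashable:
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    def union(a: Hashable, b: Hashable) -> None:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Link every task to each record it would lock.
    for task_id, locks in task_locks.items():
        task_node = ("task", task_id)
        find(task_node)  # register tasks that lock nothing
        for record_id in locks:
            union(task_node, ("record", record_id))

    groups: Dict[Hashable, Set[str]] = {}
    for task_id in task_locks:
        groups.setdefault(find(("task", task_id)), set()).add(task_id)
    return list(groups.values())
```

Each returned group can then be kept within one batch, or within consecutive batches of the same serial job, so that batches running at the same time never compete for the same contact, account, or custom object.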
Additionally, remember that even if you are manually handling controlled jobs, if you plan to use the jobs as part of a regularly scheduled load process, make sure to allow enough time between scheduled jobs so the jobs don’t overlap. If two jobs overlap, tasks associated with the same account can again end up in lock contention.
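A schedule can be checked for overlap before it goes live. The sketch below assumes you can estimate how long each job runs. The tuple layout, the job names, and the overlapping_runs function are hypothetical, and the durations are your own estimates rather than values reported by Salesforce.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

Run = Tuple[str, datetime, timedelta]  # (job name, start time, expected duration)


def overlapping_runs(runs: List[Run]) -> List[Tuple[str, str]]:
    """Return pairs of job names whose expected run windows overlap."""
    ordered = sorted(runs, key=lambda run: run[1])
    overlaps: List[Tuple[str, str]] = []
    for i, (name_a, start_a, duration_a) in enumerate(ordered):
        end_a = start_a + duration_a
        for name_b, start_b, _ in ordered[i + 1:]:
            if start_b < end_a:
                overlaps.append((name_a, name_b))
            else:
                break  # later jobs start even later, so none can overlap job A
    return overlaps
```

An empty result means the expected windows leave a gap between jobs. Padding the duration estimates gives some margin for loads that run longer than usual.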
The parallelism article goes into several examples of working with controlled feed loads.
Determining why data loads are not performing as well as they should requires a clear understanding of how parallelism and locking work in Salesforce. Tasks are just one example of a Salesforce object that has unique locking behavior that needs special care when planning data loading jobs. By using strategies detailed in the parallelism article and understanding Salesforce record locking as described in the record locking cheat sheet, you can make significant gains in your data load performance. In no time, your organization will be able to handle any data load undertaking, no matter how large or complex.
About the Author
Dan Yu is a Technical Writer within the Technical Enablement team of the salesforce.com Customer-Centric Engineering group. The team’s mission is to help customers understand how to implement technically sound Salesforce solutions. Check out all of the resources that this team maintains on the Architect Core Resources page of Developer Force.