Salesforce Bulk Data Loads and Full-Text Search Indexes
Quickly loading large amounts of data into the Salesforce1 Platform is certainly awesome, but it can temporarily leave full-text search indexes stale. Do you understand why this can happen? More importantly, do you know why it matters to your application users, and which strategies can address the problem? Read this post for more details.
Sean Regan recently wrote an informative hands-on article that teaches you how to achieve maximum data throughput rates for parallel data loads and integrations involving the Salesforce1 Platform. All good, right? Actually, there’s an important side effect to be aware of once you start pumping tons of data into your database using the Salesforce Bulk API.
Salesforce1 Platform Full-Text Search Processing
We all expect web-based applications to have an interactive search capability that lets us scan all or a selected scope of an application’s database, return up-to-date ranked results, and do it all with sub-second response times. To automatically provide such robust search functionality for Salesforce1 applications, the platform leverages a search engine that is separate from its transaction engine. The relationship between the two engines is depicted in the following figure.
Notice how the search engine receives data from the transaction engine and creates search indexes. The transaction engine forwards search requests to the search engine, which returns results that the transaction engine uses to locate rows that satisfy the search request.
As applications load and update data in text fields, a pool of background indexing processes is responsible for asynchronously updating the corresponding indexes. Full-text search indexes for each organization (tenant) reside outside the core transaction engine.
Beware of Lags in Search Indexing … Especially After Loads
Depending on the current load and utilization of indexing servers, asynchronous text index updates may lag behind actual transactions. While the indexes are stale, search results might not fully reflect the current database records: recently loaded records can be missing from results even though they are already committed.
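The lag described above can be pictured with a small toy model. The Python sketch below is only an illustration, not how the platform is implemented: the class `ToyPlatform`, its methods, and the batch sizes are all made up for this example. It assumes that writes land in a transactional store immediately, while a background indexer works through a queue of pending records at its own pace. It models inserts only.

```python
from collections import deque


class ToyPlatform:
    """Toy model: a transactional store plus an asynchronously updated search index."""

    def __init__(self):
        self.records = {}        # committed rows: id -> text
        self.pending = deque()   # ids waiting for a background indexer
        self.index = {}          # inverted index: word -> set of record ids

    def commit(self, record_id, text):
        # The write is visible to direct queries immediately,
        # but the index update is only queued.
        self.records[record_id] = text
        self.pending.append(record_id)

    def run_indexer(self, batch_size):
        # Stand-in for one pass of a background indexing process.
        for _ in range(min(batch_size, len(self.pending))):
            record_id = self.pending.popleft()
            for word in self.records[record_id].lower().split():
                self.index.setdefault(word, set()).add(record_id)

    def search(self, word):
        # Full-text search sees only what has been indexed so far.
        return sorted(self.index.get(word.lower(), set()))

    def query(self, word):
        # Direct scan of committed rows, analogous to querying the database itself.
        return sorted(rid for rid, text in self.records.items()
                      if word.lower() in text.lower().split())


platform = ToyPlatform()
for i in range(1000):                   # a "bulk load" of 1,000 rows
    platform.commit(i, f"Acme order {i}")

platform.run_indexer(batch_size=100)    # the indexer has caught up on only 100 rows
stale = len(platform.search("acme"))    # count seen through the search index
fresh = len(platform.query("acme"))     # count seen through the committed rows
```

Until the indexer drains its queue, `stale` stays smaller than `fresh`, which is the same gap users notice when they search right after a large load.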
In particular, longer lags in search indexing often appear when you run a massive, high-throughput data load. The amount of time necessary to update search indexes is directly related to the amount of text data that such loads modify, and can be quite lengthy in some cases.
Strategies for Addressing Lags in Search Indexing
So how can you architect acceptable solutions that address inevitable lags in search indexing after data loads? Here are a couple of things to consider.
- Disable full-text search indexing for custom objects (especially large ones) that don’t need to be searchable. This best practice helps to avoid unnecessary load on search indexing. Disabling this feature only affects full-text searches, SOSL queries, and enhanced lookups; it does not affect SOQL queries.
- Instead of relying on the full-text search engine and SOSL, implement your application’s search feature using SOQL. Because SOQL queries target the transactional database, they’ll always return results that correspond to the latest set of committed records.
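As a rough sketch of the second strategy, the Python helper below builds a SOQL query that filters on a field with `LIKE` instead of issuing a SOSL search. The function names, the choice of `Account` and `Name`, and the default limit are assumptions made for this illustration. The code only constructs the query string; how you send it to Salesforce (REST API, a client library, and so on) is left out.

```python
def escape_soql_like(term: str) -> str:
    """Escape a user-supplied term for use inside a SOQL LIKE '...' literal."""
    # Escape backslashes first so the escapes added below are not doubled.
    term = term.replace("\\", "\\\\")
    term = term.replace("'", "\\'")
    # % and _ are LIKE wildcards; escape them so they match literally.
    term = term.replace("%", "\\%").replace("_", "\\_")
    return term


def build_name_search(sobject: str, term: str, limit: int = 50) -> str:
    """Build a SOQL query that finds records whose Name starts with `term`."""
    # Object names cannot be parameterized, so accept only simple identifiers.
    if not sobject.replace("_", "").isalnum():
        raise ValueError(f"unexpected object name: {sobject!r}")
    pattern = escape_soql_like(term) + "%"   # trailing % makes it a prefix match
    return (f"SELECT Id, Name FROM {sobject} "
            f"WHERE Name LIKE '{pattern}' ORDER BY Name LIMIT {int(limit)}")


soql = build_name_search("Account", "O'Brien 100%")
```

This sketch uses a prefix match rather than a leading wildcard, on the assumption that a pattern such as `'term%'` is more likely to be served efficiently on large objects than `'%term%'`. Keep in mind that a `LIKE` filter is not a ranked, tokenized full-text search, so this approach trades search sophistication for up-to-date results.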
About the Author
Steve Bobrowski is an Architect Evangelist within the Technical Enablement team of the salesforce.com Customer-Centric Engineering group. The team’s mission is to help customers understand how to implement technically sound Salesforce solutions. Check out all of the resources that this team maintains on the Architect Core Resources page of Developer Force.