If data is king, timely access is queen. If you have sales representatives closing opportunities, support representatives working through a list of cases, or even managers running reports, you’ll want to optimize query performance in your Force.com applications. In saleforce.com’s multitenant environment, the Force.com query optimizer does its own kind of optimization, generating the most efficient SQL from your SOQL, reports, and list views. This blog post explains the filter conditions and the Force.com query optimizer thresholds that determine the selectivity of your queries and affect your overall query performance.
The Force.com Query Optimizer
The Force.com query optimizer is an engine that sits between your SOQL, reports, and list views and the database itself. Because of salesforce.com’s multitenancy, the optimizer gathers its own statistics instead of relying on the underlying database statistics. Using both these statistics and pre-queries, the optimizer generates the most optimized SQL to fetch your data. It looks at each filter in your WHERE clause to determine which index, if any, should drive your query.
To determine if an index should be used to drive a query, the Force.com query optimizer checks the number of records targeted by the filter against selectivity thresholds. For a standard index, the threshold is 30 percent of the first million targeted records and 15 percent of all records after that first million. In addition, the selectivity threshold for a standard index maxes out at 1 million total targeted records, which you could reach only if you had more than 5.6 million total records.
So if you had 2.5 million accounts, and your SOQL contained a filter on a standard index, that index would drive your query if the filter targeted fewer than 525,000 accounts.
(30% of 1 to 1 million targeted records) + (15% of 1 million to 2.5 million targeted records) = 300,000 + 225,000 = 525,000
For a custom index, the selectivity threshold is 10 percent of the first million targeted records and 5 percent all records after that first million. In addition, the selectivity threshold for a custom index maxes out at 333,333 targeted records, which you could reach only if you had more than 5.6 million records.
If the previous query were changed so that it used a filter on a field with a custom index, the threshold for 2.5 million accounts would be 175,000.
(10% of 1 to 1 million targeted records) + (5% of 1 million to 2.5 million targeted records) = 100,000 + 75,000 = 175,000
In these standard index and custom index examples, the Force.com query optimizer does use the standard and custom indexes, as each number of targeted records falls below the appropriate selectivity threshold. If, on the other hand, the number of targeted records exceeds an index’s selectivity threshold, the Force.com query optimizer does not use that index to drive the query.
The Inside the Force.com Query Optimizer webinar explains in more detail how you can create selective queries for the Force.com query optimizer.
Common Causes of Non-Selective SOQL Queries
There are several factors that can prevent your SOQL queries from being selective.
Having Too Much Data
Whether you’re displaying a list of records through a Visualforce page or through a list view, it’s important to consider the user experience. Pagination can help, but will your users really go through a list with thousands of records? You might not have this much data in your current implementation, but if you don’t have enough selective filters, these long lists can easily become an issue as your data grows. Design your SOQL, reports, and list views with large data volumes in mind.
Performing Large Data Loads
Large data loads and deletions can affect query performance. The Force.com query optimizer uses the total number of records as part of the calculation for its selectivity threshold.
This number takes into account your recently deleted records. A deleted record remains in the Recycle Bin for 15 days—or even less time if you exceed your storage limit, and the record has been in the Recycle Bin for at least two hours—and then that record is actually removed from the Recycle Bin or flagged for a physical delete. When the Force.com query optimizer judges returned records against its thresholds, all of the records that appear in the Recycle Bin or are marked for physical delete do still count against your total number of records.
From our earlier example of accounts and a custom indexed field, the selectivity threshold was 175,000, and the total number of records was 2.5 million.
Let’s say that a Bulk API job runs and deletes all records before January 1, 2013, and those records total 2.4 million. That leaves us with 100,000 non-deleted account records. If the deleted records are still in the Recycle Bin, the Force.com optimizer mistakenly thinks that the 100,000 non-deleted records fall under and meet a 2.5 million-record selectivity threshold, and it generates a query that isn’t optimized. In reality, the threshold is 10,000 targeted records (10 percent of 100,000 targeted records).
If the deleted records do not need to go to the Recycle Bin, use the hard delete option in the Bulk API or contact salesforce.com Customer Support to physically delete the records.
If your data loads cause the records targeted by your filter to exceed the selectivity threshold, you might need to include additional filters to make your queries selective again.
Using Leading % Wildcards
A LIKE condition with a leading % wildcard does not use an index.
This is the type of query that would normally work better with SOSL. However, if you need real-time results, an alternative is to create a custom search page, which restricts leading % wildcards and adds governance on the search string(s).
Note: Within a report/list view, the CONTAINS clause translates into ‘%string%’.
Using NOT and !=
When your filter uses != or NOT—which includes using NOT EQUALS/CONTAINS for reports, even if the field is indexed—the Force.com query optimizer can’t use the index to drive the query. For better performance, filter using = or IN, and the reciprocal values.
Consider this example. You have 1 million cases and a custom index on the Status field, which has the following values and distribution of returned records.
This query won’t use the index on the Status field because of the !=. Use the reciprocal values instead. Also note that if you take the four Status values from the updated query, the record count (50,000 + 20,000 + 30,000 + 20,000) meets the selectivity threshold.
The following query won’t use the index on the Status field because of the NOT. The problem with this query is that even if you change it to use the reciprocal values, the index on the Status field won’t meet the selectivity threshold because of the Closed value. In this case, you do want to add additional filters to reduce the number of records retrieved.
Note: Using a filter on an indexed field such as CreatedDate is always recommended, but this field was not included in the original query so that we could make a point about the selectivity threshold.
Using Complex Joins
Complex AND/OR conditions and sub-queries require the Force.com query optimizer to produce a query that is optimized for the join, but might not perform as well as multiple issued queries would. This is especially true with the OR condition. For Force.com to use an index for an OR condition, all of the fields in the condition must be indexed and meet the selectivity threshold. If the fields in the OR condition are in multiple objects, and one or more of those fields does not meet a selectivity threshold, the query can be expensive.
For more information on AND/OR conditions, refer to the Inside the Force.com Query Optimizer webinar.
Filters on formula fields that are non-deterministic can’t be indexed and result in additional joins. Common formula field practices include transforming a numeric value in a related object into a text string or using a complex transformation involving multiple related objects. In both cases, if you filter on this formula field, the Force.com query optimizer must join the related objects.
If you have large data volumes and are planning to use this formula field in several queries, creating a separate field to hold the value will perform better than following either of the previous common practices. You’ll need to create a workflow rule or trigger to update this second field, have this new field indexed, and use it in your queries.
The Force.com query optimizer takes your SQL, reports, and list views and generates optimized queries. It chooses which index, if any, should drive your query, and it uses its selectivity thresholds to make that decision. If the number of records returned for an index is larger than a threshold, query performance wouldn’t improved by the use of that index.
To ensure that your queries are selective, avoid some common pitfalls.
- Understand your schema and have proper indexes created.
- Apply as many filters as possible to reduce the result set.
- Minimize the amount of records in the Recycle Bin.
- Remember that NOT operations and LIKE conditions with a leading % wildcard do not use indexes, and complex joins might perform better as separate queries.
- Inside the Force.com Query Optimizer
- Query & Search Optimization Cheat Sheet
- Working with Very Large SOQL Queries
- Best Practices for Deployments with Large Data Volumes
- Architect Core Resources
About the Author and CCE Technical Enablement
John Tan is an Architect Evangelist within the Technical Enablement of the salesforce.com Customer-Centric Engineering group. The team’s mission is to help customers understand how to implement technically sound Salesforce solutions. Check out all of the resources that this team maintains on the Architect Core Resources page of Developer Force.