Do you have a SOQL query or a report that takes a long time to complete because you are querying data from an object that has tens of millions of rows, yet your business requirements won’t allow you to add a selective filter? Read on to learn more about sort optimization, a simple technique that many developers and architects overlook when applying SOQL performance tuning best practices.
What is Sort Optimization?
So what is sort optimization? The concept is very simple.
Let’s say that you contacted salesforce.com Customer Support to place a custom index on a field to optimize the performance of your SOQL query. As the number of records in your object increases, or the data distribution changes over time, your filters might no longer meet the appropriate Force.com query optimizer selectivity threshold. (See the Database Query & Search Optimization Cheatsheet.) When the filters don’t meet that selectivity threshold, the Force.com query optimizer does a full table scan instead of using the index, and the query takes significantly longer to complete or simply times out altogether.
For a variety of business reasons, it might be difficult to add filters to make the query more selective. But if you could add just one sort clause (ORDER BY) on an indexed field, which in many cases makes the data more consumable by end users, you would have a good chance of bringing back the performance benefits offered by using an index.
So how would adding a sort clause make the query more selective? It wouldn’t. Sort clauses have nothing to do with selectivity. You might also ask, “Wouldn’t adding an ORDER BY clause increase the overhead because Salesforce must complete a sorting task in addition to selecting the appropriate rows?” The answer is “no.” When you have an ORDER BY clause bound by a LIMIT clause–and chances are that you already have one in place to avoid hitting governor limits–the Force.com query optimizer considers using the index because the index, by nature, is a presorted list of values. The query starts from either end of the index, depending on your ascending or descending order preference, and then works its way up or down until it reaches the number or records specified by the LIMIT clause. It knows exactly when to stop searching and can return the query results without having to do a full table scan.
Applying Sort Optimization
Let’s dig in a bit deeper. There are a few best practices that you should know and follow when applying the sort optimization technique.
- Sort on a number or date field.
The Force.com query optimizer considers number (e.g., order number, customer ID) or date (e.g., CreatedDate) data type fields as candidates for sorting. If you need to use a field that does not have a number or date type, you can add a “hidden” field, populate it with converted number values, and then place an index on it.
- Sort on the second field for a two-column index.
If your custom index is a two-column index, make sure that you have an equality filter for the first field and are sorting on the second field. The equality filter for the first field doesn’t need to meet the Force.com query optimizer’s selectivity threshold, but it does need to appear in the query’s WHERE clause for the Force.com query optimizer to consider using the two-column index.
- Exclude null records.
If you sort on a field that might contain null records–regardless of whether you actually have any–add a filter in the WHERE clause to exclude those records. Otherwise, the Force.com query optimizer won’t consider using the index. This means you cannot apply sort optimization if you want to include records that have null values in the sorting fields as part of your results.
- Evaluate optimization.
When you have multiple filters, and the matching records don’t appear anywhere close to the beginning or end of the index, Salesforce might end up scanning a large number of records on the index. Similarly, if you target a large number of records with your LIMIT clause, the query might take a long time to complete.
Best Use Cases for Sort Optimization
In summary, sort optimization takes advantage of available indexes by picking up records in a limited, orderly fashion. The best sort optimization use cases involve running your SOQL queries to list most recent purchase orders, or to find record set boundaries such as “largest,” “least,” “top 50,” and so on for your record set. Sort optimization might have your queries less of a performance boost for something like a list of the 5,000 customers who most recently purchased a product that the company stopped selling 10 years ago. This performance difference exists because the query might need to scan a few thousand records on the index before it finds the first matching record, then keep searching until it reaches the five thousandth record. Unless this product sold so well until 10 years ago that filtering on “product” does not meet the Force.com query optimizer’s appropriate selectivity threshold, the Force.com query optimizer is most likely going to use indexes on dates and products instead of using sort optimization.
- Database Query & Search Optimization Cheat Sheet
- Webinar: Inside the Force.com Query Optimizer
- Maximizing the Performance of Force.com SOQL, Reports, and List Views
- Force.com SOQL Best Practices: Nulls and Formula Fields
- Performance Tuning for Visualforce and Apex
- Architect Core Resources
About the Author and CCE Technical Enablement
Daisuke Kawamoto is an Architect Evangelist within the Technical Enablement team of the salesforce.com Customer-Centric Engineering group. The team’s mission is to help customers understand how to implement technically sound salesforce.com solutions. Check out all of the resources that this team maintains on the Architect Core Resources page of Developer Force.