If data is king, timely access is queen. If you have sales representatives closing opportunities, support representatives working through a list of cases, or even managers running reports, you’ll want to optimize query performance in your Force.com applications. In saleforce.com’s multitenant environment, the Force.com query optimizer does its own kind of optimization, generating the most efficient SQL from your SOQL, reports, and list views. This blog post explains the filter conditions and the Force.com query optimizer thresholds that determine the selectivity of your queries and affect your overall query performance.

The Force.com Query Optimizer

The Force.com query optimizer is an engine that sits between your SOQL, reports, and list views and the database itself. Because of salesforce.com’s multitenancy, the optimizer gathers its own statistics instead of relying on the underlying database statistics. Using both these statistics and pre-queries, the optimizer generates the most optimized SQL to fetch your data. It looks at each filter in your WHERE clause to determine which index, if any, should drive your query.

To determine if an index should be used to drive a query, the Force.com query optimizer checks the number of records targeted by the filter against selectivity thresholds. For a standard index, the threshold is 30 percent of the first million targeted records and 15 percent of all records after that first million. In addition, the selectivity threshold for a standard index maxes out at 1 million total targeted records, which you could reach only if you had more than 5.6 million total records.

So if you had 2.5 million accounts, and your SOQL contained a filter on a standard index, that index would drive your query if the filter targeted fewer than 525,000 accounts.

SELECT id FROM Account WHERE CreatedDate  > 2013-01-01T00:00:00Z

(30% of 1 to 1 million targeted records) + (15% of 1 million to 2.5 million targeted records) = 300,000 + 225,000  =  525,000

For a custom index, the selectivity threshold is 10 percent of the first million targeted records and 5 percent all records after that first million. In addition, the selectivity threshold for a custom index maxes out at 333,333 targeted records, which you could reach only if you had more than 5.6 million records.

If the previous query were changed so that it used a filter on a field with a custom index, the threshold for 2.5 million accounts would be 175,000.

SELECT id FROM Account WHERE CustomIndexedDate__c  > 2013-01-01T00:00:00Z

(10% of 1 to 1 million targeted records) + (5% of 1 million to 2.5 million targeted records) = 100,000 + 75,000 = 175,000

In these standard index and custom index examples, the Force.com query optimizer does use the standard and custom indexes, as each number of targeted records falls below the appropriate selectivity threshold. If, on the other hand, the number of targeted records exceeds an index’s selectivity threshold, the Force.com query optimizer does not use that index to drive the query.

The Inside the Force.com Query Optimizer webinar explains in more detail how you can create selective queries for the Force.com query optimizer.

Common Causes of Non-Selective SOQL Queries

There are several factors that can prevent your SOQL queries from being selective.

Having Too Much Data

Whether you’re displaying a list of records through a Visualforce page or through a list view, it’s important to consider the user experience. Pagination can help, but will your users really go through a list with thousands of records? You might not have this much data in your current implementation, but if you don’t have enough selective filters, these long lists can easily become an issue as your data grows. Design your SOQL, reports, and list views with large data volumes in mind.

Performing Large Data Loads

Large data loads and deletions can affect query performance. The Force.com query optimizer uses the total number of records as part of the calculation for its selectivity threshold.

This number takes into account your recently deleted records. A deleted record remains in the Recycle Bin for 15 days—or even less time if you exceed your storage limit, and the record has been in the Recycle Bin for at least two hours—and then that record is actually removed from the Recycle Bin or flagged for a physical delete. When the Force.com query optimizer judges returned records against its thresholds, all of the records that appear in the Recycle Bin or are marked for physical delete do still count against your total number of records.

From our earlier example of accounts and a custom indexed field, the selectivity threshold was 175,000, and the total number of records was 2.5 million.

Let’s say that a Bulk API job runs and deletes all records before January 1, 2013, and those records total 2.4 million. That leaves us with 100,000 non-deleted account records. If the deleted records are still in the Recycle Bin, the Force.com optimizer mistakenly thinks that the 100,000 non-deleted records fall under and meet a 2.5 million-record selectivity threshold, and it generates a query that isn’t optimized. In reality, the threshold is 10,000 targeted records (10 percent of 100,000 targeted records).

If the deleted records do not need to go to the Recycle Bin, use the hard delete option in the Bulk API or contact salesforce.com Customer Support to physically delete the records.

If your data loads cause the records targeted by your filter to exceed the selectivity threshold, you might need to include additional filters to make your queries selective again.

Using Leading % Wildcards

A LIKE condition with a leading % wildcard does not use an index.

SELECT id FROM Account WHERE Name LIKE ‘%Acme%’

This is the type of query that would normally work better with SOSL. However, if you need real-time results, an alternative is to create a custom search page, which restricts leading % wildcards and adds governance on the search string(s).

Note: Within a report/list view, the CONTAINS clause translates into ‘%string%’.

Using NOT and !=

When your filter uses != or NOT—which includes using NOT EQUALS/CONTAINS for reports, even if the field is indexed—the Force.com query optimizer can’t use the index to drive the query. For better performance, filter using = or IN, and the reciprocal values.

Consider this example. You have 1 million cases and a custom index on the Status field, which has the following values and distribution of returned records.

New

50,000

Closed

880,000

On Hold

20,000

Pending

30,000

ReOpened

20,000

This query won’t use the index on the Status field because of the !=. Use the reciprocal values instead. Also note that if you take the four Status values from the updated query, the record count (50,000 + 20,000 + 30,000 + 20,000) meets the selectivity threshold.

SELECT id FROM Case WHERE Status != ‘Closed’
SELECT id FROM Case WHERE Status IN (‘New’, ‘On Hold’, ‘Pending’, ‘ReOpened’)

The following query won’t use the index on the Status field because of the NOT. The problem with this query is that even if you change it to use the reciprocal values, the index on the Status field won’t meet the selectivity threshold because of the Closed value. In this case, you do want to add additional filters to reduce the number of records retrieved.

SELECT id FROM Case WHERE NOT (Status IN (‘On Hold’, ‘Pending’, ‘ReOpened’))
SELECT id FROM Case WHERE Status IN ('New', 'Closed') AND Priority = ‘High’

Note:  Using a filter on an indexed field such as CreatedDate is always recommended, but this field was not included in the original query so that we could make a point about the selectivity threshold.

Using Complex Joins

Complex AND/OR conditions and sub-queries require the Force.com query optimizer to produce a query that is optimized for the join, but might not perform as well as multiple issued queries would. This is especially true with the OR condition. For Force.com to use an index for an OR condition, all of the fields in the condition must be indexed and meet the selectivity threshold. If the fields in the OR condition are in multiple objects, and one or more of those fields does not meet a selectivity threshold, the query can be expensive.

For more information on AND/OR conditions, refer to the Inside the Force.com Query Optimizer webinar.

Filters on formula fields that are non-deterministic can’t be indexed and result in additional joins. Common formula field practices include transforming a numeric value in a related object into a text string or using a complex transformation involving multiple related objects. In both cases, if you filter on this formula field, the Force.com query optimizer must join the related objects.

If you have large data volumes and are planning to use this formula field in several queries, creating a separate field to hold the value will perform better than following either of the previous common practices. You’ll need to create a workflow rule or trigger to update this second field, have this new field indexed, and use it in your queries.

Summary

The Force.com query optimizer takes your SQL, reports, and list views and generates optimized queries. It chooses which index, if any, should drive your query, and it uses its selectivity thresholds to make that decision. If the number of records returned for an index is larger than a threshold, query performance wouldn’t improved by the use of that index.

To ensure that your queries are selective, avoid some common pitfalls.

  1. Understand your schema and have proper indexes created.
  2. Apply as many filters as possible to reduce the result set.
  3. Minimize the amount of records in the Recycle Bin.
  4. Remember that NOT operations and LIKE conditions with a leading % wildcard do not use indexes, and complex joins might perform better as separate queries.

 

Related Resources

About the Author and CCE Technical Enablement

John Tan is an Architect Evangelist within the Technical Enablement of the salesforce.com Customer-Centric Engineering group. The team’s mission is to help customers understand how to implement technically sound Salesforce solutions. Check out all of the resources that this team maintains on the Architect Core Resources page of Developer Force.

tagged , , , , Bookmark the permalink. Both comments and trackbacks are currently closed.
  • Jason Venable

    How much of an impact do complex role hierarchies and sharing rules have on query performance?
    Does the query return all rows that match the filters and these rows are then checked one at a time to see if user has access? Is this fast or slow?

    We have had problems with queries timing out (2 min) but it is very hard for us to isolate the root cause. Is 30 seconds of this the query and the remaining 90 seconds checking record level access?

    It would be great if we had better debugging for large queries. How much time is spent doing the query vs calculating record access. Why or why not indexes where used, etc.

    • John Tan

      Role hierarchies and sharing can impact query performance. When the optimizer generates the optimized query, it will include joins to get the relevant hierarchy/sharing information. In terms of speed, hard to quantify since it’s dependent on the query itself and then the hierarchy/sharing.

      One way you can see the impact is to compare a sys admin user versus someone who is in the hierarchy. You’ll need to run the queries a few times to rule out caching. But the sys admin user will not need to have the hierarchy/sharing joined to the query.

      I agree that we need to provide more feedback on the optimizer. Safe Harbor, as mentioned in the Query Optimizer webinar, we’re looking into it.

      • Jason Venable

        Yup, we have done the admin vs user query before and this is something that often is a source of confusion for customers. “Why is this query fast for some users and times out for others?” We think it is because of complex rules but we never really know and it can be difficult to know if it is something they should do (sharing settings) or something we should do (query optimizations).

        Regardless, totally appreciated these posts on improving query performance as a whole.

  • ram_sj

    Thanks for nice write up, its should be nice to have same type of
    polices for both standard and custom indexes to enforce index usage at the same record level.
    Currently the deletes are slow, sometime it takes more than 30 mins
    for our 150k snapshot records cleanup. It would be nice to know some
    statistics of how our query is executed and provided visibility to
    developers.

    • John Tan

      Standard object/Standard field is stored in a different table than Standard object/Custom field as well as Custom objects. Hence the need for different selectivity thresholds.

      For custom objects, we do have a feature that can truncate the whole object. If you need to delete a large portion of your records, it may be easier to truncate and load only what’s needed.

      Safe Harbor, we’re looking into providing info on the query optimizer.

  • jonathanrico.

    Thanks for posting this John!

    Just one question on the “Status” field example. According to this example, the status field is a custom index so the selectivity threshold will be 100,000 records. I don’t understand why the following sum will meet the threshold for a custom index : “Also note that if you take the four Status values from the updated query, the record count (150,000 + 20,000 + 30,000 + 20,000) meets the selectivity threshold.”

    Am I missing something here?

    • John Tan

      You’re right. The example with Status should have a threshold of 100K
      since we have 1 million Case records. The row count for ‘New’ should be
      50K instead of 150K and ‘Closed’ should be 880K instead of 780K. Post
      has been updated. Thanks!