Developing Selective Force.com Queries through the Query Resource Feedback Parameter Beta

Updated March 17, 2015

As a Force.com developer or architect, you’ve probably seen plenty of content on Salesforce Developers recommending that you use selective filters in your SOQL, reports, and list views. Until now, you’ve been able to use some technical resources to learn how the Force.com query optimizer generally drives queries, but you haven’t been able to receive real-time feedback about how the Force.com query optimizer will drive your specific queries. Instead, you probably had to manually maintain record counts to ensure that your filter conditions didn’t exceed the Force.com query optimizer’s thresholds, and run and re-run your queries to get the results that you wanted.

Here’s the exciting news: In the Spring '14 release, you can take advantage of the Query Resource Feedback Parameter beta, which enables you to get information about how the Force.com query optimizer will drive your query—even if you don’t actually run that query. Previously, you had to actually run that query by using the q parameter. In this beta, you can make a REST API call using the explain parameter to see how Salesforce will process your query. The Query Resource Feedback Parameter feature is enabled by default for all organizations that can already make API calls, and you can find its documentation in the Force.com REST API Developer’s Guide.

Update: In the Spring '15 release, you can also enter report and list view Ids into the Query Resource Feedback Parameter to test the selectivity of reports and listviews. An additional feature is a set of notes from the query optimizer including whether a field is un-indexed or if an un-optimizable operator is being used. The examples below have been updated to include the Notes data.

As you develop your SOQL queries, reports and listviews, you can use this parameter to ensure that your filters are selective, helping you minimize your query response times and optimize the database’s overall performance.

Table of Contents

What Does the Query Resource Feedback Parameter Return?

To enable you to enter a query, and see a list of the indexes and tables that the Force.com query optimizer is considering using to drive that query, this article uses a simple Visualforce page and Apex controller. The list orders the possible query execution plans from most optimal to least optimal, and the first row in that list shows what the Force.com query optimizer would use to drive your query if you actually ran it. The list is in JSON and takes the following form.

{
"plans" : [ {
   "cardinality" : 1,
    "fields" : [ "CreatedDate" ],
    "leadingOperationType" : "Index",
    "notes" : [ {
       "description" : "Not considering filter for optimization because unindexed",
       "fields" : [ "IsDeleted" ],
       "tableEnumOrId" : "Merchandise__c"
    } ],
    "relativeCost" : 0.0,
    "sobjectCardinality" : 3,
    "sobjectType" : "Merchandise__c"
 }, {
    "cardinality" : 1,
    "fields" : [ ],
    "leadingOperationType" : "TableScan",
    "notes" : [ {
       "description" : "Not considering filter for optimization because unindexed",
       "fields" : [ "IsDeleted" ],
       "tableEnumOrId" : "Merchandise__c"
    } ],
    "relativeCost" : 0.65,
    "sobjectCardinality" : 3,
    "sobjectType" : "Merchandise__c"
}] } 

The fields are defined as follows.

cardinality

The estimated number of records that the leading operation type would return. For example, the number of records returned by an index.

fields

The indexed field(s) used by the query. If the leading operation type is Index, the fields value is Index. Otherwise, the fields value is null.

leadingOperationType

The primary operation type that Salesforce will use to optimize the query.
  • Index - The query will use an index on the query object.
  • Other - The query will use optimizations internal to Salesforce.
  • Sharing - The query will use an index based on the sharing rules associated with the user who is executing the query. If there are sharing rules that limit which records that user can access, Salesforce can use those rules to optimize the query.
  • TableScan - The query will scan all records for the query object.

notes

An array of one or more feedback notes. Each note contains:
  • description - A detailed description of an aspect of the optimization. This could include information on optimizations that could not be used, with details on why the optimization was not used.
  • fields - An array of one or more fields used for the optimization.
  • tableEnumOrId - The table name for the fields used for the optimization.
This response field is available in API version 33.0 and later.

relativeCost

The cost of the query compared to the Force.com query optimizer’s selectivity threshold. Values above 1 mean that the query won’t be selective.

sobjectCardinality

The approximate record count for the query object.

sobjectType

The name of the query object.

As the previous definitions show, both the cardinality and sobjectCardinality values are approximations. You will see later that there are times when the cardinality value is lower than you expect it to be. This discrepancy is mainly due to pre-queries that the Force.com query optimizer uses to determine selectivity, and a common example of this discrepancy occurs when a cardinality value matches a selectivity threshold—the Force.com query optimizer stops determining cardinality if its record total would exceed that threshold. In effect, relying on the cardinality of a query is not a reliable alternative to executing COUNT() queries, especially for large data sets, and is generally only useful in the context of query optimization. sObject cardinality is a closer estimate of record totals and isn’t affected by pre-queries, and sObject cardinality record counts are identical to the record counts that appear on the Storage Usage page.

Salesforce.com is exposing a lot of information about the Force.com query optimizer's decisions, but salesforce.com's online documentation, blog posts, and articles won't explain those decisions comprehensively. These resources are intended to provide enough details to enable you to make informed decisions about how to write optimal SOQL queries. For example, here's one good fact to keep in mind: The Force.com query optimizer's cardinality values don't factor in the records in record totals that exceed the appropriate selectivity thresholds. Remember that the optimizer is trying to generate the most efficient query, and in that context, counting beyond the selectivity threshold does nothing to help performance or determine selectivity. Through the following examples, you’ll learn how the optimizer determines a relative cost, and chooses the index or table with the lowest relative cost value.

The goal of this feature is to expose the index or table that would drive a query. It is not to provide a complete understanding of the Force.com query optimizer. You can, however, validate that an index was chosen as expected.

Adding a Selective Filter

To learn about how the explain parameter can help you determine selectivity, read through some examples that use a custom object called "MyCase," which has a Status picklist field (New, In Progress, Closed) and a ClosedDate field. Of the 10,000 MyCase records, 9,260 are Closed, 666 are In Progress, and 74 are New. When testing your queries with the Resource Query Feedback Parameter feature, try to test your queries against at least 10,000 records. When testing your queries with the Resource Query Feedback Parameter feature, try to test them against objects that have at least 10,000 records. If the objects that you're querying against have fewer than 10,000 records, the Force.com query optimizer might not consider all of the optimization options that it would consider for objects that have large data volumes.

For your query’s filters to be selective, they must:

See Designing Dashboards and Reports for Force.com Implementations with Large Data Volumes.

If you haven't already done so, make sure to add the URL for your instance as a remote site. From Setup, click Security Controls | Remote Site Settings.

Start with a sample query to illustrate just how important providing well-formed filter criteria is to making an efficient query. The MyCase object has a Parent_MyCase__c field, mimicking the Parent Case field on the case standard object. A well-formed query finding the children of a specific case—if that case has any children—should be very efficient, as lookup fields always have a standard index applied to them.

SELECT id FROM MyCase__c WHERE Parent_MyCase__c = 'a02C000000k4hmK'
Sample query demonstrating how important well-formed filter criteria are

The best option, hitting the standard index, has a relative cost of 0.001, which is about as low as you can get! If you change the Equals operator to IN for a set of records, you see similarly low costs; adding 3, 5, or 10 IDs here still has a relative cost of 0.001 in this sample data set. Next, see what happens when you make this scenario a little less perfect.

A very common selectivity issue for Apex developers with queries like this one involves collecting IDs and adding them to queries—and accidentally including a null value in that collection. See what happens when you add a null value in.

SELECT id FROM MyCase__c WHERE Parent_MyCase__c IN ('a02C000000k4hmK', 'a00C000000Os5de', 'a00C000000Os5dw', null)
Sample query demonstrating what happens when you add a null in

The pesky little null that snuck in completely changed the query’s performance, increasing the cost by 753 times because the index wasn't useful anymore—the null made the query's filter condition unselective.

In the previous two examples, the Notes section stated some fields weren't chosen to drive the query because they were unindexed or the value of the right operand was Null. In both examples, the Notes provided state that the IsDeleted field isn't indexed even though the queries did not contain this field. This occurs due to internal optimizations considered by the query optimizer. The only time to include the IsDeleted field in any query occurs in conjunction with the use of the ALL ROWS clause. However, the use of the ALL ROWS clause should be limited as it queries deleted and archived records and will likely perform poorer.

In the second example, the Parent_MyCase__c lookup field was not chosen to drive the query because standard indexes of lookups do not include Null values.

Next, continue to another commonplace example, which is seen in list views and reports the world over: queries that return all cases that have a status of New or In Progress.

SELECT Id FROM MyCase__c WHERE Status__c IN ('New', 'In Progress'	)
Sample query that returns all cases that have a status of New or In Progress

As you can see from the results of the REST API call, the Force.com query optimizer chose to use a table scan of the MyCase object. There weren't any other leading operation types because Status__c doesn’t have a custom index applied to it, which is confirmed in the Notes section. Checking for unindexed fields and un-optimizable operators are the main benefits of the Notes section.

Assume that you now have a custom index on Status__c, and that you have retried your explain request.

Sample query that retries the explain request

Here, you see that the Force.com query optimizer chose to use the index on Status__c to drive the query. Both TableScan and Index appear in the list of leading operation types, and both have a relative cost value that's below 1, so both choices meet the selective threshold. However, because the index has a slightly lower cost, using an index is the best option.

What if you wanted to find all closed cases? Because you have a lot more cases with the Closed status than you do cases with any other status, would a filter for Closed use the index?

Sample query containing a filter for closed cases

You can see that, even with the presence of the index, the Force.com query optimizer chose a table scan. The index matched 9,620 of 10,000 records, putting the number of targeted records well above the "10% of total" custom index selectivity threshold. Both choices have a relative cost that’s above 1. If most of your query and report filters had been trying to filter on the 'Closed' value, this Status__c index wouldn't be helpful.

One of the lesser-known query optimizations is sharing based and enables users who can access only a few records to perform queries would otherwise be highly inefficient. Consider this particularly non-scalable example, which features no filters at all: SELECT id FROM MyCase__c. This query has a relative cost of over 2.5 for our administrator. What do you think the relative cost would be for a Read Only user who has access to 12 MyCase records?

Sample query for a ReadOnly user who has access to twelve MyCase records

The sharing-based optimization here is so efficient that the relative cost is lost by rounding to the thousandths place! It enables you to make some complex customizations, such as Apex managed sharing for Communities users, without needing to translate your sharing into SOQL filters or manually join to the MyCase__Share table. Without this optimization, list views such as "All open cases," which would enable all Communities users to view all of their company's cases, could be incredibly slow or complex.

You can also create two-column indexes to help optimize your query performance. In this scenario, the MyCase object includes a Product__c field and a Customer__c field. For simplicity's sake, assume that they're both text fields, and that Product__c has been made an external ID. Also assume that you've created a two-column index with Product as the first column and Customer as the second.

Look at a query that references both the Product__c field and the Customer__c field.

SELECT Id From MyCase__c WHERE Product__c = 'Product 1' and Customer__c = 'Customer 1'
Sample query that references both the Product__c field and the Customer__c field

You can see that the Force.com query optimizer chose the two-column index—both columns are displayed in the Fields column. It has a relative cost that was lower than doing a table scan. Note that the cardinality for the table scan was only 1. This number doesn’t necessarily mean that only one column matched the filter condition. As described earlier, some of the statistics returned from pre-queries might not always be an accurate representation of the true record count. You may also notice that the Notes section, highlights that the Customer__c field is unindexed but this isn’t an issue as it is the second column in the two-column index.

Avoid Negative Operators

Remember that one of the key tenets for a selective filter was to avoid inefficient operators, including negative operators like '!=' or NOT CONTAINS.

What happens if you change your query so that it uses a negative operator?

SELECT Id FROM MyCase__c WHERE Status__c != 'Closed'
Sample query that uses a negative operator

As you can see, the index on Status__c is no longer an option and the Notes section highlights that the negative operator is un-optimizable. The Force.com query optimizer knows that it would be more expensive to use the index than the table scan.

Avoid Inefficient Operators

Leading '%' wildcards are inefficient operators that also make filter conditions unselective. As a best practice, use SOQL or remove the leading '%'; although the Force.com query optimizer does consider using an index even if you do use this leading wildcard, it will choose to use that index based on the number of characters in the search string and the number of matches that result. For these reasons, it usually chooses to do a table scan. When you're querying millions of records, the performance that a selective index gives you can be significantly better.

SELECT Id FROM MyCase__c WHERE Status__c LIKE '%e%'
Sample query that uses a leading wildcard

Again, even though this is an inefficient query, the Force.com query optimizer still considers using the Index. Remember that this type of query will usually prompt the Force.com query optimizer to do a table scan. In this simple example, where you have only a few Status values, you can search for '%n%' and still have a cardinality value and a relative cost that make the Force.com query optimizer use an index. In the real world, the Force.com query optimizer is more likely to do a full scan when you use leading '%' wildcards.

Composite Index Joins

Assume that you also created a custom index on the ClosedDate__c field. Look at what happens if you add a filter on this field to your query.

SELECT Id FROM MyCase__c WHERE Status__c = 'Closed' AND ClosedDate__c = 2014-01-01
Sample query containing a filter on the ClosedDate__c field

The Force.com query optimizer chose the index on ClosedDate__c as the most efficient index. Notice that it also considered a TableScan and the index that's on Status__c. There was also another choice labeled Other, which again can represent several types of internal operations.

The Force.com query optimizer's job is to analyze possible leading operation types and pick the one with the lowest cost. In this example, Other was a composite index join. Here, the optimizer looked at the intersection of both the Status__c and ClosedDate__c indexes.

Unions

What if you had used an OR condition instead of an AND condition?

Sample query that uses an OR condition instead of an AND condition

Here, the Other value represents a union of the two indexes. For this type of optimization, each filter condition must be indexed, and the union must be under the selectivity threshold. In this case, the union of the 9,260 closed cases with the three cases that were closed on 2014-01-01, doesn’t meet the selectivity threshold. Notice that cardinality shows only 9,260 records instead of 9,263. Once the Force.com query optimizer realized that the first part of the union would not meet the threshold, it didn’t need to consider the second filter of the union.

Next, look at an example that does meet the selectivity threshold. Suppose you wanted to get cases with a status of In Progress or for Product 2.

SELECT Id FROM MyCase__c WHERE Status__c = 'In Progress' OR Product__c = 'Product 2'
Sample query that gets cases with a status of In Progress or for Product 2

Because you have 666 cases with a status of In Progress and 80 with Product = 'Product 2', the union is under the selectivity threshold.

Sort Optimization

Another common optimization that the Force.com query optimizer considers is called sort optimization. Sort optimization is used with date and number fields, and requires an ORDER BY with a LIMIT clause. In cases where the selectivity threshold can’t be met, sort optimization might be better than a table scan. Sort optimization was previously discussed in this blog post.

SELECT Id FROM MyCase__c ORDER BY CreatedDate LIMIT 100
Sample query that uses sort optimization

In this case, the index is stored in a sorted manner, and the LIMIT clause enables the Force.com query optimizer to quickly scan the sorted data and stop at the nth record without scanning the entire dataset.

For standard indexed fields like CreatedDate, you need only the ORDER BY with a LIMIT clause. Try using it with a custom index like ClosedDate.

SELECT Id FROM MyCase__c ORDER BY ClosedDate__c LIMIT 100
Sample query that uses ORDER BY with a custom index

Here, you can see that the Force.com query optimizer chose a table scan; the relative cost is greater than 1. Again, remember that relative cost values that are greater than 1 are not selective. Now, see what happens when you add an ORDER BY with a LIMIT clause. You also must add a filter against nulls; sort optimization currently does not always support nulls.

SELECT Id FROM MyCase__c WHERE ClosedDate__c != NULL ORDER BY ClosedDate__c LIMIT 100
Sample query that uses ORDER BY with a LIMIT clause

In this case, the Force.com query optimizer chose to do a sort optimization on the ClosedDate index—and have that index drive the query. Notice that the relative cost associated with this leading operation type is 0.1, and that the cardinality value is the same as the LIMIT value.

Another form of sort optimization occurs with two-column indexes. For this example, assume that you have a two-column index on ServiceLevel and LastCommentDate. Suppose that you want to get 100 cases that were updated since the beginning of last year with a Silver service level.

SELECT Id FROM MyCase__c WHERE ServiceLevel__c = 'Silver' AND LastCommentDate__c > 2013-01-01 LIMIT 100
Sample query that gets 100 cases that were updated since the beginning of last year with a Silver level

You might expect the LIMIT clause to use the index because you’re asking for only 100 records, but in actuality, the index was the third and most expensive option. To use this option, the Force.com query optimizer would need to scan the index and look at the specific values before stopping when it finds 100 records. Notice the first leading operation type was Other. For its multi-tenant environment, salesforce.com maintains its own statistics and provides hints about those statistics to the underlying database. When it receives these hints, the database can make more informed decisions about how to drive your queries. However, there are also times when it's more efficient to let the underlying database implementation alone determine which indexes, if any, to use. This is the method that the Force.com query optimizer chose in this example.

You might get better results by using the custom two-column index to apply the sort optimization technique. In this example, the first column of the index must appear in the WHERE clause, and sorting occurs on the second column.

SELECT Id FROM MyCase__c WHERE ServiceLevel__c = 'Silver' ORDER BY LastCommentDate__c Desc LIMIT 100
Sample query that uses a custom two-column index to apply the sort optimization technique

Here, you can see that the relative cost is lower for this form of sort optimization than it was for the previous sample query.

Sort optimization might help you improve your query performance when the filters in your query aren’t selective. It works best when you are accessing the ends of the index and don’t need to check against other filter conditions. Removing the filters and just having an ORDER BY and LIMIT without the unselective filters might perform better. Otherwise, the query might need to scan the entire index to meet the filter conditions until the limit is reached.

Summary

Although this article isn't intended to provide comprehensive interpretations of every choice that the Force.com query optimizer can make, it should have given you an idea of how to use the explain parameter in conjunction with a REST API call to gather valuable query optimization information. As you have learned, using that information to make even slight changes to your queries can have a big impact on your query performance.

Here are a few big takeaways to remember. In general, if you add a selective filter to your query, that query should perform better than it otherwise would. As your record totals grow into the millions, using the explain parameter can help you gather the data that you need to improve your query performance. Finally, when you test your queries, be sure to test them against objects that have at least 10,000 records—otherwise, the Force.com query optimizer won't consider all of the operation types that it would consider for objects that have large data volumes.

Note: There is a known issue that occurs when your query has a single filter that is unselective, and a relative cost that is between 1.000 and 1.300. This is a corner case where the Force.com query optimizer chooses between two unselective options based on a complex algorithm. Because both unselective options will have similar relative costs, this sorting issue doesn’t have much of any impact on query performance.

For example, in this query, the Index option is listed as the most optimal leading operation type, but the Force.com query optimizer actually chooses the TableScan.

Sample query that demonstrates the known sorting issue

As a best practice, update your queries to bring the relative cost of the table scan below 1.000. Adding an additional filter can usually affect the relative cost enough to bring it below that value.

Related Resources

About the Authors

John Tan is an Architect Evangelist within the Technical Enablement team of the salesforce.com Customer-Centric Engineering group. The team’s mission is to help customers understand how to implement technically sound salesforce.com solutions. Check out all of the resources that this team maintains on the Architect Core Resources page of Salesforce Developers.

Chris Peterson is a Force.com MVP and a Senior Developer with FinancialForce.com for its Professional Services Automation application. He has been building ISV applications on the Force.com platform since 2009. He blogs somewhat regularly at http://www.ca-peterson.com.