Collecting Force.com selectivity statisticsWhen it comes right down to it, building efficient SOQL queries, reports, and list views depends on selectivity and indexes, especially when your objects have some serious record counts. A previous Force.com Blogs post, webinar, and the Database Query & Search Optimization Cheat Sheet are terrific resources for learning how the Force.com query optimizer makes decisions, including information about selectivity thresholds for filter conditions.

But how can you put this knowledge into practice? This post gives you some simple, practical steps for gauging the selectivity of your filter conditions–before you actually use those conditions in your queries.

What’s the Selectivity of Your Filter Condition?

Suppose you are building a SOQL query, report, or list view for one of the largest objects in your org, the Opportunity object. You have a filter condition (i.e., WHERE clause) that fetches just the rows you want from the object. The $64 question is this.

Is my filter condition selective enough for the Force.com query optimizer to use an available index?

With a simple SOQL query, you can quickly get the statistics you need to figure out whether a given value in a given field is selective.

Use SOQL to Determine the Selectivity of a Filter Condition

Consider a query that uses a basic, unary WHERE clause condition.

SELECT Id, Name FROM Opportunity
 WHERE Stagename = 'Closed Won'

 

Using the Force.com ad hoc query tool of your choice (e.g., the Developer Console Query Editor, Workbench), execute the following query to get some statistics related to your filter condition’s selectivity.

SELECT Stagename, COUNT(id) FROM Opportunity
 GROUP BY ROLLUP(Stagename)

 

Note: If the above query times out because your object is extremely large, work with salesforce.com Customer Support to carry out this step.

The result set shows you the distribution of records for each value of the StageName picklist field. The last row shows you the total number of records in the object. For example:

StageName Count
1 Prospecting 50177
2 Qualification 49755
3 Needs Analysis 49965
4 Value Proposition 50135
5 Id. Decision Makers 50012
6 Perception Analysis 49840
7 Proposal/Price Quote 49943
8 Negotiation/Review 50158
9 Closed Won 49899
10 Closed Lost 50116
11 500000

 

With this information, you have all the statistics that you need to determine the selectivity for a filter condition involving this specific field. Turning your attention to the cheat sheet, you can see that unary filter conditions involving standard fields are selective when they target less than 30% of the first one million records. In this example, the object has fewer than one million records, so the selectivity threshold is simply 30% of 500,000, or 150,000. By extrapolating this to the query of interest, you can see that there are 49,899 records with StageName = ‘Closed Won’. In this case, the filter condition is selective (49,899 is smaller than 150,000).

Determine the Selectivity of More Complex Filter Conditions

@JohnTanSFDC’s post gives you some great examples of more complex filter conditions that are selective (and nonselective). It’s not necessary to repeat the same information here, but I would like to add that using GROUP BY ROLLUP queries similar to the one above makes it easy to get the statistics you need to evaluate the selectivity of various conditions.

Here’s a quick example of a query with a more complex filter condition that uses a date field along with the AND operator.

SELECT Id, Name FROM Opportunity
 WHERE Stagename = 'Closed Won'
   AND CloseDate = THIS_WEEK

 

You already know the statistics for the Stagename field from your previous work. Now get the same stats for the CloseDate field, grouped by week for each year. Fortunately, SOQL has some nifty date functions that make getting this data easy.

SELECT WEEK_IN_YEAR(CloseDate), CALENDAR_YEAR(CloseDate), COUNT(id)
 FROM opportunity
 GROUP BY ROLLUP(WEEK_IN_YEAR(CloseDate),CALENDAR_YEAR(CloseDate))
 ORDER BY CALENDAR_YEAR(CloseDate), WEEK_IN_YEAR(CloseDate)

 

The output of the query above returns statistics about the distribution of Opportunity records across every week of every year by CloseDate. For example:

Week Year Count
1 2012 3131
2 2012 3095
3 2012 3344
4 2012 3220
5 2012 3197
6 2012 3185
.
1 2013 3151
2 2013 3395
3 2013 3374
4 2013 3220
5 2013 3593
6 2013 3286
.

 

Back to the Cheat Sheet, you see that, for filter conditions that AND two or more conditions, the Force.com query optimizer considers the overall filter condition selective when the filter targets less than:

  • Twice the selectivity thresholds for each filter
  • The selectivity thresholds for the intersection of those fields

For the current example, this means:

  • Status = ‘Closed Won’ is selective (49,899 < 150,000)
  • CloseDate = THIS_WEEK is selective (~3000 < 150,000)
  • Overall, the filter condition is selective, for both reasons listed above

To make things more challenging, consider what happens when one of the filter conditions is nonselective (e.g., Status=’Closed Won’ corresponds to 250,000 records). Two possibilities would make the overall filter condition selective.

  • Each filter condition corresponds to less than 300,000 records (twice the selectivity thresholds for each filter)
  • The intersection of Status=’Closed Won’ AND CloseDate = THIS_WEEK is less than 150,000 records.

The former is true in our hypothetical situation, so the overall condition is indeed selective.

Note: With the OR operator, each filter must meet the threshold individually.

Don’t Forget About Deleted Records

@JohnTanSFDC’s post also does a great job of explaining that deleted records can have a significant effect on query performance. So how do you incorporate (or exclude) deleted records when gathering selectivity statistics? It’s simple: Use the Boolean field, IsDeleted, available in every standard and custom object.

The previous stats-gathering query collects data for all opportunity records, because of the ROLLUP function, regardless of whether IsDeleted is true or false. When you want to collect the selectivity statistics for Opportunity.StageName and explicitly exclude deleted records, try something like the following query.

SELECT Stagename, COUNT(id) FROM opportunity WHERE IsDeleted=false GROUP BY Stagename

 

Note: If you are using Workbench, it’s even easier. Just select Exclude for the “Deleted and archived records” option list.

Last But Not Least, Make Sure You Have Indexes

Let’s say you do your due diligence to confirm that a given filter condition is selective before putting a query, report, or list view into production. All of your tuning work is for nothing when the fields that make the condition selective do not have indexes. Without the necessary indexes in place, Force.com must perform a full scan to fetch the target rows.

The Database Query & Search Optimization Cheat Sheet lists the standard fields that have an index by default, including Id, Name, OwnerId, CreatedDate, SystemModstamp, and RecordType, as well as all master-detail and lookup fields.

But what if your filter condition involves a custom field? Unfortunately, the Force.com UI doesn’t currently provide you with a list of custom indexes. That means you’ll have to work with Customer Support to understand if they can create a custom index on the field your filter uses. And not all fields can have an index, such as non-deterministic formula fields.

When you think that a selective field doesn’t have an index, log a case with salesforce.com Customer Support and ask them to put one in place for you. Once available, the index can help make your query execute faster and improve the productivity of your organization’s users.

Summary

Once you understand how the Force.com query optimizer determines filter condition selectivity for a SOQL query, report, or list view, get ready to do some investigative work. With some simple SOQL queries, you can easily get the statistics that you need to determine whether a specific filter condition is selective. Once you index a selective field, your queries with related filter conditions can execute more efficiently and your users can be more productive.

Related Resources

About the Author

Steve Bobrowski is an Architect Evangelist within the Technical Enablement team of the salesforce.com Customer-Centric Engineering group. The team’s mission is to help customers understand how to implement technically sound Salesforce solutions. Check out all of the resources that this team maintains on the Architect Core Resources page of Developer Force.

tagged , , Bookmark the permalink. Trackbacks are closed, but you can post a comment.
  • Digamber Prasad

    Great blog!

  • mailtoharshit

    Thanks for sharing

  • ram_sj

    Another good post, For relationship queries like child-parent or parent-child, I assume the selectivity threshold is ignored even if one of the object don’t obey the above mentioned where clause policy?

  • Mayank Joshi

    loved this WHERE IsDeleted=false !!