SOQL for the SQL Developer

This blog post helps explain SOQL, or the Salesforce Object Query Language, to developers familiar with SQL. SQL is the granddaddy of query languages, and it’s no surprise why. The relational database remains the de facto data storage workhorse despite the rise of many alternatives. Almost every developer has a working knowledge of SQL and the basics of communication with an RDBMS. This makes it easy to adopt SOQL. After you read this post, you’ll understand a few basic tenets of how SOQL works and why Salesforce even bothered to create its own SQL-like language.

Trust, multitenancy, SOQL, and the query optimizer

Multitenant architecture is fundamental to the Salesforce platform. Without an understanding of this, you will puzzle at certain behaviors and features. The multitenant architecture white paper is a must read. There’s also a Trailhead unit on our architecture that’s worth a few minutes of your time.

At the database layer, multitenancy means that there is a single set of tables shared by many different customers. This is the “real” data model in our data centers. It’s complex and consequently requires highly complex joins to get the right data to the right customer and user for any given request. Consequently, we abstract the “real” data model and materialize this into a customer-facing and customer-specific data model in the form of SObjects (Salesforce Objects).

Trust is a core value that Salesforce lives and breathes. Trust in our architecture means customers all over the world store sensitive critical business data in our database. Apart from the issue of query complexity, it can be difficult to allow direct SQL access to a shared database and still maintain trust. To solve this problem, along with the customer-facing data model, Salesforce provides a query API at the application tier: SOQL.

When a developer invokes SOQL, it’s the responsibility of a technology called the query optimizer to translate it into SQL. The query optimizer dynamically adapts its query to select and join tables in the most efficient way possible depending on the filters, fields, and other properties involved in the query. It also enforces sharing and record access rules to guarantee data privacy. Finally, it ensures the underlying SQL query only retrieves records that belong to the customer environment that issued the SOQL request.

To summarize, the query optimizer does the hard work to maintain trust, to simplify access to data, and to surface the customer-facing data model from the underlying multitenant “real” data model. SOQL, in turn, is the set of instructions you use to harness the power of the query optimizer.

Data model, SObjects, and SOQL

So we have a customer-facing data model in the form of SObjects and SOQL, the API to talk to that data model. Let’s go through some examples to understand how the data model abstraction impacts the query language. For these examples, I use some SObjects from the Dreamhouse App demo. This example app showcases many technical features of the Salesforce platform in the use case of real estate and property management. I will use the Property__c and Broker__c SObjects representing homes and their sellers respectively. These are custom objects; note that once you create a custom object, it’s automatically surfaced to all APIs, including SOQL, without any effort on the part of a developer.

Unlocking the Code of SObject Names, Field Names, Suffixes

The schema of a given org is a blend of standard SObjects provided by Salesforce, so-called Standard Objects, and custom SObjects. The way we create the names for these SObjects is a little different, but once you understand them, it’s straightforward.

Every object and field has a default Label, a Name, and an API Name. When it comes to working with these things in code, it’s the API name that matters. It’s the unique human-readable identifier for a given item. Labels, of course, are what the user sees, and can change at run time depending on the language of the user. The Name is used when defining a custom object or field, and is used by the system along with a standard suffix to derive the API name.

Label: User-friendly translatable field label
Name: Developer-/Admin-defined internal field name in custom objects
API Name: The canonical, unique name

In custom objects, the API name is derived from the Name and a suffix, which I’ll describe next. In standard objects, the Label and API name are almost always the same, there’s no separate Name, and there’s no suffix.

API name suffixes in turn identify the nature of a field or an object. Custom objects are suffixed by “__c” as are most custom fields. If you see a field with an “__r” suffix, this indicates a reference relationship to another object that can be used for querying using the implicit join of the relationship field. This creates a kind of object relational mapping that can be used directly in queries as we see in the next section.

Object relationship reference notation

As I mentioned __c is the most common suffix, but there are others. While I’m not aware of an exhaustive list of all field suffixes in one place, this answer on Salesforce StackExchange has become my de facto source for remembering some of the more obscure ones. Here are a few everyone is likely to come across:

__x: External Object (the data source is not the Salesforce multitenant database, but SOQL is still supported)
__Share: The sharing object for a custom object
__Feed: The chatter feed for a custom object
__History: Field history tracking for custom objects
__latitude__s/__longitude__s: Geolocation Latitude/Longitude coordinate fields

Basic SOQL

When speaking to developers new to Salesforce, but with some SQL experience, this example probably requires no explanation.

Joins and relationships are where things get interesting.

Joins and Foreign Keys Versus Relationship Fields

While the Salesforce object model and relational databases are both fundamentally relational in nature, both the metadata and the runtime expression of these are very different.

When using a relational database, you can define a foreign key constraint between two tables. However, when retrieving data, joins are expressed in the way that you write your SQL, and don’t have to rely on an actual foreign key constraint.

In Salesforce, relationship fields are essentially foreign keys, they also express constraints between two SObjects. However, there is no way to join at runtime unless the relationship field already exists. In this vein, the developer might want to think of the relationship field as both the constraint and the join.

Because of this difference, querying across two SObjects is expressed quite differently in SOQL. These are called relationship queries, and they’re quite expressive.

Here are some examples using JSON-like pseudocode as a representation of data relationships in the results.

If you want more reading on relationship queries, be sure to look at this blog post and of course the documentation on relationship queries.

Aggregate, distance, and So Much More

In addition to basic queries and joins, there exist a host of additional features for leveraging SOQL to preprocess data before it gets to the requestor.

Aggregates give the developer the ability to count, min, max, and more to provide summaries of data.

Salesforce has a composite field type called a geolocation field, which stores geocoordinates. SOQL has built-in features to harness and incorporate geocoordinates data into queries. Distance from a point can be used both in the WHERE and the ORDER BY clauses of SOQL.

There are many more features. Some of these include, polymorphic relationship fields, more complex aggregations with GROUP BY clauses, and pagination with OFFSET.

Limits: How To Not Shoot Yourself In The Foot

We live and breathe multitenancy. In such an environment, thousands of Salesforce customers share a single set of run-time resources, app servers, databases, and other computing resources. It’s essential to isolate bugs and performance problems so that a mistake made by a developer in one tenant never has a negative impact on another. For this reason, Salesforce implements specific limits in SOQL. In other words, you never pay the price for another developer’s bad code (because I know you never write bad code.)

The first limit is that SELECT * is not allowed. Every query must explicitly select each field that should be returned. I’ll be honest…this annoys a lot of developers. But if you think about it, how often do you really need to select every field at runtime? This prevents such bad practices from ever making it into production.

The next is that SOQL is a read-only language. Don’t worry, APIs exist to change data, such as inserts, updates, and deletes. Fundamentally, if you’re writing a statement that invokes SOQL, that code never modifies data. That’s not to say that SOQL injection is impossible, but it does greatly mitigate the vulnerability.

Finally, the number of records that can be retrieved is also limited, depending on where you’re issuing the SOQL query from. In normal, synchronous processing, you’re limited to 50,000 records retrieved across all queries in a transaction as well as a maximum of 100 queries issued. When retrieving data via data APIs you are limited to a maximum of 2,000 records per server round-trip. There are ways to efficiently process more records, such as using asynchronous Apex on the platform, or the Bulk API. But the platform is strongly opinionated to prevent common performance problems, such as too many queries and bloated database requests that might leak over to other tenants.

Invoking SOQL

So where can you use SOQL? There are two common places where SOQL is invoked: Apex, and from the API.

Apex

Apex is the native programming language to execute logic on Salesforce servers. We created it for reasons similar to why we created SOQL: to have a layer of abstraction between the programming language and the actual runtime environment. Given that Salesforce runs on Java, it isn’t entirely wrong to say that Apex is actually a JVM language. In Apex, there are two ways to invoke SOQL: inline SOQL statements, and the Database.query()method. They both result in the execution of a SOQL query, but are slightly different.

Inline SOQL requires you to hard code the entirety of your query, except for the right operands of any WHERE clause. This has the advantage of the query being parsed by the compiler and checked against field references. On the other hand, when invoking Database.query(), you pass the query in the form of a string, meaning any part of any clause can be made dynamic.

API

Each Salesforce API that supports SOQL surfaces some mechanism for the developer to pass it a query. For instance, when invoking SOQL from the REST API, you invoke a GET request against the /query endpoint, pass the query to the q parameter. The SOAP API surfaces a query() method in its WSDL. The Streaming API PushTopic object has a field that uses a subset of SOQL to define the data context for notifications.

Below is an example of a query invoked via a REST API GET request.

Note that REST and SOAP APIs also contain ways to fetch a single record when you have a known record ID (you can see it in the response above in the url attribute). This is often preferable and more efficient than an ad hoc SOQL query for a single record. It also only returns fields that the given user has access to.

SOQL vs SOSL

There is another query language for the platform, called SOSL, or Salesforce Object Search Language. This is not the article to crack that open. SOSL is for fuzzy full-text searches. SOQL is for well-structured deterministic searches. For now, just tuck away this thought for later: If you use wild cards to search with SOQL, and your search is slow, consider SOSL as an alternative.

SOQL your way to success

As you can see, SOQL is quite straightforward, and should be easy to developers who know SQL to adopt. Wherever a developer needs a well-structured query in their Salesforce code or integration, SOQL is the way. It automatically surfaces changes in your data model. It contains built-in limits to mitigate common performance problems, and it (mostly) looks and feels like the SQL you already know. If you consider these few distinctions, you can be up and running quickly and use the power of SOQL with very little effort.

If you want to learn more about SOQL and interacting with the Salesforce multitenant database, go to Trailhead and try the Database & .NET Basics module. If you are not a Microsoft developer don’t be put off by the name, it’s a module for everyone. And if you’d like to read more about the query optimizer, there is a knowledge article in Salesforce Help that will give you more than you ever hoped to read on the subject.