Window Functions and Queries
Applies to: ✅ Data Cloud SQL ✅ Tableau Hyper API
Window functions provide the ability to perform calculations across sets of rows that are related to the current query row. This section introduces window queries and a reference for the window functions supported in Data Cloud SQL.
A window function does a calculation across a set of table rows that are related to the current row. Rows in window function calculation remain distinct in the output.
Here's an example that shows how to compare each employee's salary with the average salary in their department:
The first three output columns come directly from the table empsalary
, and there's one output row for each row in the table. The fourth column represents an average taken across all the table rows that have the same depname
value as the current row.
A window function call always contains an OVER
clause directly after the window function's name and arguments. The OVER
clause defines how the rows of the query are split up for processing by the window function. The PARTITION BY
clause within OVER
divides the rows into groups that share the same values of the PARTITION BY
expressions. For each row, the window function is computed across rows that fall into the same partition as the current row.
You can also control the order in which rows are processed by window functions using ORDER BY
within OVER
. Here's an example:
In this example, the rank
function produces a numerical rank for each unique ORDER BY
value in the current row's partition, using the order defined by the ORDER BY
clause.The OVER
clause defines the behavior of rank
.
You can specify ORDER BY
independently top-level query ORDER BY
output.
The query's FROM
clause filtered by its WHERE
, GROUP BY
, and HAVING
clauses determine the rows computed in the window function. For example, a row removed because it doesn't meet the WHERE
condition isn't seen by any window functions. A query can have many window functions that divide the data in different ways using different OVER clauses. But they all work on the same group of rows defined by the virtual table.
It's also possible to omit PARTITION BY
, in which case there's a single partition containing all rows.
A window frame is a set of rows within the defined partition. Some window functions act only on the rows of the window frame, rather than on the whole partition. If ORDER BY is given, the frame consists of all rows from the start of the partition up to the current row. It also includes any rows that are equal to the current row. If you don’t include ORDER BY
, the default frame includes of all rows in the partition. Here's an example using sum
:
The SELECT
statement includes all rows in the table because there is no ORDER BY or PARTITION BY
clauses. Each sum is taken over the whole table, and so we get the same result for each output row. But if we add an ORDER BY
clause, we get different results:
Here the sum is taken from the first (lowest) salary up through the current one, including any duplicates of the current one.
You can only include window functions in the SELECT
list and the ORDER BY
clause of the query. Also, window functions execute after non-window aggregate functions. This means you can include an aggregate function call in the arguments of a window function. But you can’t include a window function in an aggregate function call.
To filter or group rows after the window calculations are performed, you can use a sub-select. For example:
The preceding query only shows the rows from the inner query having rank
of less than three.
If a query has many window functions, name each windowing behavior in a WINDOW
clause and then reference it in OVER
. For example:
For more details about WINDOW
clauses, see SELECT.
The built-in window functions are listed in the next table. Invoke these functions using window function syntax.
Function | Return Type | Description |
---|---|---|
row_number() | bigint | number of the current row within its partition, counting from 1 |
rank() | bigint | rank of the current row with gaps; same as row_number of its first peer |
modified_rank() | bigint | rank of the current row with gaps, but taking the lowest rank in case of ties; same as row_number of its last peer |
dense_rank() | bigint | rank of the current row without gaps; this function counts peer groups |
percent_rank() | double precision | relative rank of the current row: (rank - 1) / (total partition rows - 1) |
cume_dist() | double precision | cumulative distribution: (number of partition rows preceding or peer with current row) / total partition rows |
ntile(num_buckets integer) | integer | integer ranging from 1 to the argument value, dividing the partition as equally as possible |
lag(value anyelement [, offset integer [, default anyelement ]]) | same type as value | returns <value> evaluated at the row that is <offset> rows before the current row within the partition; if there is no such row, instead return <default> (which must be of the same type as <value> ). Both <offset> and <default> are evaluated with respect to the current row. If omitted, <offset> defaults to 1 and <default> to null |
lead(value anyelement [, offset integer [, default anyelement ]]) | same type as value | returns <value> evaluated at the row that is <offset> rows after the current row within the partition; if there is no such row, instead return <default> (which must be of the same type as <value> ). Both <offset> and <default> are evaluated with respect to the current row. If omitted, <offset> defaults to 1 and <default> to null |
first_value(value any) | same type as value | returns <value> evaluated at the row that is the first row of the window frame |
last_value(value any) | same type as value | returns <value> evaluated at the row that is the last row of the window frame |
nth_value(value any, nth integer) | same type as value | returns <value> evaluated at the row that is the <nth> row of the window frame (counting from either the first or the last row in the frame, depending on the FROM option); null if no such row |
You can also use most aggregate functions in a window function. For a list of built-in aggregate functions, see Aggregate Functions. Aggregate functions act as window functions if an OVER
clause follows the function call.
All of the functions depend on the sort ordering specified by the ORDER BY
clause of the window definition. Rows that aren't unique in the ORDER BY
columns are peers. The four ranking functions (including cume_dist
) are defined to give the same answer for all peer rows.
The modified_rank
function differs from rank
in that it assigns the lowest rank of all entries in case of a tie. For example:
The first_value
, last_value
, and nth_value
consider only the rows within the window frame. By default, this contains the rows from the start of the partition through the last peer of the current row. You can redefine the frame by adding a suitable frame specification (RANGE
, ROWS
) to the OVER
clause. For information about frame specifications, see Window Function Call Syntax.
If you use an aggregate function as a window function, it aggregates over the rows within the current row's window frame. An aggregate used with ORDER BY
and the default window frame definition produces a running sum type of behavior. To obtain aggregation over the whole partition, exclude ORDER BY
or use ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
.
cume_dist
computes the fraction of partition rows that are less than or equal to the current row and its peers. percent_rank
computes the fraction of partition rows that are less than the current row, assuming the current row doesn't exist in the partition.
Possible window function call syntax:
where <window_definition>
has the syntax:
The optional <frame_clause>
can be one of:
where <frame_start>
and <frame_end>
can be one of:
and <frame_exclusion>
can be one of:
In the previous definitions, <expression>
represents any value expression that doesn't contain window function calls.
<window_name>
is a reference to a named window specification defined in the query's WINDOW
clause. Or you can give a full <window_definition>
in parentheses. See SELECT. OVER wname
isn't equivalent to OVER (wname ...)
. The latter copies and modifies the window definition, and is rejected if the referenced window specification includes a frame clause.
The PARTITION BY
clause groups the rows of the query into partitions, which are processed separately by the window function. Expressions in the PARTITION BY
or ORDER BY
clauses can't be output-column names or numbers. Without PARTITION BY
, all rows produced by the query are treated as a single partition. The ORDER BY
clause determines the order in which the rows of a partition are processed by the window function. Without ORDER BY
, rows are processed in an unspecified order.
The <frame_clause>
specifies the set of rows in the window frame, which is a subset of the current partition, for those window functions that act on the frame instead of the whole partition. The set of rows in the frame can vary depending on which row is the current row. The frame can be in RANGE
, ROWS
or GROUPS
mode. The frame runs from the <frame_start>
to the <frame_end>
. If <frame_end>
is omitted, the end defaults to CURRENT ROW
.
A <frame_start>
of UNBOUNDED PRECEDING
means that the frame starts with the first row of the partition, and a <frame_end>
of UNBOUNDED FOLLOWING
means that the frame ends with the last row of the partition.
In RANGE
or GROUPS
mode, a <frame_start>
of CURRENT ROW
means the frame starts with the current row's first peer row (a row that the window's ORDER BY
clause sorts as equivalent to the current row), while a <frame_end>
of CURRENT ROW
means the frame ends with the current row's last peer row. In ROWS
mode, CURRENT ROW
is the current row.
In the <offset> PRECEDING
and <offset> FOLLOWING
frame options, the <offset>
is an expression without variables, aggregate functions, or window functions. The meaning of the <offset>
depends on the frame mode:
-
In
ROWS
mode, the<offset>
returns a non-null, non-negative integer, and the option means that the frame starts or ends the specified number of rows before or after the current row. -
In
RANGE
mode, theORDER BY
clause specifies exactly one column. The<offset>
specifies the maximum difference between the value of that column in the current row and its value in preceding or following rows of the frame. The data type of the<offset>
expression varies depending on the data type of the ordering column. For numeric ordering columns, it's the same type as the ordering column. For datetime ordering columns it is aninterval
. For example, if the ordering column is of typedate
ortimestamp
, writeRANGE BETWEEN '1 day' PRECEDING AND '10 days' FOLLOWING
. The<offset>
is still required to be non-null and non-negative, but "non-negative" depends on its data type.
In any case, the distance to the end of the frame is limited by the distance to the end of the partition. Rows near the end of the partition can contain fewer rows.
Notice that in ROWS
mode, 0 PRECEDING
and 0 FOLLOWING
is equivalent to CURRENT ROW
. This normally holds in RANGE
mode as well, for an appropriate data-type-specific meaning of zero.
The <frame_exclusion>
option allows rows around the current row to be excluded from the frame. EXCLUDE CURRENT ROW
excludes the current row from the frame. EXCLUDE GROUP
excludes the current row and its ordering peers from the frame. EXCLUDE TIES
excludes any peers of the current row from the frame, but not the current row itself. EXCLUDE NO OTHERS
defines the default behavior of not excluding the current row or its peers.
The default framing option is RANGE UNBOUNDED PRECEDING
, which is the same as RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
. With ORDER BY
, this sets the frame to be all rows from the partition start through the current row's last ORDER BY
peer. Without ORDER BY
, this means all rows of the partition are included in the window frame.
<frame_start>
can’t be UNBOUNDED FOLLOWING
, <frame_end>
can’t be UNBOUNDED PRECEDING
, and the <frame_end>
can’t be earlier in the list of <frame_start>
and <frame_end>
options than the <frame_start>
is. For example RANGE BETWEEN CURRENT ROW AND offset PRECEDING
isn’t allowed. But, ROWS BETWEEN 7 PRECEDING AND 8 PRECEDING
is allowed, even though it never selects any rows.
The FROM FIRST
or FROM LAST
options are only valid for the nth_value
function. They specify whether the n-th value is counted from the first row or the last row in the frame. The default is FROM FIRST
.
For some window functions, NULL
values within a window frame can cause undesired results. A common example is a table imported from a spreadsheet where values aren't repeated for rows of the same group. For example:
In this example, the country
value only occurs in the first value of each group, but a query with a filter by region
can return all rows with NULL
in the country
column. You can fix this with the IGNORE NULLS
clause and the last_value
function:
This query produces the result:
IGNORE NULLS
is only supported for the last_value
function. The clause RESPECT NULLS
doesn't ignore NULL
values and is the default behavior.
In Data Cloud SQL the IGNORE NULLS
option is only supported for the last_value
function.
To caing parameter-less aggregate functions as window functions, use syntaxes with *.
For example count(*) OVER (PARTITION BY x ORDER BY y)
.Window-specific functions don’t support DISTINCT
or ORDER BY
within the function argument list.
Window function calls can only be in the SELECT
list and the ORDER BY
clause of the query.