Aggregate Functions

Use aggregate functions to perform computations on values.

Using an aggregate function on an empty set returns null. For example, if you use an aggregate function with a nonmatching column of an outer cogrouping, you might have an empty set.

Aggregation functions treat each line as its own group if not preceded by group by.

This table lists the aggregate functions that are supported:

Aggregate Function	Description
`avg()` or `average()`	Returns the average value of a numeric field. For example, to calculate the average number of miles:
`count()`	Returns the number of rows that match the query criteria. For example, to calculate the number of carriers: The `count()` function operates on streams that were inputs to the `group` or `cogroup` statements. It doesn’t operate on the newly grouped stream or on an ungrouped stream. You can’t pass `a1` to the `count()` function because it’s a newly grouped stream.
`first()`	Returns the value for the first tuple. To work as expected, you must be aware of the sort order or know that the values of that measure are the same for all tuples in the set. For example, you can use these statements to compute the distance between each combination of origin and destination:
`last()`	Returns the value for the last tuple. For example, to compute the distance between each combination of origin and destination:
`max()`	Returns the maximum value of a field. This function takes only a measure as an argument. It can’t take a dimension.
`median()`	Accepts a grouped expression of numeric type and returns the middle number (by sorted order, ignoring null values). If there is no one middle number (in other words, the count of non-null values is even), then median returns the average of the two numbers closest to the middle. The expression can be any identifier, such as ‘xInt’ or ‘price’, but cannot be a complex expression, such as price/100 or ceil(distance), or a literal, such as 2.5. If median is not preceded by a group by clause, it treats each individual row as its own group:
`min()`	Returns the minimum value of a field. This function takes only a measure as an argument. It can’t take a dimension.
`sum()`	Returns the sum of a numeric field.
`unique()`	Returns the count of unique values. For example, to find how many origins and destinations a carrier flies from: The best way to add summaries (for example, a summary row on a compare table) using `unique()` is to use it as a windowing function. Note
`stddev()`	Returns the sample standard deviation computed on the group. Accepts a grouped expression of numeric type. If the number of non-null values in the group is equal to 1, stddev return null. Otherwise, stddev returns the sample standard deviation computed on the group, ignoring null values. The expression can be any numeric identifier, such as 'xInt' or 'price', but cannot be a complex expression, such as price/100 or ceil(price), or a literal, such as 2.5.
`stddevp()`	Returns the population standard deviation computed on the group. Accepts a grouped expression of numeric type and returns the population standard deviation computed on the group, ignoring null values. The expression can be any numeric identifier, such as 'xInt' or 'price', but cannot be a complex expression, such as price/100 or ceil(price), or a literal, such as 2.5.
`var()`	Returns the sample variance (also called the unbiased variance) computed on the group. Accepts a grouped expression of numeric type. If the number of non-null values in the group is equal to 1, var return null. Otherwise, var returns the sample variance computed on the group, ignoring null values. The expression can be any numeric identifier, such as 'xInt' or 'price', but cannot be a complex expression, such as price/100 or ceil(price), or a literal, such as 2.5.
`varp()`	Returns the population variance (also called the biased variance) computed on the group. Accepts a grouped expression of numeric type and returns the population variance computed on the group, ignoring null values. The expression can be any numeric identifier, such as 'xInt' or 'price', but cannot be a complex expression, such as price/100 or ceil(price), or a literal, such as 2.5.
`percentile_disc()`	Computes a specific percentile for sorted values in an entire rowset or within distinct partitions of a rowset. The full syntax is: percentile_disc(p as *numeric) within group* (order by expr [asc \| desc]) The percentile_disc function accepts a grouped expression expr of numeric type and sorts it in the specified order (asc or desc). If order is not specified, the default order is asc. It returns the value behind which (100p)% of values in the group would fall in the sorted order, ignoring null values. p can be any real numeric value between 0 and 1, and is accurate to 8 decimal places of precision. expr can be any identifier, such as 'xInt' or 'price', but cannot be a complex expression, such as price/100 or ceil(distance), or a literal, such as 2.5. If expr contains no value that falls exactly at the 100p-th percentile mark, percentile_disc will return the next value from expr in the sort order. For example, if Mea1 contains the values [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13] then: Example query:
`percentile_cont()`	Calculates a percentile based on a continuous distribution of the column value. The full syntax is: percentile_cont(p as *numeric) within group* (order by expr [asc \| desc]) The percentile_cont function accepts a grouped expression expr of numeric type and sorts it in the specified order (asc or desc). If the order is not specified, the default order is asc. It returns the value behind which (100p)% of values in the group would fall in the sorted order, ignoring null values. p can be any real numeric value between 0 and 1. expr can be any identifier, such as 'xInt' or 'price', but cannot be a complex expression, such as price/100 or ceil(distance), or a literal, such as 2.5. If expr contains no value that falls exactly at the 100p-th percentile mark, percentile_cont returns a value linear interpolated from the two closest values in expr. For example, if Mea1 contains the values [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13] then: Example query:
`regr_slope(y, x)`	The simple linear regression function `regr_slope` accepts a grouped dependent numeric expression `y` and a grouped independent numeric expression `x`, and returns the slope of the regression line. The `regr_slope` function only considers pairs of (`x, y`) values where both values are not `null`, and returns `null` if there exists fewer than 2 such pairs in the given group. Simple linear regression functions work like aggregation functions on simple grouped values, but do not work with cogroups. Example query:
`regr_intercept(y, x)`	The simple linear regression function `regr_intercept` accepts a grouped dependent numeric expression `y` and a grouped independent numeric expression `x`, and returns the y-intercept for the regression line. The `regr_intercept` function only considers pairs of (`x, y`) values where both values are not `null`, and returns `null` if there exists fewer than 2 such pairs in the given group. Simple linear regression functions work like aggregation functions on simple grouped values, but do not work with cogroups. Example query:
`regr_r2(y, x)`	The simple linear regression function `regr_r2` accepts a grouped dependent numeric expression `y` and a grouped independent numeric expression `x`, and returns the coefficient of determination (also called R-squared or goodness of fit) for the regression. The `regr_r2` function only considers pairs of (`x, y`) values where both values are not `null`, and returns `null` if there exists fewer than 2 such pairs in the given group. Simple linear regression functions work like aggregation functions on simple grouped values, but do not work with cogroups. Example query:

Analytics SAQL Reference

Aggregate Functions