Aggregate Functions
Aggregate functions allow you to perform calculations on a set of records and return summarized information.
Function | Return Type | Input Arguments | Description |
---|---|---|---|
approx_distinct({x1}, {x2}, ...) | Bigint |
| Returns the approximate number of distinct values of input x. If all the input values are null, it returns zero. This function performs better on large volumes of data and is appropriate when exact results aren’t required. |
approx_most_frequent({buckets}, {value}, {capacity}) | Map |
| Returns a map containing the top elements with corresponding estimated frequency. Use this function to find the frequent values up to buckets of elements. |
approx_percentile({x}, {percentage}) | Same as the input value x |
| Returns the approximate percentile for all input values of input x at the given percentage. |
approx_set({x1}, {x2}, ...) | HyperLogLog |
| Returns the HyperLogLog sketch of the input data set of x. |
arbitrary({x1}, {x2}, ...) | Same as the input |
| Returns an arbitrary non-null value of the input value x if it exists. |
array_agg({x1}, {x2}, ...) | Array (same type as the input) |
| Returns an array created from the input elements. |
avg({x1}, {x2}, ...) | Double |
| Returns the arithmetic mean of all the input values. |
bitwise_and_agg({x1}, {x2}, ...) | Bigint |
| Returns the bitwise AND of all the input values in twos complement representation. |
bitwise_or_agg({x1}, {x2}, ...) | Bigint |
| Returns the bitwise OR of all the input values in twos complement representation. |
bool_and({x1}, {x2}, ...) | Boolean |
| If every input value is TRUE, it returns TRUE, otherwise FALSE. |
bool_or({x1}, {x2}, ...) | Boolean |
| If any input value is TRUE, it returns TRUE, otherwise FALSE. |
checksum({x1}, {x2}, ...) | Varbinary |
| Returns the checksum of the given input values. |
corr({x1}, {x2}, ...) | Double |
| Returns the coefficient of correlation of the set of input values. |
count(*) | Bigint |
| Returns the number of input rows. |
count(x) | Bigint |
| Returns the number of non-null input values. |
count_if({x1}, {x2}, ...) | Bigint |
| Returns the count of TRUE values in the input values. If there are zero input or if x evaluates to FALSE or Null for all rows, it returns Zero. |
covar_pop({x1}, {x2}, ...) | Double |
| Returns the population covariance of the input values. |
covar_samp({x1}, {x2}, ...) | Double |
| Returns the sample covariance of the input values. |
every({x1}, {x2}, ...) | Boolean |
| If every input value is TRUE, it returns TRUE, otherwise FALSE. |
geometric_mean({x1}, {x2}, ...) | Double |
| Returns the geometric mean of all the input values. |
histogram({x1}, {x2}, ...) | Map |
| Returns a map containing the count of the number of times each input value occurs. |
kurtosis({x1}, {x2}, ...) | Double |
| Returns the excess kurtosis of all the input values. |
map_agg(({key1}, {value1}), ({key2}, {value2})...) | Map |
| Returns a map created by aggregating the input (key, value) pairs. |
map_union({map1}, {map2}, ...) | Map |
| Returns the union of all the input maps. |
max({x1}, {x2}, ...) | Same as the input |
| Returns the maximum value of all the input values. |
max_by({x1}, {y1, ...}) | Same as x |
| Returns the value of x associated with the maximum value of y over all input values. |
merge({hll1}, {hll2}, ...) | HyperLogLog |
| Returns the HyperLogLog of the aggregate union of the individual hll HyperLogLog structures. |
min({x1}, {x2}, ...) | Same as the input |
| Returns the minimum value of all the input values. |
min_by({x1}, {y1, ...}) | Same as x |
| Returns the value of x associated with the minimum value of y over all input values. |
multimap_agg(({key1}, {value1}), ...) | Map |
| Returns a multimap created from the input key/value pairs. Each key can be associated with multiple values. |
numeric_histogram({buckets}, {values}) | Map |
| Returns an approximate histogram with up to buckets number of all input values. |
qdigest_agg({x1}, {x2}, ...) | Same as x |
| Returns the qdigest , which is composed of all the input values. |
regr_intercept({x1}, {y1, ...}) | Double |
| Returns the linear regression intercept of the input values. x is the dependent value. y is the independent value. |
regr_slope({x1}, {y1, ...}) | Double |
| Returns the linear regression slope of the input values. x is the dependent value. y is the independent value. |
skewness({x1}, {x2}, ...) | Double |
| Returns the skewness of all the input values. |
sum({x1}, {x2}, ...) | Same as x |
| Returns the sum of all the input values. |
stddev({x1}, {x2}, ...) | Double |
| Returns the sample standard deviation of the input values. |
stddev_pop({x1}, {x2}, ...) | Double |
| Returns the population standard deviation of the input values. |
stddev_samp({x1}, {x2}, ...) | Double |
| Returns the sample standard deviation of the input values. |
tdigest_agg({x1}, {x2}, ...) | Tdigest |
| Returns the tdigest of all the input values. |
variance({x1}, {x2}, ...) | Double |
| Returns the sample variance of the input values. |
var_pop({x1}, {x2}, ...) | Double |
| Returns the population variance of the input values. |
var_samp({x1}, {x2}, ...) | Double |
| Returns the sample variance of the input values. |