Aggregate Functions

Aggregate functions allow you to perform calculations on a set of records and return summarized information.

FunctionReturn TypeInput ArgumentsDescription
approx_distinct({x1}, {x2}, ...)Bigint
  • x1, x2, ...: Any data type
Returns the approximate number of distinct values of input x. If all the input values are null, it returns zero. This function performs better on large volumes of data and is appropriate when exact results aren’t required.
approx_most_frequent({buckets}, {value}, {capacity})Map
  • buckets: Bigint
  • value: String or Integer
  • capacity: Bigint
Returns a map containing the top elements with corresponding estimated frequency. Use this function to find the frequent values up to buckets of elements.
approx_percentile({x}, {percentage})Same as the input value x
  • x: Any data type
  • percentage: A real number between 0 and 1
Returns the approximate percentile for all input values of input x at the given percentage.
approx_set({x1}, {x2}, ...)HyperLogLog
  • x1, x2, ...: Any of these types: Integer, Real, Double, Fixed-precision decimal
Returns the HyperLogLog sketch of the input data set of x.
arbitrary({x1}, {x2}, ...)Same as the input
  • x1, x2, ...: Any numeric data type
Returns an arbitrary non-null value of the input value x if it exists.
array_agg({x1}, {x2}, ...)Array (same type as the input)
  • x1, x2, ...: Any numeric data type
Returns an array created from the input elements.
avg({x1}, {x2}, ...)Double
  • x1, x2, ...: Any of these types: Integer, Real, Double, Fixed-precision decimal
Returns the arithmetic mean of all the input values.
bitwise_and_agg({x1}, {x2}, ...)Bigint
  • x1, x2, ...: Integers
Returns the bitwise AND of all the input values in twos complement representation.
bitwise_or_agg({x1}, {x2}, ...)Bigint
  • x1, x2, ...: Integers
Returns the bitwise OR of all the input values in twos complement representation.
bool_and({x1}, {x2}, ...)Boolean
  • x1, x2, ...: Booleans
If every input value is TRUE, it returns TRUE, otherwise FALSE.
bool_or({x1}, {x2}, ...)Boolean
  • x1, x2, ...: Booleans
If any input value is TRUE, it returns TRUE, otherwise FALSE.
checksum({x1}, {x2}, ...)Varbinary
  • x1, x2, ...: Any of these types: Integer, Real, Double, Fixed-precision decimal
Returns the checksum of the given input values.
corr({x1}, {x2}, ...)Double
  • x1, x2, ...: Any of these types: Integer, Real, Double, Fixed-precision decimal
Returns the coefficient of correlation of the set of input values.
count(*)Bigint
  • Asterisk (*)
Returns the number of input rows.
count(x)Bigint
  • x1, x2, ...: Any numeric data type
Returns the number of non-null input values.
count_if({x1}, {x2}, ...)Bigint
  • x1, x2, ...: Any numeric data type
Returns the count of TRUE values in the input values. If there are zero input or if x evaluates to FALSE or Null for all rows, it returns Zero.
covar_pop({x1}, {x2}, ...)Double
  • x1, x2, ...: Any of these types: Integer, Real, Double, Fixed-precision decimal
Returns the population covariance of the input values.
covar_samp({x1}, {x2}, ...)Double
  • x1, x2, ...: Any of these types: Integer, Real, Double, Fixed-precision decimal
Returns the sample covariance of the input values.
every({x1}, {x2}, ...)Boolean
  • x1, x2, ...: Booleans
If every input value is TRUE, it returns TRUE, otherwise FALSE.
geometric_mean({x1}, {x2}, ...)Double
  • x1, x2, ...: Any of these types: Integer, Real, Double, Fixed-precision decimal
Returns the geometric mean of all the input values.
histogram({x1}, {x2}, ...)Map
  • x1, x2, ...: Any numeric data type
Returns a map containing the count of the number of times each input value occurs.
kurtosis({x1}, {x2}, ...)Double
  • x1, x2, ...: Any of these types: Integer, Real, Double, Fixed-precision decimal
Returns the excess kurtosis of all the input values.
map_agg(({key1}, {value1}), ({key2}, {value2})...)Map
  • (key1, value1), ...: (key, value) pairs
Returns a map created by aggregating the input (key, value) pairs.
map_union({map1}, {map2}, ...)Map
  • map1, map2, ...: Maps
Returns the union of all the input maps.
max({x1}, {x2}, ...)Same as the input
  • x1, x2, ...: Any of these types: Integer, Real, Double, Fixed-precision decimal
Returns the maximum value of all the input values.
max_by({x1}, {y1, ...})Same as x
  • x1, y1, ...: Any of these types: Integer, Real, Double, Fixed-precision decimal
Returns the value of x associated with the maximum value of y over all input values.
merge({hll1}, {hll2}, ...)HyperLogLog
  • hll1, hll2, ...: HyperLogLog values
Returns the HyperLogLog of the aggregate union of the individual hll HyperLogLog structures.
min({x1}, {x2}, ...)Same as the input
  • x1, x2, ...: Any of these types: Integer, Real, Double, Fixed-precision decimal
Returns the minimum value of all the input values.
min_by({x1}, {y1, ...})Same as x
  • x1, y1, ...: Any of these types: Integer, Real, Double, Fixed-precision decimal
Returns the value of x associated with the minimum value of y over all input values.
multimap_agg(({key1}, {value1}), ...)Map
  • key1, value1, ...: (key, value) pairs
Returns a multimap created from the input key/value pairs. Each key can be associated with multiple values.
numeric_histogram({buckets}, {values})Map
  • buckets: Bigint
  • values: Integer
Returns an approximate histogram with up to buckets number of all input values.
qdigest_agg({x1}, {x2}, ...)Same as x
  • x1, x2, ...: Any numeric data type
Returns the qdigest, which is composed of all the input values.
regr_intercept({x1}, {y1, ...})Double
  • x1, y1, ...: Any numeric data type
Returns the linear regression intercept of the input values. x is the dependent value. y is the independent value.
regr_slope({x1}, {y1, ...})Double
  • x1, y1, ...: Any numeric data type
Returns the linear regression slope of the input values. x is the dependent value. y is the independent value.
skewness({x1}, {x2}, ...)Double
  • x1, x2, ...: Any numeric data type
Returns the skewness of all the input values.
sum({x1}, {x2}, ...)Same as x
  • x1, x2, ...: Any of these types: Integer, Real, Double, Fixed-precision decimal
Returns the sum of all the input values.
stddev({x1}, {x2}, ...)Double
  • x1, x2, ...: Any numeric data type
Returns the sample standard deviation of the input values.
stddev_pop({x1}, {x2}, ...)Double
  • x1, x2, ...: Any numeric data type
Returns the population standard deviation of the input values.
stddev_samp({x1}, {x2}, ...)Double
  • x1, x2, ...: Any numeric data type
Returns the sample standard deviation of the input values.
tdigest_agg({x1}, {x2}, ...)Tdigest
  • x1, x2, ...: Any numeric data type
Returns the tdigest of all the input values.
variance({x1}, {x2}, ...)Double
  • x1, x2, ...: Any numeric data type
Returns the sample variance of the input values.
var_pop({x1}, {x2}, ...)Double
  • x1, x2, ...: Any numeric data type
Returns the population variance of the input values.
var_samp({x1}, {x2}, ...)Double
  • x1, x2, ...: Any numeric data type
Returns the sample variance of the input values.