Reference: Aggregation Functions
Aggregation functions allow computing a single scalar value from a set of many values. Aggregates are often used as a statistic summarizing the contents of a data set.
Usage example:
-- Determine the price range over all sales
select min(price), max(price), avg(price)
from sales;
-- Or for each customer
select customer_id, min(price), max(price), avg(price)
from sales
group by customer_id;
null
values and return null
when they aggregated zero values.General-purpose functions
any_value(x)
- Extract an arbitrary non-null value from the input.
approx_count_distinct(x)
- Approximate a count of the distinct
x
values (i.e.,count(distinct x)
). While a precisecount(distinct x)
needs to store all elements, this approximate version only uses a fixed state for a HyperLogLog sketch independent of the data size. This function can be very useful to compute distinct counts over huge data sets efficiently. arg_max(x, max)
- Returns the
x
value for the maximummax
value. When there are multiple equalmax
values, thex
value is chosen arbitrarily among them. However, multiplearg_max
aggregates over the samemax
will come from the same tuple. arg_min(x, min)
- Returns the
x
value for the minimummin
value. The same caveats as forarg_max
apply. array_agg(x)
- Collect all incoming values (including
null
) in an array. The order in the array is arbitrary, except when explicitly specified asarray_agg(x order by x)
. avg(x)
- Calculate the arithmetic mean over
x
. bit_and(x)
- Compute a bitwise AND over all
x
values. bit_or(x)
- Compute a bitwise OR over all
x
values. bit_xor(x)
- Compute a bitwise XOR over all
x
values. bool_and(x)
- Compute a boolean AND over all
x
values. bool_or(x)
- Compute a boolean OR over all
x
values. count(*)
- Count the number of input rows.
count(x)
- Count the number of non-null
x
values. every(x)
- Alias for
bool_and
. max(x)
- Determine the maximum value. For equivalent maximum values (e.g., with case-insensitive collate) returns an arbitrary one.
min(x)
- Determine the minimum value. For equivalent minimum values returns an arbitrary one.
string_agg(x, delimiter)
- Concatenate all non-null input values with the specified
delimiter
betweenx
values. The order is arbitrary, except when explicitly specified asstring_agg(x, ',' order by x)
. sum(x)
- Calculate the sum of all
x
values. Uses checked arithmetic to guarantee correct results without overflow. Fordouble
values, CedarDB uses Kahan summation for extended precision.
Modifiers
You can specify the distinct
or order by
within aggregation functions to modify the input to that function.
For example, a count(distinct userid)
calculates how many unique users are in a set.
This also works on other duplicate-sensitive aggregates by deduplicating the input x
values before processing them.
For order-sensitive aggregates like string_agg
, you can also specify an order by
clause to first sort the aggregated
values by an arbitrary expression.
Statistical Aggregates
The following functions calculate more detailed statistical attributes over the input data.
covar_pop(y, x)
- Compute the population covariance
covar_samp(y, x)
- Compute the sample covariance
regr_avgx(y, x)
- Compute the average of the regression input (independent)
x
variable, i.e.,sum(x)/count(*)
regr_avgy(y, x)
- Compute the average of the regression output (dependent)
y
variable, i.e.,sum(y)/count(*)
regr_count(y, x)
- Count the number of rows where both
y
andx
are not null. regr_intercept(y, x)
- Determine the y-intercept of a linear least-squares fit equation of the (x, y) points.
regr_r2(y, x)
- Compute the square of the correlation coefficient
r²
. regr_slope(y, x)
- Determine the slope of a linear least-squares fit equation of the (x, y) points.
regr_sxx
- Compute the sum of squares of the
x
variable , i.e.,sum(x^2) - sum(x)^2/count(*).
regr_svxy
- Compute the sum of products of
x
timesy
, i.e.,sum(x*y) - sum(x) * sum(y)/count(*)
. stddev(x)
- Compute the standard deviation, alias for
stddev_samp
. stddev_pop(x)
- Compute the population standard deviation of
x
. stddev_samp(x)
- Compute the sample standard deviation of
x
. variance(x)
- Compute the variance, alias for
var_samp
. var_pop(x)
- Compute the population variance of
x
. var_samp(x)
- Compute the sample variance of
x
.
Ordered-set aggregate functions
The following functions are ordered-set aggregates, with an order specified as within group (order by exp)
.
mode() within group (...)
- Compute the most frequent value
percentile_cont(fraction) within group (...)
- Compute the continuous
n
th percentile with the givenn
as a fraction between 0 and 1. The continuous percentile interpolates between values if necessary. percentile_disc(fraction) within group (...)
- Compute the discrete
n
th percentile with the givenn
as a fraction between 0 and 1. The discrete percentile returns the first value exceeding the percentile.