Reference: Aggregation Functions
Aggregation functions allow computing a single scalar value from a set of many values. Aggregates are often used as a statistic summarizing the contents of a data set.
Usage example:
-- Determine the price range over all sales
select min(price), max(price), avg(price)
from sales;
-- Or for each customer
select customer_id, min(price), max(price), avg(price)
from sales
group by customer_id;null values and return null when they aggregated zero values.General-purpose functions
any_value(x)- Extract an arbitrary non-null value from the input.
approx_count_distinct(x)- Approximate a count of the distinct
xvalues (i.e.,count(distinct x)). While a precisecount(distinct x)needs to store all elements, this approximate version only uses a fixed state for a HyperLogLog sketch independent of the data size. This function can be very useful to compute distinct counts over huge data sets efficiently. arg_max(x, max)- Returns the
xvalue for the maximummaxvalue. When there are multiple equalmaxvalues, thexvalue is chosen arbitrarily among them. However, multiplearg_maxaggregates over the samemaxwill come from the same tuple. arg_min(x, min)- Returns the
xvalue for the minimumminvalue. The same caveats as forarg_maxapply. array_agg(x)- Collect all incoming values (including
null) in an array. The order in the array is arbitrary, except when explicitly specified asarray_agg(x order by x). avg(x)- Calculate the arithmetic mean over
x. bit_and(x)- Compute a bitwise AND over all
xvalues. bit_or(x)- Compute a bitwise OR over all
xvalues. bit_xor(x)- Compute a bitwise XOR over all
xvalues. bool_and(x)- Compute a boolean AND over all
xvalues. bool_or(x)- Compute a boolean OR over all
xvalues. count(*)- Count the number of input rows.
count(x)- Count the number of non-null
xvalues. every(x)- Alias for
bool_and. max(x)- Determine the maximum value. For equivalent maximum values (e.g., with case-insensitive collate) returns an arbitrary one.
min(x)- Determine the minimum value. For equivalent minimum values returns an arbitrary one.
string_agg(x, delimiter)- Concatenate all non-null input values with the specified
delimiterbetweenxvalues. The order is arbitrary, except when explicitly specified asstring_agg(x, ',' order by x). sum(x)- Calculate the sum of all
xvalues. Uses checked arithmetic to guarantee correct results without overflow. Fordoublevalues, CedarDB uses Kahan summation for extended precision.
Modifiers
You can specify the distinct or order by within aggregation functions to modify the input to that function.
For example, a count(distinct userid) calculates how many unique users are in a set.
This also works on other duplicate-sensitive aggregates by deduplicating the input x values before processing them.
For order-sensitive aggregates like string_agg, you can also specify an order by clause to first sort the aggregated
values by an arbitrary expression.
Statistical Aggregates
The following functions calculate more detailed statistical attributes over the input data.
covar_pop(y, x)- Compute the population covariance
covar_samp(y, x)- Compute the sample covariance
regr_avgx(y, x)- Compute the average of the regression input (independent)
xvariable, i.e.,sum(x)/count(*) regr_avgy(y, x)- Compute the average of the regression output (dependent)
yvariable, i.e.,sum(y)/count(*) regr_count(y, x)- Count the number of rows where both
yandxare not null. regr_intercept(y, x)- Determine the y-intercept of a linear least-squares fit equation of the (x, y) points.
regr_r2(y, x)- Compute the square of the correlation coefficient
r². regr_slope(y, x)- Determine the slope of a linear least-squares fit equation of the (x, y) points.
regr_sxx- Compute the sum of squares of the
xvariable , i.e.,sum(x^2) - sum(x)^2/count(*). regr_sxy- Compute the sum of products of
xtimesy, i.e.,sum(x*y) - sum(x) * sum(y)/count(*). stddev(x)- Compute the standard deviation, alias for
stddev_samp. stddev_pop(x)- Compute the population standard deviation of
x. stddev_samp(x)- Compute the sample standard deviation of
x. variance(x)- Compute the variance, alias for
var_samp. var_pop(x)- Compute the population variance of
x. var_samp(x)- Compute the sample variance of
x.
Ordered-set aggregate functions
The following functions are ordered-set aggregates, with an order specified as within group (order by exp).
mode() within group (...)- Compute the most frequent value
percentile_cont(fraction) within group (...)- Compute the continuous
nth percentile with the givennas a fraction between 0 and 1. The continuous percentile interpolates between values if necessary. percentile_disc(fraction) within group (...)- Compute the discrete
nth percentile with the givennas a fraction between 0 and 1. The discrete percentile returns the first value exceeding the percentile.