Newer Version Available

This content describes an older version of this product. View Latest

group-by

Groups the data in a data stream by one or more fields.

Syntax

The statements cogroup and group-by are interchangeable. For clarity, we use group-by for statements that involve one data stream and cogroup for statements that involve two or more data streams.

The cogroup statement does not support rollup.

Note

group-by takes the following syntax.
1result = group data_stream by rollup(field1, [field2]);
  • rollup—Optional. A subclause of group-by that calculates totals of grouped data. For each set of grouped results, the query returns an additional row with a null value for the dimension and the total results for the measure.

    rollup must include all fields in the group statement. For example, this query rolls up the Type and Lead_Source fields.

    1q = group q by rollup('Type', 'Lead_Source');
    This query throws an error because it rolls up Type but not Lead_Source.
    1q = group q by rollup('Type'), 'Lead_Source';
    The rollup modifier supports the following aggregate functions.
    • Count
    • Sum
    • Average
    • Min
    • Max
    • Unique

    Using rollup with other aggregate or windowing functions returns an error.

  • data_stream—The data input to be grouped.
  • field—Field by which data is grouped.

Example

Group by One Field

This example groups by the account owner field to display the number of opportunities that each account owner has.

1q = load "DTCOpps";
2q = group q by 'Account_Owner';
3q = foreach q generate 'Account_Owner' as 'Account_Owner', count() as 'count';
4q = order q by 'count' asc;

Diagram showing a chart grouped by one field.

Example

Group By Rollup

This example groups by the Stage Name field. The query first shows the total sum for each stage name. It then rolls up the total sum for all opportunities.

1q = load "opportunityData";
2q = group q by rollup('StageName');
3q = foreach q generate
4    'StageName' as 'Stage Name',
5    sum('Amount') as 'sum_Amount'
6q = order q by ('Stage Name');

Grouped values with a total calculated by rollup

The value of the Stage Name column for the total Sum of Amount value is null. The inclusion of the null value can sometimes lead to confusion when performing other calculations on your data. Let’s add a label called All Stages to clarify that 212,310,000 represents the sum of all of stages.

1q = load "opportunityData";
2q = group q by rollup('StageName');
3q = order q by ('Stage Name');
4q = foreach q generate
5    (case
6        when grouping('StageName') == 1 then "All Stages"
7        else 'StageName'
8    end) as 'Stage Name',
9    sum('Amount') as 'sum_Amount';

To add the extra label, we use a case statement with the grouping() function. Here, we add a case statement that uses grouping() to check whether the Stage Name column has a null value returned from the rollup. If so, it labels the column "All Stages". Otherwise, it returns the stage name.

Grouped values with a total calculated by rollup

If you include an order-by statement after a group-by statement with rollup, the ordering columns must be the same columns as in the group-by statement. List them in the same order. Otherwise, the query throws an error.

Note