group-by

Organizes the rows returned from a query into groups. Within each group, you can apply an aggregate function, such as count() or sum() to get the number of items or sum, respectively.

Syntax

group-by takes this syntax.
group data_stream by fields;
data_stream
Data input to group.
fields
Fields by which data is grouped.

Group-by One Field

In this example, the query counts the number of rows for each Category field and groups the counts by category.
q = load "Superstore";
q = group q by 'Category';
q = foreach q generate 'Category' as 'Category', count() as 'count';
q = limit q 2000;
Category Count of Rows
Furniture 2,121
Office Supplies 6,026
Technology 1,847

cogroup and group-by are interchangeable. For clarity, we use group-by for statements that involve one data stream and cogroup for statements that involve two or more data streams.

Note

Group-by with Null Values

To return grouped null values in your queries, you must select the preference to include null values in Setup. Otherwise, queries ignore null values.

  1. In Setup, enter Analytics in the Quick Find box.
  2. Select Settings from the list of Analytics options.
  3. In Settings, click the checkbox for Include null values in Analytics queries.
Null handling preference in Setup

Here’s an example of a query that returns null values. It orders the results by the Sub_Category field and specifies that the results display in ascending order, with nulls first.

q = load "Superstore";
q = group q by 'Sub_Category';
q = foreach q generate 'Sub_Category' as 'Sub_Category', count() as 'count';
q = order q by 'Sub_Category' asc nulls first;
q = limit q 2000;
Sub-Category Count of Rows
- 4
Accessories 775
Appliances 466
Art 796
Binders 1,523
Bookcases 228
Chairs 617
Copiers 68
Envelopes 254
Fasteners 217
Furnishings 957
Labels 364
Machines 115
Paper 1,370
Phones 889
Storage 846
Supplies 190
Tables 319

Group-by all

In this example, the query counts all of the rows and returns the number of different industries that you have opportunities with.

q = load "DTC_Opportunity_SAMPLE";
q = group q by all;
q = foreach q generate unique('Industry') as 'unique_Industry';
# Unique of Industry
1 20