Groups matched records. The group and cogroup statements are interchangeable. However, cogroup is typically used to operate on more than 1 input
stream.
Simple Grouping
Adds one or more columns to a group. If data is
grouped by a value that’s null in a row, that whole row
is removed from the
result.
Syntax:
or
The
order of the fields matters for limit queries, but not for top queries.
Group by 1
dimension:
a = group a by year;
Group by multiple
dimensions:
Inner Cogrouping
Cogrouping means that two input streams, called left and right are
grouped independently and arranged side by side. Only data that exists in both groups
appears in the results.
Syntax:
This example is a simple
cogroup operation on 2
datasets:
You can cogroup more than 2
datasets:
This example performs a
cogroup
operation:
You can’t have the same stream on both sides of a
cogroup operation. To perform a
cogroup
operation on 1 dataset, load the dataset twice so you have 2 streams.
You
can also load 1 dataset and filter it into 2 different streams:
This code throws an error because it performs a
cogroup operation on a single stream,
a:
To use aggregate functions when cogrouping, specify which input side to use in the
aggregate function. For example, if you have an
a side
and a
b side, and each contains a particular measure,
use one of these
syntaxes:
This query is valid because it uses the third syntax form to specify that
miles comes from the
a
side.
This query isn’t valid because
miles doesn't specify
which side it is coming
from:
If a lens or dashboard has a
cogroup query, specify
the input stream for projections and for
count()
aggregations on
cogroup queries, as in this
example:
Outer Cogrouping
Outer cogrouping combines groups as an outer join. For
the half-matches, null rows are added. The grouping keys are taken from the input that
provides the
value.
Syntax:
Specify
left, right, or
full to indicate whether to perform a left outer
join, a right outer join, or a full join.
Example: z =
cogroup x by (day,origin) left, y by (day,airport);
You can apply an outer
cogrouping across more than 2 sets of data. This example does a left outer join from a to b,
with a right join to c:
result = cogroup a by keya left, b by keyb right, c by
keyc;
Outer joins return null when there is no match, instead of defaulting to
zero.