group and cogroup
Syntax
result = group rows by field;
result = group rows by (field1, field2, ...);
result = group rows by expression[, rows by expression ...];
result = group rows by expression [left | right | full], rows by expression;
Simple Grouping
Groups rows by one or more columns. If a row's grouping value is null, the whole row is removed from the result.
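The following is a minimal Python sketch of these grouping rules. It is not SAQL and does not show how Analytics implements grouping. The sketch assumes that rows are plain dicts, that null corresponds to Python `None`, and that a row is dropped when any of its grouping fields is null. The function `group_rows` and the sample data are hypothetical.

```python
from collections import defaultdict
from typing import Any, Dict, Hashable, List, Sequence, Tuple

Row = Dict[str, Any]


def group_rows(rows: Sequence[Row], fields: Sequence[str]) -> Dict[Tuple[Hashable, ...], List[Row]]:
    """Group rows by the values of one or more fields (hypothetical helper).

    Assumption: a row whose grouping value is None (null) in any of the
    grouping fields is dropped, mirroring the rule described above.
    """
    groups: Dict[Tuple[Hashable, ...], List[Row]] = defaultdict(list)
    for row in rows:
        key = tuple(row.get(f) for f in fields)
        if any(v is None for v in key):
            continue  # null grouping value: remove the whole row
        groups[key].append(row)
    return dict(groups)


rows = [
    {"year": 2014, "month": 1, "Amount": 10},
    {"year": 2014, "month": 2, "Amount": 5},
    {"year": 2015, "month": 1, "Amount": 7},
    {"year": None, "month": 3, "Amount": 99},  # dropped: null year
]

by_year = group_rows(rows, ["year"])                 # like: group a by year
by_year_month = group_rows(rows, ["year", "month"])  # like: group a by (year, month)
print(sorted(by_year))        # [(2014,), (2015,)]
print(sorted(by_year_month))  # [(2014, 1), (2014, 2), (2015, 1)]
```

Keys are tuples even for a single field. This keeps the one-field and multi-field forms consistent.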
Syntax:
result = group rows by field;
or
result = group rows by (field1, field2, ...);
Group by 1 dimension:
a = group a by year;
Group by 2 dimensions:
a = load "0Fbxx000000002qCAA/0Fcxx000000002WCAQ";
a = group a by (year, month);
a = foreach a generate year as year, month as month;
Inner Cogrouping
Cogrouping means that two input streams, called left and right, are grouped independently and arranged side by side. Only data that exists in both groups appears in the results.
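The Python sketch below illustrates the inner-cogroup rule. It is not SAQL and does not show how Analytics implements cogrouping. The sketch assumes that each stream is a list of dicts and that each side computes its key with its own function. It also assumes that rows with null keys are dropped, as in simple grouping. `index_by`, `inner_cogroup`, and the sample fields are hypothetical. The last step aggregates over one named side, which is roughly what `sum(a.miles)` expresses in SAQL.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, Hashable, List, Sequence

Row = Dict[str, Any]
KeyFn = Callable[[Row], Hashable]


def is_null_key(key: Hashable) -> bool:
    """Treat a key as null if it is None or a tuple containing None."""
    if key is None:
        return True
    return isinstance(key, tuple) and any(v is None for v in key)


def index_by(rows: Sequence[Row], key: KeyFn) -> Dict[Hashable, List[Row]]:
    """Group one input stream independently by its own key expression."""
    groups: Dict[Hashable, List[Row]] = defaultdict(list)
    for row in rows:
        k = key(row)
        if not is_null_key(k):  # assumption: null keys are dropped
            groups[k].append(row)
    return dict(groups)


def inner_cogroup(left: Sequence[Row], left_key: KeyFn,
                  right: Sequence[Row], right_key: KeyFn) -> Dict[Hashable, Dict[str, List[Row]]]:
    """Keep only keys present in both streams. Each group holds the left
    and right rows side by side instead of merging them into one row."""
    left_groups = index_by(left, left_key)
    right_groups = index_by(right, right_key)
    common = left_groups.keys() & right_groups.keys()
    return {k: {"left": left_groups[k], "right": right_groups[k]} for k in common}


def carrier(row: Row) -> Hashable:
    return row["carrier"]


a = [
    {"carrier": "AA", "miles": 100},
    {"carrier": "AA", "miles": 50},
    {"carrier": "UA", "miles": 70},  # no match in b: excluded
]
b = [
    {"carrier": "AA", "delay": 5},
    {"carrier": "DL", "delay": 9},   # no match in a: excluded
]

groups = inner_cogroup(a, carrier, b, carrier)
# Aggregate over the left side only, analogous to sum(a.miles).
miles = {k: sum(r["miles"] for r in g["left"]) for k, g in groups.items()}
print(miles)  # {'AA': 150}
```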
Syntax:
result = cogroup rows by expression[, rows by expression ...];

a = load "0Fbxx000000002qCAA/0Fcxx000000002WCAQ";
b = load "0Fbyy000000002qCAA/0Fcyy000000002WCAQ";
a = cogroup a by carrier, b by carrier;

result = cogroup a by keya, b by keyb, c by keyc;

z = cogroup x by (day,origin), y by (day,airport);

a = load "0Fbxx000000002qCAA/0Fcxx000000002WCAQ";
b = load "0Fbxx000000002qCAA/0Fcxx000000002WCAQ";
b = cogroup a by ClosedDate, b by CreatedDate;
c = foreach b generate sum(a.Amount) as Amount;

a = load "0Fbxx000000002qCAA/0Fcxx000000002WCAQ";
a = filter a by "region" in ["West"];
a = filter a by "status" in ["closed"];
b = filter a by "year" in [2014];
c = filter a by "year" in [2015];
d = cogroup b by ("state"), c by ("state");
d = foreach d generate "state" as "state", sum(b.Amount) as "Amount_2014", sum(c.Amount) as "Amount_2015";

a = load "0Fbxx000000002qCAA/0Fcxx000000002WCAQ";
b = cogroup a by ClosedDate, a by CreatedDate;
c = foreach b generate sum(a.Amount) as Amount;

sum(inputSide['myMeasure'])
sum(inputSide::myMeasure)
sum(inputSide.myMeasure)

a = load "0Fbxx000000002qCAA/0Fcxx000000002WCAQ";
b = load "0Fbyy000000002qCAA/0Fcyy000000002WCAQ";
c = cogroup a by x, b by y;
d = foreach c generate a.x as x, a.y as y, sum(a.miles) as miles;

a = load "0Fbxx000000002qCAA/0Fcxx000000002WCAQ";
b = load "0Fbyy000000002qCAA/0Fcyy000000002WCAQ";
c = cogroup a by x, b by y;
d = foreach c generate a.x as x, a.y as y, sum(miles) as miles;

a = load "0Fbxx000000002qCAA/0Fcxx000000002WCAQ";
b = load "0Fbyy000000002qCAA/0Fcyy000000002WCAQ";
c = cogroup a by 'OwnerName', b by 'OwnerName';
c = foreach c generate a['OwnerName'] as 'OwnerName', sum(a['AmountConverted']) /
    sum(b['Amount']) as 'sum_target_completed', count(a) as count;
Outer Cogrouping
Syntax:
result = cogroup rows by expression [left | right | full], rows by expression;
Specify left, right, or full to indicate whether to perform a left outer join, a right outer join, or a full outer join.
Example: z = cogroup x by (day,origin) left, y by (day,airport);
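The Python sketch below contrasts the four cogroup types by showing which keys survive in each. It is not SAQL. The streams `x` and `y` and their composite keys mirror the `(day,origin)` and `(day,airport)` example, but the rows are made up. `index_by` and `cogroup` are hypothetical helpers. The sketch represents a missing side as an empty list, which is an assumption: it does not show how SAQL surfaces the missing side, for example as null aggregate values.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, Hashable, List, Sequence

Row = Dict[str, Any]
KeyFn = Callable[[Row], Hashable]


def index_by(rows: Sequence[Row], key: KeyFn) -> Dict[Hashable, List[Row]]:
    """Group one stream by its key, dropping rows whose key contains a null."""
    groups: Dict[Hashable, List[Row]] = defaultdict(list)
    for row in rows:
        k = key(row)
        parts = k if isinstance(k, tuple) else (k,)
        if any(p is None for p in parts):
            continue
        groups[k].append(row)
    return dict(groups)


def cogroup(left: Sequence[Row], left_key: KeyFn,
            right: Sequence[Row], right_key: KeyFn,
            how: str = "inner") -> Dict[Hashable, Dict[str, List[Row]]]:
    """Cogroup two streams. `how` selects which keys survive:
    inner = keys on both sides, left = all left keys,
    right = all right keys, full = keys from either side.
    Assumption: a side with no matching rows is an empty list."""
    lg = index_by(left, left_key)
    rg = index_by(right, right_key)
    if how == "inner":
        keys = lg.keys() & rg.keys()
    elif how == "left":
        keys = set(lg)
    elif how == "right":
        keys = set(rg)
    elif how == "full":
        keys = lg.keys() | rg.keys()
    else:
        raise ValueError(f"unknown cogroup type: {how!r}")
    return {k: {"left": lg.get(k, []), "right": rg.get(k, [])} for k in keys}


def x_key(row: Row) -> Hashable:
    return (row["day"], row["origin"])


def y_key(row: Row) -> Hashable:
    return (row["day"], row["airport"])


x = [
    {"day": 1, "origin": "SFO", "flights": 3},
    {"day": 2, "origin": "JFK", "flights": 4},
]
y = [
    {"day": 1, "airport": "SFO", "delays": 1},
    {"day": 3, "airport": "LAX", "delays": 2},
]

for how in ("inner", "left", "right", "full"):
    print(how, sorted(cogroup(x, x_key, y, y_key, how)))
# inner [(1, 'SFO')]
# left [(1, 'SFO'), (2, 'JFK')]
# right [(1, 'SFO'), (3, 'LAX')]
# full [(1, 'SFO'), (2, 'JFK'), (3, 'LAX')]
```

A left cogroup keeps the key `(2, 'JFK')` even though `y` has no matching rows. In the sketch, that group's right side is simply empty.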
You can apply an outer cogrouping across more than two sets of data. This example performs a left outer join from a to b, with a right outer join to c:
result = cogroup a by keya left, b by keyb right, c by keyc;