foreach

Applies a set of expressions to every row in a dataset. This action is often referred to as projection.

Syntax

q = foreach q generate expression as alias[, expression as alias ...];

The output column names are specified with the as keyword. The output data is ungrouped.

Using foreach with Ungrouped Data

When used with ungrouped data, the foreach statement maps the input rows to output rows. The number of rows remains the same.

Example

a2 = foreach a1 generate carrier as carrier, miles as miles;
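
As a slightly fuller sketch, a projection can also compute new values from existing fields. The dataset name "flights" and the alias km are assumptions made for illustration, and miles is assumed to be a numeric field:

```
q = load "flights";
-- Ungrouped input: one output row per input row.
-- Keep carrier and miles, and add a derived column (km is a hypothetical alias).
q = foreach q generate carrier as carrier, miles as miles, miles * 1.609 as km;
```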

Using foreach with Grouped Data

When used with grouped data, the foreach statement produces one output row per group instead of one output row per input row.

Fields can be directly accessed only when the value is the same for all group members. For example, the fields that were used as the grouping keys have the same value for all group members. Otherwise, use aggregate functions to access the members of a group. The type of the column determines which aggregate functions you can use. For example, if the column type is numeric, you can use the sum() function.

Example

z = foreach y generate day as day, unique(origin) as uorg, count() as n;
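
The rule about direct access versus aggregate functions can be sketched with a complete query. The dataset name "flights" and the alias total_miles are assumptions for illustration, and miles is assumed to be numeric:

```
q = load "flights";
q = group q by carrier;
-- carrier is the grouping key, so it has one value per group and can be projected directly.
-- miles varies within a group, so it is accessed through an aggregate function.
q = foreach q generate carrier as carrier, sum(miles) as total_miles, count() as n;
```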

Using foreach with a case Expression

To choose between values conditionally in a foreach statement, use a case expression.

Example

This example query uses the simple case expression syntax:
q = load "data";
q = foreach q generate xInt, (case xInt % 3
      when 0 then "3n"
      when 1 then "3n+1"
      else "3n+2"
end) as modThree;

Example

This example query uses the searched case expression syntax:
q = load "data";
q = foreach q generate price, (case
      when price < 1000 then "category1"
      when price >= 1000 and price < 2000 then "category2"
      else "category3"
end) as priceLevel;

Projected Field Names

Each field name in a projection must be unique and can't be 'none'. A query that projects an invalid field name throws an error.

For example, the last line in this query is invalid because the same name is used for multiple projected fields:
l = load "0Fabb000000002qCAA/0Fabb000000002WCAQ";
r = load "0Fcyy000000002qCAA/0Fcyy000000002WCAQ";
l = foreach l generate 'value'/'divisor' as 'value' , category as category;
r = foreach r generate 'value'/'divisor' as 'value' , category as category;
cg = cogroup l by category right, r by category;
cg = foreach cg generate r.category as 'category', sum(r.value) as sumrval, sum(l.value) as sumrval;
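
One way to make this query valid, as a sketch, is to give each projected field its own alias. The name sumlval is chosen here purely for illustration; any unique name other than 'none' should work:

```
l = load "0Fabb000000002qCAA/0Fabb000000002WCAQ";
r = load "0Fcyy000000002qCAA/0Fcyy000000002WCAQ";
l = foreach l generate 'value'/'divisor' as 'value' , category as category;
r = foreach r generate 'value'/'divisor' as 'value' , category as category;
cg = cogroup l by category right, r by category;
-- Each alias is now used once: sumrval for the right stream, sumlval for the left.
cg = foreach cg generate r.category as 'category', sum(r.value) as sumrval, sum(l.value) as sumlval;
```
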
The following query is also invalid because the projected field name can't be 'none'.
q = load "Products";
q = group q by all;
q = foreach q generate count() as 'none';
q = limit q 2000;
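
Renaming the projected field is enough to avoid the error. In this sketch the alias 'total' is an arbitrary choice made for illustration:

```
q = load "Products";
q = group q by all;
-- 'total' is a hypothetical alias; any unique name other than 'none' is acceptable.
q = foreach q generate count() as 'total';
q = limit q 2000;
```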