Newer Version Available

This content describes an older version of this product. View Latest

Null Operators

Use null operators to select records that have (or do not have) fields with null values.

Null operators return true or false.

Operator Description
is null True when the value is null.
is not null True when the value is not null.

Use is null and is not null in projections and in post-projection filters. You can't use them in pre-projection filters.

For example, display all the accounts that your reps have met with at least once.

1q = load "Meetings";
2q = group q by 'Company';
3q = foreach q generate 'Company' as 'Company', sum('MeetingDuration') as 'TotalMeetings';
4
5--filter out fields with no meetings
6q = filter q by 'TotalMeetings' is not null;

Or, you can use case to replace null values with a value of your choice.

1q = load "dataset";
2q = foreach q generate (case when Name is null then "john doe" else Name end) as Name;

This example is not valid because you can't use is not null or is null before a projection:

1a = load "dataset";
2a = filter a by Year is not null;
3a = foreach a generate Name as Name, Year as Year;

Use is null with cogroup

A left outer cogroup combines the right data stream with the left data stream. If a record on the left side does not have a match on the right, the missing right value is null in the resulting data stream.

For example, suppose that you have a Meeting data set containing information about your rep's meetings with each account. You want to see all accounts that reps have not met with. Use a left outer cogroup between Ops and Meetings, then use is null to filter results.

1ops = load "Ops";
2meetings = load "Meetings";
3q = cogroup ops by 'Account' left, meetings  by 'Company' ;
4q = foreach q generate ops.'Account' as 'Account', sum(ops.'Amount') as 'sum_Amount', sum(meetings.'MeetingDuration') as 'TimeSpent';
5
6--use is null to get records with no time time spent 
7q = filter q by 'TimeSpent' is null;