Newer Version Available
Limit the use of Unique()
Sometimes you need to use unique() in a query,
but be aware that it can affect performance if there is a large number of unique
values.
For example, suppose you want to count the number of different industries that you have
opportunities with.
1q = load "DTC_Opportunity_SAMPLE";
2q = group q by all;
3q = foreach q generate unique('Industry') as 'unique_Industry';If your data contains a few thousand industries, this query will not negatively affect performance.
However, suppose you want to count the number of unique customers
(accounts):
1q = load "AcquiredAccount";
2q = group q by all;
3q = foreach q generate unique('Account_Id') as 'unique_Account_Id';If your company has millions of customers, be aware that this query will have some affect
on performance.