Newer Version Available

This content describes an older version of this product. View Latest

Limit the use of Unique()

Sometimes you need to use unique() in a query, but be aware that it can affect performance if there is a large number of unique values.
For example, suppose you want to count the number of different industries that you have opportunities with.
1q = load "DTC_Opportunity_SAMPLE";
2q = group q by all;
3q = foreach q generate unique('Industry') as 'unique_Industry';

If your data contains a few thousand industries, this query will not negatively affect performance.

However, suppose you want to count the number of unique customers (accounts):
1q = load "AcquiredAccount";
2q = group q by all;
3q = foreach q generate unique('Account_Id') as 'unique_Account_Id';
If your company has millions of customers, be aware that this query will have some affect on performance.

While counting the number of unique values might impact performance, counting the total number of rows in a table has almost no impact.

Note