Use Group and Filter Pre-projection
Improve query performance by moving group and filter operations on simple fields
before the foreach statement. A simple field is projected
as-is and doesn't have additional expressions.
Projection refers to the subset of columns that your query returns. In SAQL, projection
occurs in the foreach statement, where the query
performs an operation on each row in the dataset.
Example: Filter
In this query, the filter statement occurs
post-projection.
q = load "Superstore";
q = foreach q generate 'Category' as 'Store_Category', 'Sub_Category' as 'Store_Sub_Category';
q = filter q by 'Store_Category'=="Furniture";
Here, we move the filter to pre-projection. Because the Category field occurs before the foreach statement, it doesn't have an alias.
q = load "Superstore";
q = filter q by 'Category'=="Furniture";
q = foreach q generate 'Category' as 'Store_Category', 'Sub_Category' as 'Store_Sub_Category';
Example: Group
In this example, the query first creates two new fields: Detailed_Category, a combination of the Category and Sub_Category fields, and Adjusted_Discount. After grouping the results by Detailed_Category, the second foreach statement takes the average of Adjusted_Discount for each Detailed_Category.
q = load "Superstore";
q = foreach q generate 'Category'+ "-" + 'Sub_Category' as 'Detailed_Category', 2*'Discount' as 'Adjusted_Discount';
q = group q by 'Detailed_Category';
q = foreach q generate 'Detailed_Category', avg('Adjusted_Discount') as 'Avg_Adjusted_Discount';
Instead of using two foreach statements, group by Sub_Category pre-projection, and add its alias in the foreach statement.
q = load "Superstore";
q = group q by ('Category', 'Sub_Category');
q = foreach q generate 'Category'+ "-" + 'Sub_Category' as 'Detailed_Category', 2*avg('Discount') as 'Avg_Adjusted_Discount';