Newer Version Available
Combine Data from Multiple Datasets with cogroup
Cogroup is similar to relational database joins, but with some important differences. Unlike a relational database join, in a cogroup the datasets are grouped first, and then the groups are joined.
This example combines meeting information with account information.
Suppose that you want to understand how much time your reps spend meeting with each account. Is there a relationship between spending more time and winning an account? Are some reps spending much more or much less time than average? The first step in answering your questions is to combine meeting data with account data using cogroup.
Suppose that you have a dataset of meeting information from the Salesforce Event object. In this example, your reps have had six meetings with four different companies. The Meetings dataset has a MeetingDuration column, which contains the meeting duration in hours.

The account data exists in the Salesforce Opportunity object. The Ops dataset has an Account, Won, and Amount column. The Amount column contains the dollar value of the opportunity, in millions.

To see the effect of meeting duration on opportunities, you start by combining these two datasets into a single data stream using cogroup.
1q = cogroup ops by 'Account', meetings by 'Company';Internally (you cannot see these results yet), the resulting cogrouped data stream contains the following data. Note how the data streams are rolled up on one or more dimensions.
1(1,{(Shoes2Go,2,), (Shoes2Go,5)},{(Shoes2Go,1,1.5), (Shoes2Go,0,3})
2
3(2,{(FreshMeals,3), (FreshMeals, 5)},{(FreshMeals,1,2) (FreshMeals, 1, 1.4)})
4
5(3,{(ZipBikeShare,4)},{(ZipBikeShare,1, 1.1)})
6
7(4,{(ZenRetreats,6)},{(ZenRetreats,0, 2)})Now the datasets are combined. To see the data, you create a projection using foreach. The complete SAQL statement is shown below.
1ops = load "Ops";
2meetings = load "Meetings";
3q = cogroup ops by 'Account', meetings by 'Company';
4q = foreach q generate ops.'Account' as 'Account', sum(ops.'Amount') as 'sum_Amount', sum(meetings.'MeetingDuration') as 'TimeSpent';The resulting data stream contains the sum of amount and total meeting time for each company. The sum of amount is the sum of the dollar value for every opportunity for the company.

Now that you have combined the data into a single data stream, you can analyze the effects that total meeting time has on your opportunities.