Newer Version Available
fill
Syntax
1results = fill resultSet by (dateCols=(dateField1, dateField2, "<date format>"), startDate=startDate, endDate=endDate, [partition])| Name | Description |
|---|---|
| resultSet | Required. The results of a query that serve as input to the fill() function. |
| dateCols | Required. date_fields—The date fields in which to check for gaps. The date format string accepts these values.
startDate—The starting date value beyond the scope of your data's date range. endDate—The ending date value beyond the scope of your data's date range.
|
| partition | Optional. A field used to split query results into smaller partitions. The fill() function resets when the field value changes. After each group of rows is completed for a given partition, fill() runs on the next partition. |
Example
This example uses fill() to add missing quarter and year values to tourist data.
1q = load "TouristsData";
2q = foreach q generate date_Year, date_Quarter, tourists;
3q = fill q by (dateCols=(date_Year, date_Quarter, "Y-Q"));
4q = limit q 15;The query first returns the year, quarter, and number of tourists for each quarter. Based on the results from the first three years represented in the dataset, the only date data available is for the first quarter.
These are the results from q = load "TouristsData"; q = foreach q generate date_Year, date_Quarter, tourists;.
| year | quarter | tourists |
|---|---|---|
| 2001 | 1 | 4127 |
| 2002 | 1 | 4173 |
| 2003 | 1 | 4621 |
fill() specifies in the date_cols array to group the input data by the quarter and year. To have a complete dataset of years and quarters, fill() adds the 2nd, 3rd, and 4th quarters for each year and a null value for the number of tourists.
| year | quarter | tourists |
|---|---|---|
| 2001 | 1 | 4127 |
| 2001 | 2 | - |
| 2001 | 3 | - |
| 2001 | 4 | - |
| 2002 | 1 | 4173 |
| 2002 | 2 | - |
| 2002 | 3 | - |
| 2002 | 4 | - |
| 2003 | 1 | 4621 |
| 2003 | 2 | - |
| 2003 | 3 | - |
| 2003 | 4 | - |
Example with Extended Date Range
This query returns null values for tourists where date_Month and date_Year come before or after the date values in the dataset or there are gaps within the data provided.
1q = load "TouristsData";
2q = foreach q generate date_Year, date_Month, tourists;
3q = fill q by (dateCols=(date_Year, date_Month, "Y-M"), startDate="2000-10", endDate="2001-07");
4q = limit q 10;| date_Month | date_Year | tourists |
|---|---|---|
| 10 | 2000 | - |
| 11 | 2000 | - |
| 12 | 2000 | - |
| 01 | 2001 | 41,735 |
| 02 | 2001 | - |
| 03 | 2001 | - |
| 04 | 2001 | 26,665 |
| 05 | 2001 | - |
| 06 | 2001 | - |
| 07 | 2001 | - |