Set Up a Google BigQuery Data Federation Connection
Set up a connection between Google BigQuery and Data 360 to access data.
| User Permissions Needed | |
|---|---|
| To create a Google BigQuery Data Federation connection in Data Cloud: | Data Cloud Architect |
Requirements
Before you configure the connection, review these network and service account requirements.
- If you use Virtual Private Cloud (VPC) Service Controls in Google Cloud, ask your Google Cloud Platform (GCP) org admin to review Data 360 IP Allowlist and Allow access to protected resources from outside a perimeter, and update your allowlists.
- Create a Google Cloud service account with these permissions. See Create service accounts.
- BigQuery Metadata Viewer
- BigQuery Job User
- BigQuery Data Viewer
- BigQuery Read Session User
Set Up the Connection
-
In Data Cloud, go to Data Cloud Setup.
-
Under Configuration, select More Connectors.
-
Click New.
-
Under Source, select Google BigQuery and click Next.
-
Enter a connection name, connection API name.
The connection name must be at least three characters long.
-
If you want to use key-based authentication, click Private Key Pair and upload the Private Key JSON file or paste its contents in the SSH Private Key field.
If you’re using key-based authentication, skip the next step and continue configuring your BigQuery connection.
-
If you want to use identity provider–based authentication, click Identity Provider Based.
-
In the Google Cloud Service (GCS) account data warehouse, select IAM & Admin > Workload Identity Federation.
-
In the Workload Identity Pool, click Add provider.
-
Under details, enter a name for the provider.
-
For Issuer (URL), enter your org's My Domain URL followed by /services/connectors. For example, https://yourcompany.my.salesforce.com/services/connectors.
-
Under Audiences, select Allowed Audience and enter your org's MyDomain URL. For example, https://yourcompany.my.salesforce.com.
-
Under Attribute Mappings, match the audience attribute to OIDC audience (attribute.audi = assertion.aud) and the subject attribute to OIDC subject (google.subject = assertion.sub).
-
Click Save.
-
-
To provide audience access to the workload pool that you created, click Grant access and then select Grant access using service account impersonation.
-
Select your service account email.
-
Under Select principals, select the audience attribute that you created, and enter your org's MyDomain URL.
-
Click Save.
-
-
To provide access to subject, click Grant access and then select Grant access using service account impersonation.
-
From the Data Cloud connection window, copy the external ID.
-
In GCS, select your service account.
-
Under Select principals, select the subject attribute and enter the Data Cloud external ID.
-
Click Save.
-
-
Download the config file generated by GCS.
-
In the Configure your application dialog box, select the provider that you created.
-
Enter a name for the OIDC token path and click Download config. You can choose any name as it won't be needed in the Data Cloud setup window.
-
Copy the contents of the downloaded config file and paste it in the OIDC Client Config field in Data Cloud.
-
-
-
Enter the Project ID.
-
Enter the Service Account email.
-
To review your configuration, click Test Connection.
-
Click Save.
Considerations for Large Tables
Large Result Set and Unload are two independent optimizations that can be used to enhance the performance of queries on large tables. Unload is more scalable than Large Result Set. Use Unload if you are able to enable acceleration.
-
Large Result Set (LRS): When LRS is enabled and the size in GB of the data to be returned by a query is significant, Google writes the data to a Google-managed, internal, temporary, staging location which Data 360 queries transparently. To enable LRS:
-
Grant the
bigquery.tables.createandbigquery.dataset.createpermissions in BigQuery to your service account. Associate with your service account either a predefined IAM role that already has these permissions or a custom IAM role with these permissions. See BigQuery predefined IAM roles. -
In your BigQuery project, create a dataset named
sf_temp_datasetand set its expiration time to six hours. -
Select the Large Result Set checkbox in the connection configuration screen.
-
-
Unload: Unload is only applicable to accelerated data streams and is suitable for very data-intensive queries. When Unload is enabled and a refresh of an accelerated data stream is triggered, BigQuery executes the SQL query that Data 360 sends and exports the result to a temporary Google Cloud Storage (GCS) bucket that Data 360 queries using the GCS API. The result of the query is cached in Data 360. Unload is a connection-level property; acceleration is a data stream-level property.
-
Create a new GCS bucket, and associate the Storage Object User IAM role with your service account.
-
To reduce storage costs, create a bucket lifecycle policy that deletes exported files older than one day. See Object Lifecycle Management | Cloud Storage.
-
Toggle the Use Unload button to active.
-
In the GCS Bucket Absolute Path field, enter the absolute path of the GCS bucket followed by a forward slash. For example,
gs://[BUCKET_NAME]/.
-