Create an Unstructured Data Connection from Google Cloud Storage
Connect unstructured data from Google Cloud Storage (GCS) for use in your Agentforce, RAG, automation, and analytics workflows. First create an unstructured data lake object (UDLO) in Data Cloud to reference the unstructured data. Then create a file notification pipeline to keep your UDLO up to date.
See the Search Index Reference for a list of supported file formats for unstructured data.
To ensure your unstructured data is properly connected to Data Cloud, first perform the steps to connect unstructured data from your external blob store, then set up file notifications, and finally put data in your external blob store.
Create a UDLO in Data Cloud to reference unstructured data from GCS.
User Permissions Needed | |
---|---|
To connect unstructured data from an external blob store: | One of these permission sets: |
Before you begin, make sure you've set up a connection to GCS.
- From App Launcher, select Data Cloud.
- Click Data Lake Objects and then click New.
- From the New Data Lake Object menu, select From External Files, and click Next.
- Choose the Google Storage connector, and click Next.
- From the Select Connection dropdown list, select a connection. Data Cloud auto-populates the source based on the connection that you select.
- In the Directory field, point to a specific folder or an entire directory in your blob store. All folders and subfolders in a directory are included. Optionally, use wildcard characters to specify a file name pattern that matches multiple files, for example, `*.pdf`.
- To add more directories, click More Files. You can include up to 5 directories.
- Click Next.
- Add an Object Name and an Object API Name for the UDLO. See Data Lake Object Naming Standards.
- From the Data Space dropdown list, select a data space in which to create the new UDMO or a data space from which to select an existing UDMO.
- Map the UDLO to a UDMO.
- To create a new UDMO, click New.
- To use an existing UDMO, click Existing, and select a UDMO from the list.
- Optionally, leave the checkbox selected to create a search index configuration for the UDMO using system defaults, which automatically select text fields and a chunking strategy for each field. If you'd rather create a search index configuration later, deselect the checkbox.
- Click Next, or if you created a search index configuration, review the details, and save your work.
- After establishing the connection from your external blob store, set up a file notification pipeline to notify Data Cloud whenever files are added, updated, or deleted from your external blob store.
Create a file notification pipeline for Google Cloud Platform (GCP) Storage to notify a Salesforce connected app whenever unstructured data files are added, updated, or deleted from a bucket.
Required User Permissions
User Permissions Needed | |
---|---|
GCP | |
Cloud Storage bucket | |
gcloud CLI | |
GCP Secret Manager | Either the Secret Manager Admin role or these permissions: |
To configure OAuth for the connected app you will use in your file notification pipeline, create a private/public RSA key pair and a digital x509 certificate.
If you already have a connected app that you want to use, ensure that you have the private/public RSA key pair you used to create the x509 certificate for that app, as you need them to enable OAuth in a subsequent step.
- From your terminal, change directories to any folder.
- Create the private/public key pair.
- Create a digital certificate from the key pair, completing the questions as prompted.
- Create a pkcs8 private key from the key pair.
- Keep the private/public key pair and the digital x509 certificate, as you need them in the following task. For one way to run these steps, see the example commands after this list.
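These steps typically use OpenSSL. This is a minimal sketch, assuming the file names `keypair.key`, `certificate.crt`, and `keypair.pem` (the `.pem` name matches the `PEM_FILE_PATH` variable used later in this article); your exact commands and file names may differ.

```bash
# Create a 2048-bit RSA private/public key pair (file name is an assumption)
openssl genrsa -out keypair.key 2048

# Create a self-signed x509 digital certificate; answer the prompts as they appear
openssl req -new -x509 -sha256 -days 365 -key keypair.key -out certificate.crt

# Convert the key to a pkcs8 private key, which you upload to Secret Manager later
openssl pkcs8 -topk8 -nocrypt -in keypair.key -out keypair.pem
```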
- Download the GCS file notification installer script.
- Download and unzip `gcp_cloud_function.zip`.
Set up a connected app to use in the file notification pipeline and apply necessary OAuth settings.
- Create a basic connected app. If you already have a connected app, continue to Enable OAuth settings for the API integration.
- Enable OAuth settings for the API integration.
- To use JWT OAuth Flow, select Use Digital Signatures.
- Enter the callback URL (endpoint) that Salesforce calls back to your application during OAuth. It’s the same as the OAuth redirect URI.
- Click Choose File, and select the certificate (.crt file) you created in Create a Private/Public Key Pair and Certificate.
- Set these OAuth scopes: Manage user data via APIs (api), Perform requests at any time (refresh_token, offline_access), and Manage Data Cloud Ingestion API data (cdp_ingest_api).
- Note the Consumer Key and Callback URL.
- Click Save.
- In Setup, search for OAuth and OpenID Connect Settings.
- Turn on Allow OAuth Username-Password Flows.
- Using the Consumer Key and Callback URL that you noted in Enable OAuth settings for the API integration, build the authorization URL and paste it into your browser (see the example URL after this list).
- When prompted to provide permission for each of the scopes you requested, click Allow.
- (Optional) To verify that the app is correctly authorized, use Quick Find to search for “Connected Apps OAuth Usage”. Your connected app should be listed.
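The authorization URL follows Salesforce's standard OAuth 2.0 authorization endpoint format. A sketch, with placeholders for the values you noted (remember to URL-encode the callback URL value):

```
https://<your Salesforce login URL>/services/oauth2/authorize?response_type=code&client_id=<Consumer Key>&redirect_uri=<Callback URL>
```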
The install script is supported on macOS and Linux operating systems.
- In the directory where you extracted the file notification installer script, go to the `installers/gcs` folder.
- Open `input_parameters_gcs.conf` and replace the environment variable values with your values.
- Run the installer script, which makes the file notification script executable (see the example after this list).
- macOS users: When you run the installer script, you are transferred to the UI to authenticate. After authentication, return to the terminal. The script continues to run in the terminal.
- Linux users: When you run the installer script, an authentication link and code is displayed in the terminal. Copy the link and code to authenticate from your browser.
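As a sketch, a typical run looks like this, assuming the installer script is named `install_gcs.sh` (check the `installers/gcs` folder for the actual file name):

```bash
cd installers/gcs
chmod +x install_gcs.sh   # makes the file notification script executable
./install_gcs.sh          # runs the installer; authenticate when prompted
```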
Refer to these variables when creating a file notification pipeline from a GCS blob store to Data Cloud.
Variable | Definition |
---|---|
PROJECT_ID | Your GCS project ID |
GCS_REGION | The region of your GCS bucket. For a multi-region bucket, select any region available in that location. |
SF_USERNAME | Your Salesforce org username |
SF_LOGIN_URL | Your Salesforce org login URL |
LOCATION | Your GCS bucket region |
SOURCE_CODE_BUCKET_NAME | The blob store bucket name where you set up file notifications |
SOURCE_CODE_LOCAL_PATH | The local path to your function's source code |
TRIGGER_REGION | Your trigger region |
CONSUMER_KEY_NAME | The name of the secret you create in Secret Manager when you upload the Consumer Key from your connected app |
CONSUMER_KEY_VALUE | The Consumer Key created in your connected app |
RSA_PRIVATE_KEY | The name of the secret you create in Secret Manager when you upload the RSA private key (.pem file) |
PEM_FILE_PATH | The complete path to the keypair.pem file on your local machine, for example, /Users/Name/Documents/keypair.pem |
CALLBACK_URL | The callback URL (endpoint) that Salesforce calls back to your application during OAuth |
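As a sketch, a filled-in `input_parameters_gcs.conf` might look like the following. All values are hypothetical placeholders, and the exact file syntax may differ from this shell-style format.

```bash
# Hypothetical example values; replace each with your own.
PROJECT_ID="my-gcp-project"
GCS_REGION="us-east1"
SF_USERNAME="admin@example.com"
SF_LOGIN_URL="https://login.salesforce.com"
LOCATION="us-east1"
SOURCE_CODE_BUCKET_NAME="my-unstructured-data-bucket"
SOURCE_CODE_LOCAL_PATH="/Users/Name/Downloads/gcp_cloud_function"
TRIGGER_REGION="us-east1"
CONSUMER_KEY_NAME="CONSUMER_KEY_MYORG"      # secret name; see the key-name pattern in the troubleshooting table
CONSUMER_KEY_VALUE="<Consumer Key from your connected app>"
RSA_PRIVATE_KEY="RSA_PRIVATE_KEY_MYORG"     # secret name for the .pem private key
PEM_FILE_PATH="/Users/Name/Documents/keypair.pem"
CALLBACK_URL="<Callback URL from your connected app>"
```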
These common issues can occur when setting up a connected app or GCS file notification in Data Cloud. If you don’t see data connected in your org after loading unstructured data into GCS and creating a file notification pipeline, review these requirements.
Connected App
Ensure that you have:
- Used a valid consumer key and RSA private key. If you're using a consumer key from a previously configured connected app, check that the connected app and Salesforce org still exist.
- Set all three required scopes in the connected app, as documented.
- Authenticated your connected app. In the Quick Find, search for "Connected Apps OAuth Usage". You should see your connected app and a user count of 1.
- Uploaded a `.crt` file in the connected app, not a `.pem` or `.key` file.
- Used a `.pem` file when creating the RSA private key.
- Used a single set of keys (`.crt`, `.pem`, and `.key`) for both the cloud function and the connected app.
- Used the correct pattern for key names, as specified in the variable reference.
Unstructured Data Files and UDLO Creation
Ensure that you have:
- Aligned the directory structure of the external blob store source directory and the UDLO directory. For example, if the external blob store parent directory is named `data` and you configure the UDLO directory field as `files`, Data Cloud looks for files at this path: `<Cloud Provider Bucket>/data/files`.
- Correctly set the file type extensions and matched the case used in the file names of the unstructured data files when creating the UDLO.
Installer Script and Cloud Function
Ensure that you have:
- Properly set all input parameters.
- Provided a valid Salesforce org login URL and username.
- Uploaded the correct, latest cloud function zip for your notification pipeline. For example, don't use the GCS cloud function zip for an Azure notification pipeline.
- Set up access to the GCS project with secret manager credentials.
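To see the 400 and 403 errors referenced in the following table, you can read the cloud function logs with the gcloud CLI. A sketch, assuming a deployed function named `gcs-file-notification` (your function name and region may differ):

```bash
# Read recent log entries for the deployed cloud function
gcloud functions logs read gcs-file-notification --region=us-east1 --limit=50
```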
Section | Issue | Reason | Solution |
---|---|---|---|
Connected App | 400 error in cloud function logs | The certificate (`.crt`) file used in the connected app isn't associated with the RSA private key file (`.pem`) uploaded to the cloud provider. | Recreate the keys, then replace both the `.pem` file uploaded to the cloud provider and the certificate file used in the connected app so that they come from the same key pair. |
Connected App | 403 error in cloud function logs | The connected app org URL doesn't resolve or isn't available. | Ensure that the OAuth scopes are set properly and that the org URL is reachable from a browser. |
Connected App | 400 error in cloud function logs | The connected app policy settings are incorrect. | Edit the policies in the connected app: set the IP configuration to Relax IP restrictions, and enable the OAuth username-password flow. |
Data explorer | Files uploaded to the cloud bucket/container are not refreshed in UDLO/UDMO | Incorrect file types were uploaded, for example, a UDLO is set to accept PDF files but HTML files were uploaded. | Upload the correct file types, or create new UDLOs for the file types and upload the files again. |
GCS console | Could not deserialize the data format | The consumer key and private keys weren't added properly. | Ensure that you rename `keypair.key` to `keypair.pem` and upload it as the secret for `RSA_PRIVATE_KEY`, and that you add the consumer key properly. |
GCS installer | Trigger location `us-east1` not matching bucket location `us` | For a multi-region bucket, the trigger location was added as `us-east1`, or GCS_REGION was added as `us`. | Add the trigger location as `us` for a multi-region bucket, and add GCS_REGION as any valid region, such as `us-east1`. You can pass `--trigger-location=us` if the command fails. |
GCS installer | Deployed cloud function missing | You reused a name from an existing cloud function. | Use unique cloud function names. |
GCS installer | Can't add or set IAM policy settings for CONSUMER_KEY and RSA_PRIVATE_KEY | The necessary secret manager permission is missing from your account for the respective GCP project. | Add the necessary secret manager permissions to your account for your GCP project. |
GCS installer | Not able to generate JWT token in cloud function logs or cloud event not reaching beacon service | You provided an invalid username, an invalid Salesforce URL, or an invalid consumer key. | Ensure your username is valid, the Salesforce URL is accessible, and you include the same consumer key in your secret and in your connected app. |
GCS installer | Not able to generate CDP access token | The name or pattern of the secret keys does not match CONSUMER_KEY_<YOUR_OWN_SUFFIX> and RSA_PRIVATE_KEY_<YOUR_OWN_SUFFIX> . | Ensure your secret keys use this pattern: CONSUMER_KEY_<YOUR_OWN_SUFFIX> and RSA_PRIVATE_KEY_<YOUR_OWN_SUFFIX> . |
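As a sketch, you can create secrets that follow this naming pattern with the gcloud CLI. The `MYORG` suffix and the `consumer_key.txt` file are hypothetical; the `.pem` file is the pkcs8 key created earlier.

```bash
# Upload the Consumer Key value (stored in a local file) as a Secret Manager secret
gcloud secrets create CONSUMER_KEY_MYORG --data-file=consumer_key.txt

# Upload the pkcs8 private key as a Secret Manager secret
gcloud secrets create RSA_PRIVATE_KEY_MYORG --data-file=keypair.pem
```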