Create an Unstructured Data Connection from Amazon S3

Connect unstructured data from Amazon S3 for use in your Agentforce, RAG, automation, and analytics workflows. First create an unstructured data lake object (UDLO) in Data Cloud to reference the unstructured data. Then create a file notification pipeline to keep your UDLO up to date.

See the Search Index Reference for a list of supported file formats for unstructured data.

To ensure your unstructured data is properly connected to Data Cloud, first perform the steps to connect unstructured data from your external blob store, then set up file notifications, and finally put data in your external blob store.

Create a UDLO in Data Cloud to reference unstructured data from Amazon S3.

User Permissions Needed

To connect unstructured data from an external blob store, you need one of these permission sets:
  • Data Cloud Admin
  • Data Cloud Marketing Admin
  • Data Cloud Data Aware Specialist

Before you begin:

Make sure you've set up a connection to Amazon S3.

  1. From App Launcher, select Data Cloud.
  2. Click Data Lake Objects and then click New.
  3. From the New Data Lake Object menu, select From External Files, and click Next.
  4. Choose the Amazon S3 connector, and click Next.
  5. From the Select Connection dropdown list, select a connection. Data Cloud auto-populates the source based on the connection that you select.
  6. In the Directory field, point to a specific folder or an entire directory in your blob store. All folders and subfolders in a directory are included. Optionally, use wildcard characters to specify a file name pattern that matches multiple files, for example, *.pdf to include all PDF files in the directory.
  7. To add more directories, click More Files. You can include up to 5 directories.
  8. Click Next.
  9. Add an Object Name and an Object API Name for the UDLO. See Data Lake Object Naming Standards.
  10. From the Data Space dropdown list, select a data space in which to create a new unstructured data model object (UDMO) or a data space from which to select an existing UDMO.
  11. Map the UDLO to a UDMO.
    • To create a new UDMO, click New.
    • To use an existing UDMO, click Existing, and select a UDMO from the list.
  12. Optionally, leave the checkbox selected to create a search index configuration for the UDMO using system defaults, which automatically select text fields and a chunking strategy for each field. To create a search index configuration later instead, deselect the checkbox.
  13. Click Next, or if you created a search index configuration, review the details, and save your work.
  14. After establishing the connection from your external blob store, set up a file notification pipeline to notify Data Cloud whenever files are added, updated, or deleted from your external blob store.

Create a file notification pipeline for Amazon S3 to notify a Salesforce connected app whenever unstructured data files are added, updated, or deleted from a bucket.

Required User Permissions

  • AWS IAM: iam:AttachRolePolicy
  • Amazon S3 buckets: Create one bucket for your unstructured data and one bucket to store your AWS Lambda function source code.
  • AWS CLI: No special permissions required.
  • AWS Lambda: iam:CreateRole, lambda:CreateFunction, lambda:InvokeFunction
  • AWS Secrets Manager: secretsmanager:CreateSecret, secretsmanager:PutResourcePolicy
  • jq: No special permissions required.
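
If you create the two S3 buckets from the AWS CLI, a minimal sketch follows; the bucket names and region are placeholders, not values from this guide.

    # Create a bucket for unstructured data and a bucket for the Lambda source code.
    # Replace the bucket names and region with your own values.
    aws s3 mb s3://my-unstructured-data --region us-east-1
    aws s3 mb s3://my-lambda-source --region us-east-1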

To configure OAuth for the connected app you will use in your file notification pipeline, create a private/public RSA key pair and a digital x509 certificate.

If you already have a connected app that you want to use, ensure that you have the private/public RSA key pair you used to create the x509 certificate for that app, as you need them to enable OAuth in a subsequent step.

  1. From your terminal, change directories to any folder.

  2. Create the private/public key pair (see the OpenSSL sketch after this list).

  3. Create a digital certificate from the key pair.

  4. Complete the questions as prompted.

  5. Create a pkcs8 private key from the key pair.

  6. Keep the private/public key pair and the digital x509 certificate, as you need them in the following task.
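
A minimal sketch of OpenSSL commands for steps 2 through 5; the file names (keypair.key, keypair.crt, keypair.pem) are placeholders, and your key size and certificate lifetime may differ.

    # Step 2: create a 2048-bit RSA private/public key pair.
    openssl genrsa -out keypair.key 2048

    # Steps 3-4: create a self-signed x509 certificate from the key pair,
    # answering the prompts (country, organization, and so on) as they appear.
    openssl req -new -x509 -nodes -sha256 -days 365 -key keypair.key -out keypair.crt

    # Step 5: create a pkcs8-formatted private key from the key pair.
    openssl pkcs8 -topk8 -nocrypt -in keypair.key -out keypair.pem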

Next, download the file notification installer files.

  1. Download the S3 file notification installer script.
  2. Download and unzip aws_lambda_function.zip.

Set up a connected app to use in the file notification pipeline and apply necessary OAuth settings.

  1. Create a basic connected app. If you already have a connected app, skip to Enable OAuth settings for the API integration.
  2. Enable OAuth settings for the API integration.
    1. To use JWT OAuth Flow, select Use Digital Signatures.
    2. Enter the callback URL (endpoint) that Salesforce calls back to your application during OAuth. It’s the same as the OAuth redirect URI.
    3. Click Choose File, and select the certificate (.crt file) you created in Create a Private/Public Key Pair and Certificate.
    4. Set these OAuth scopes: Manage user data via APIs (api), Perform requests at any time (refresh_token, offline_access), and Manage Data Cloud Ingestion API data (cdp_ingest_api).
    5. Note the Consumer Key and Callback URL.
  3. Click Save.
  4. In Setup, search for OAuth and OpenID Connect Settings.
  5. Turn on Allow OAuth Username-Password Flows.
  6. Using the Consumer Key and Callback URL that you noted in Enable OAuth settings for the API integration, build the authorization URL and paste it into your browser (see the sketch after this list).
  7. When prompted to provide permission for each of the scopes you requested, click Allow.
  8. (Optional) To verify that the app is correctly authorized, search for Connected Apps OAuth Usage in Quick Find. Your connected app should be listed.
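
A sketch of the standard Salesforce OAuth authorization URL for step 6, assuming your org uses the production login endpoint; substitute your own login URL (for example, a My Domain or sandbox URL) and the Consumer Key and Callback URL you noted.

    https://login.salesforce.com/services/oauth2/authorize?response_type=code&client_id=<CONSUMER_KEY>&redirect_uri=<CALLBACK_URL>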

The install script is supported on macOS and Linux operating systems.

  1. Download and install jq.

  2. Download and install the AWS CLI.

  3. From your terminal, enter your AWS credentials (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and AWS_SESSION_TOKEN) and region.

  4. In the directory where you extracted the file notification installer script, go to the installers/aws folder.

  5. Open input_parameters_s3.conf and replace the environment variable values with your values.

  6. Make the file notification installer script executable, and run it (see the sketch after this list).

  • macOS users: When you run the installer script, you are transferred to the UI to authenticate. After authentication, return to the terminal. The script continues to run in the terminal.
  • Linux users: When you run the installer script, an authentication link and code are displayed in the terminal. Copy the link and code to authenticate from your browser.
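
A minimal sketch of steps 3 through 6, assuming the installer script is named setup_file_notification_s3.sh; the actual file name in your download may differ, and all credential values are placeholders.

    # Step 3: export AWS credentials and region for the current shell session.
    export AWS_ACCESS_KEY_ID="AKIA..."
    export AWS_SECRET_ACCESS_KEY="..."
    export AWS_SESSION_TOKEN="..."    # required for temporary (JIT) credentials
    export AWS_REGION="us-east-1"

    # Steps 4-5: go to the installer folder and edit the configuration file.
    cd installers/aws
    vi input_parameters_s3.conf

    # Step 6: make the installer script executable, and run it.
    chmod +x setup_file_notification_s3.sh
    ./setup_file_notification_s3.sh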

Refer to these variables when creating a file notification pipeline from Amazon S3 to Data Cloud.

  • SF_USERNAME: Your Salesforce org username.
  • SF_LOGIN_URL: Your Salesforce org login URL.
  • AWS_ACCOUNT_ID: Your AWS account ID.
  • REGION: Your AWS region.
  • EVENT_S3_SOURCE_BUCKET: The S3 bucket that contains your unstructured data source.
  • EVENT_S3_SOURCE_KEY: The data folder within your EVENT_S3_SOURCE_BUCKET.
  • LAMBDA_FUNCTION_S3_BUCKET: The name of the S3 bucket that contains the source code ZIP file for your Lambda function.
  • LAMBDA_FUNC_LOC_S3_KEY: The S3 key for the Lambda function ZIP file inside that bucket.
  • SOURCE_CODE_LOCAL_PATH: The local path to your function's source code.
  • LAMBDA_ROLE: The name of the execution role that runs the Lambda function. This can be any string.
  • LAMBDA_FUNCTION_NAME: The name of the Lambda function to deploy in your AWS account. This can be any string.
  • RSA_PRIVATE_KEY: The name of the secret that you create in AWS Secrets Manager when you upload the RSA private key (PEM file). Don't put the contents of the keypair.pem file into this variable.
  • CONSUMER_KEY_NAME: The name of the secret that you create in AWS Secrets Manager when you upload the Consumer Key from your connected app.
  • CONSUMER_KEY_VALUE: The Consumer Key from your connected app.
  • PEM_FILE_PATH: The complete path to the keypair.pem file on your local machine, for example, /Users/Name/Documents/keypair.pem.
  • CALLBACK_URL: The callback URL (endpoint) that Salesforce calls back to your application during OAuth.
  • AWS_ACCESS_KEY_ID: The access key associated with your IAM user or role.
  • AWS_SECRET_KEY: The secret key associated with your access key.
  • AWS_SECURITY_TOKEN or AWS_SESSION_TOKEN: The session token used to sign API requests to AWS.
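
A hypothetical example of a filled-in input_parameters_s3.conf; every value here is a placeholder, and the exact set of variables in your downloaded file may differ.

    SF_USERNAME="admin@example.com"
    SF_LOGIN_URL="https://mydomain.my.salesforce.com"
    AWS_ACCOUNT_ID="123456789012"
    REGION="us-east-1"
    EVENT_S3_SOURCE_BUCKET="my-unstructured-data"
    EVENT_S3_SOURCE_KEY="data/"
    LAMBDA_FUNCTION_S3_BUCKET="my-lambda-source"
    LAMBDA_FUNC_LOC_S3_KEY="aws_lambda_function.zip"
    SOURCE_CODE_LOCAL_PATH="/Users/Name/Downloads/aws_lambda_function.zip"
    LAMBDA_ROLE="dc-file-notification-role"
    LAMBDA_FUNCTION_NAME="dc-file-notification"
    RSA_PRIVATE_KEY="dc-rsa-private-key"   # secret name, not the key contents
    CONSUMER_KEY_NAME="dc-consumer-key"    # secret name
    CONSUMER_KEY_VALUE="3MVG9..."
    PEM_FILE_PATH="/Users/Name/Documents/keypair.pem"
    CALLBACK_URL="https://login.salesforce.com/services/oauth2/callback"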

These common issues can occur when setting up a connected app or S3 file notification in Data Cloud. If you don’t see data connected in your org after loading unstructured data into S3 and creating a file notification pipeline, review these requirements.

Connected App

Ensure that you have:

  • Used a valid consumer key and RSA private key. If you're using a consumer key from a previously configured connected app, check that the connected app and the Salesforce org still exist.
  • Set all three required scopes in the connected app, as documented.
  • Authenticated your connected app. In the Quick Find, search for "Connected Apps OAuth Usage". You should see your connected app and a user count of 1.
  • Uploaded a .crt file in the connected app, not a .pem or .key file.
  • Used a .pem file when creating the RSA private key.
  • Used a single set of keys (.crt, .pem, and .key) for both the cloud function and the connected app.
  • Used the correct pattern for key names, as specified in the variable reference.

Unstructured Data Files and UDLO Creation

Ensure that you have:

  • Aligned the directory structure of the external blob store source directory and the UDLO directory. For example, if the external blob store parent directory is named data, and you configure the UDLO directory field as files, Data Cloud looks for files at this path: <Cloud Provider Bucket>/data/files.
  • Correctly set the file type extensions when creating the UDLO, matching the case used in the file names of the unstructured data files.

Installer Script and Cloud Function

Ensure that you have:

  • Properly set all input parameters.
  • Provided a valid Salesforce org login URL and username.
  • Uploaded the correct, latest cloud function zip for your notification pipeline. For example, don't use the GCS cloud function zip for an Azure notification pipeline.
  • Used valid, unexpired AWS just-in-time (JIT) credentials for the admin access role (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_SESSION_TOKEN).

Additional Troubleshooting

  • Connected App
    Issue: 400 error in cloud function logs.
    Reason: The certificate (.crt) file used in the connected app isn't associated with the RSA private key (.pem) file uploaded to the cloud provider.
    Solution: Recreate the keys, then replace both the key (.pem) file uploaded to the cloud provider and the certificate file used in the connected app so that they come from the same key pair.
  • Connected App
    Issue: 403 error in cloud function logs.
    Reason: The connected app Org URL doesn't properly execute or isn't available.
    Solution: Ensure that the OAuth scopes are set properly and that the Org URL is reachable from a browser.
  • Connected App
    Issue: 400 error in cloud function logs.
    Reason: The connected app policy settings are incorrect.
    Solution: Edit the policies in the connected app: set the IP relaxation policy to Relax IP restrictions, and enable the OAuth username-password flow.
  • Data Explorer
    Issue: Files uploaded to the cloud bucket or container aren't refreshed in the UDLO or UDMO.
    Reason: Incorrect file types were uploaded. For example, a UDLO is set to accept PDF files, but HTML files were uploaded.
    Solution: Upload the correct file types, or create new UDLOs for the file types and upload the files again.
