The Amazon S3 Storage Connector enables Salesforce Data Cloud to read comma-separated values (.csv) and parquet files from your Amazon S3 buckets. The data is retrieved in a batch job that you can schedule to run as often as hourly or as infrequently as monthly. You can also change the schedule at any time, so don’t fret if you decide later that a job should run more or less frequently.
In this blog post, you’ll learn how to use the Amazon S3 Storage Connector in Data Cloud to ingest data from an Amazon S3 bucket.
What is Amazon S3?
Amazon S3, also referred to as just S3, is object storage that enables you to retrieve your data from anywhere on the web. Amazon S3 stores data in a flat structure using unique identifiers to look up objects when requested. In Amazon S3, objects are stored in containers called buckets. When you create a bucket, you must choose (at the very minimum) two things: the bucket name and the AWS region that you want the bucket to reside in.
Amazon S3 use cases
Amazon S3 can be used for a wide variety of use cases. Here are some of the most common:
- Backup and storage: Amazon S3 is a natural place to back up files because it is highly redundant
- Media hosting: Because you can store unlimited objects, and each individual object can be up to 5 TB, Amazon S3 is an ideal location to host videos, photos, or music uploads
- Software delivery: You can use Amazon S3 to host your software applications for customers to download
- Data lakes: Amazon S3 is an optimal foundation for hosting a data lake because of its scalability
- Static websites: You can configure your bucket to host a static website of HTML, CSS, and client-side scripts
- Files: Because of its scalability, support for large files, and the fact that you can access any object over the web at any time, S3 is an ideal place to store files
Amazon S3 Storage Connector
The Amazon S3 Storage Connector is available by default in all Salesforce Data Cloud orgs. There is nothing to enable before you can use it in a data stream; just click New under Data Streams and you’ll see the connector.
To set up a new connector from Data Cloud to Amazon S3, there are a few things that you’ll need to configure: the bucket name, access key, secret key, file type, directory, file name, and file source. Don’t worry, we’ll show you how to configure all of this!
Creating a Bucket in Amazon S3
Let’s dive in — we’ll create an example connection between Amazon S3 and Data Cloud by taking the following steps.
To get started, we first pull up S3 in Amazon Web Services using the global search functionality. Then, we click Create bucket to create a new Amazon S3 bucket.
Next, we enter a name for the bucket and choose our AWS region. We can also set the security and other settings on this page. For the purpose of this demonstration, we’ll configure the minimum criteria, which are the bucket name and region.
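If you prefer to script this step instead of using the console, here’s a minimal sketch using the AWS SDK for Python (boto3). The bucket name and region below are placeholders for this example, so substitute your own values.

```python
import boto3

region = "us-east-2"  # placeholder region for this example
s3 = boto3.client("s3", region_name=region)

# Create the bucket in the chosen region.
# Note: for us-east-1, omit CreateBucketConfiguration entirely.
s3.create_bucket(
    Bucket="my-data-cloud-demo-bucket",  # placeholder bucket name
    CreateBucketConfiguration={"LocationConstraint": region},
)
```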
In our bucket, we will need to create a folder to serve as our directory in Data Cloud.
We then give our folder a name and create the folder.
After we have created our folder, we can then upload files to be ingested in Data Cloud. Note that Data Cloud can ingest files uploaded to Amazon S3 that are in .csv or parquet format.
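If you’d rather do the folder and upload steps from code, here’s a rough boto3 equivalent. The bucket name, folder name, and file name are just examples for this sketch.

```python
import boto3

s3 = boto3.client("s3")
bucket = "my-data-cloud-demo-bucket"  # placeholder bucket name

# S3 has no true folders; a zero-byte object whose key ends in "/" shows up
# as a folder in the console and can serve as the directory for Data Cloud.
s3.put_object(Bucket=bucket, Key="datacloud-input/")

# Upload a local .csv file into that "folder" so Data Cloud can ingest it.
s3.upload_file("customers.csv", bucket, "datacloud-input/customers.csv")
```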
Creating the integration user and generating the API keys
We then navigate to the Identity and Access Management (IAM) console in AWS and create a user that we’ll use for the integration between Amazon S3 and Data Cloud.
We enter the details for the user and follow the prompts to add additional optional configurations as necessary. Please note: no additional configurations were added for this demonstration.
We then add the necessary S3 permissions to our integration user. We can add existing S3 policies manually, or create an in-line policy and copy the JSON (see documentation).
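As a rough illustration, here’s how the integration user and an in-line read-only policy might be created with boto3. The user name, policy name, and bucket are placeholders, and the exact S3 actions your org requires may differ, so check the documentation linked above.

```python
import json
import boto3

iam = boto3.client("iam")
user_name = "datacloud-integration-user"  # placeholder user name
bucket = "my-data-cloud-demo-bucket"      # placeholder bucket name

# Create the integration user.
iam.create_user(UserName=user_name)

# In-line policy granting read and list access to the bucket.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:ListBucket"],
            "Resource": [
                f"arn:aws:s3:::{bucket}",
                f"arn:aws:s3:::{bucket}/*",
            ],
        }
    ],
}

iam.put_user_policy(
    UserName=user_name,
    PolicyName="DataCloudS3ReadAccess",  # placeholder policy name
    PolicyDocument=json.dumps(policy),
)
```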
Now, we are ready to generate the secret and access keys that we’ll need to use in Data Cloud. We click Create access key to generate the keys that are needed.
We then need to tell AWS what our use case is for generating these keys. For our example, we choose “Other.”
Next, we create a tag for our access key.
Our access key and secret access key (called the secret in Data Cloud) will appear. Please note: You may need to click Show to unhide your secret access key. Be sure to follow AWS’s best practices for securing your access keys.
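For reference, the same key pair can also be generated programmatically with boto3. The user name here is the placeholder from the earlier sketch.

```python
import boto3

iam = boto3.client("iam")

# Generate an access key / secret access key pair for the integration user.
# The secret is returned only once, so store it securely; it is the value
# that Data Cloud calls the secret key.
response = iam.create_access_key(UserName="datacloud-integration-user")
print(response["AccessKey"]["AccessKeyId"])
# response["AccessKey"]["SecretAccessKey"] holds the secret; handle it carefully.
```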
Then, we enter all of the information necessary to configure the Amazon S3 Connector in Data Cloud. For the file name, you can enter an exact file name, a wildcard (*), or a wildcard pattern such as *.csv, as shown below.
And finally, we create and configure our data lake object that will store the data from our Amazon S3 data stream.
Closing words
Hopefully, this blog post was helpful and informative. Going forward, we’ll be publishing many more blog posts on how to use AWS with Salesforce Data Cloud. If you want to take it a step further and learn how to map your data lake object to data model objects in Data Cloud, please watch our YouTube video, where we show you how to map your ingested data in your data lake object to data model objects in Data Cloud. Also, we encourage you to take these two trails on Trailhead, Learn AWS Cloud Practitioner Essentials and AWS Cloud for Technical Professionals, to strengthen your AWS knowledge. Now let’s get streaming!
Resources
- Learn more about Salesforce Data Cloud
- Documentation: AWS Identity and Access Management
- Documentation: Bucket Policies and User Policies
- Documentation: Policies and Permissions in S3
- Video: Mapping Data Streams
- Trailhead: Learn AWS Cloud Practitioner Essentials
- Trailhead: AWS Cloud for Technical Professionals
About the author
Danielle Larregui is a Senior Developer Advocate at Salesforce focusing on the Data Cloud platform. She enjoys learning about cloud technologies, speaking at and attending tech conferences, and engaging with technical communities. You can follow her on X (Twitter).