
Salesforce Data Cloud offers prebuilt connectors that allow you to configure data to flow into or out of Data Cloud through third-party integrations. If you are using Amazon Kinesis Data Streams to collect and process data, you’re in luck. You can now use the Amazon Kinesis Connector to ingest that data into Data Cloud.

In this blog post, we’ll walk through how to use the Amazon Kinesis Connector in Data Cloud to read data from an Amazon Kinesis data stream. For detailed steps on creating an Amazon Kinesis Data Streams producer, see Amazon’s Developer Guide.

Common use cases of Kinesis Data Streams

Amazon Kinesis Data Streams can be used to collect and process large streams of data records in real time. Producers continually push data to Kinesis Data Streams, and consumers regularly check and process the emitted data. Developers use Kinesis Data Streams for a wide variety of use cases, including:

  • Accelerated log and data feed intake and processing: Push system and application logs, which are then available for processing in seconds
  • Analyze IoT device data: Use Amazon Kinesis services to process streaming data from IoT devices, such as consumer appliances, embedded sensors, and TV set-top boxes, and then take action in real time
  • Real-time metrics and reporting: Use data collected into Kinesis Data Streams for simple data analysis and reporting
  • Real-time data analytics: Process website clickstreams in real time, then analyze site usability and engagement using different Kinesis Data Streams applications running in parallel
  • Complex stream processing: You can create directed acyclic graphs (DAGs) of Kinesis Data Streams applications and data streams

About the Amazon Kinesis Connector

The Amazon Kinesis Connector lets Data Cloud consume data from your Kinesis data streams. Once the configuration is set up, Amazon Kinesis appears as a new data source in Data Cloud.

Amazon Kinesis is now available as a data source

For the connector to be available for selection, you’ll need to create an Amazon Kinesis Data Stream, configure access to your AWS resources, and then connect to the Amazon Kinesis Data Stream from Data Cloud. Let’s look at the steps needed.

How to set up and use the Amazon Kinesis Connector

Amazon Kinesis Data Streams excels at data processing, and the Data Cloud connector makes it easy to put that processing power to work. This section walks through the setup process and the considerations for a smooth integration. Our example uses New York taxi trip data to simulate a stream of trips taken by customers. As taxi data is put into the Amazon Kinesis data stream, the new connector makes it available in Data Cloud, where we can look for insights and take action when needed.

Step 1: Create a Kinesis Data Stream

Sign in to the AWS Management Console and open the Kinesis console. Choose Kinesis Data Streams in the Get started pane, then choose Create data stream.

Amazon Kinesis services homepage in AWS Cloud

Enter a name for your data stream (such as taxi-bookings), select your required data stream capacity, and click Create data stream.
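If you prefer to script this step, the console flow maps onto the Kinesis CreateStream API. The Python sketch below assumes the AWS SDK for Python (boto3) and configured AWS credentials; defaulting to on-demand capacity and the optional shard count are illustrative choices, not requirements of the connector.

```python
from typing import Optional


def create_stream_request(name: str = "taxi-bookings", shard_count: Optional[int] = None) -> dict:
    """Build keyword arguments for the Kinesis CreateStream call.

    Without a shard count the stream uses on-demand capacity; with one it is
    provisioned with that many shards.
    """
    if shard_count is None:
        return {"StreamName": name, "StreamModeDetails": {"StreamMode": "ON_DEMAND"}}
    if shard_count < 1:
        raise ValueError("shard_count must be at least 1")
    return {
        "StreamName": name,
        "ShardCount": shard_count,
        "StreamModeDetails": {"StreamMode": "PROVISIONED"},
    }


def create_stream(region: str, name: str = "taxi-bookings", shard_count: Optional[int] = None) -> None:
    """Create the stream and wait until it is ACTIVE (requires boto3 and AWS credentials)."""
    import boto3  # imported lazily so create_stream_request works without boto3

    request = create_stream_request(name, shard_count)
    client = boto3.client("kinesis", region_name=region)
    client.create_stream(**request)
    # The stream_exists waiter polls until the stream status is ACTIVE.
    client.get_waiter("stream_exists").wait(StreamName=request["StreamName"])
```

For example, `create_stream("us-east-1")` creates an on-demand taxi-bookings stream in us-east-1.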

Step 2: Create an IAM policy and user

Next, locate the Amazon Resource Name (ARN) for the new data stream that you created in the step above. It’s listed as ARN at the top of the Details tab and has the format arn:aws:kinesis:<region>:<account>:stream/<name>, where:

  • Region: This is the AWS region code (for example, us-east-1). For more information, see Region and Availability Zone Concepts.
  • Account: This is the AWS account ID, as shown in Account Settings.
  • Name: This is the name of the data stream that you created in the step above, which is taxi-bookings.

In the IAM console, in Policies, choose Create policy. Choose Kinesis as the AWS service. Select the permissions that meet your security policies. Fine-grained permissions can be found in the Amazon Kinesis Streams documentation.

Here’s an example policy document:
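As a minimal sketch, assuming the connector only needs to read from the stream, the policy below grants a handful of read-only Kinesis actions on the taxi-bookings stream and nothing else. The action list is our assumption, so check it against the Salesforce connector documentation before using it; 123456789012 is a placeholder account ID. The stream_arn helper builds the ARN from the region, account, and name components listed above.

```python
import json


def stream_arn(region: str, account_id: str, stream_name: str) -> str:
    """Return a Kinesis data stream ARN: arn:aws:kinesis:<region>:<account>:stream/<name>."""
    return f"arn:aws:kinesis:{region}:{account_id}:stream/{stream_name}"


# Assumed read-only actions for a stream consumer; confirm the exact list
# against the Salesforce and AWS documentation.
READ_ACTIONS = [
    "kinesis:DescribeStream",
    "kinesis:DescribeStreamSummary",
    "kinesis:ListShards",
    "kinesis:GetShardIterator",
    "kinesis:GetRecords",
]


def connector_policy(region: str, account_id: str, stream_name: str) -> dict:
    """Build an IAM policy document that allows reading from a single stream."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": READ_ACTIONS,
                "Resource": stream_arn(region, account_id, stream_name),
            }
        ],
    }


if __name__ == "__main__":
    # 123456789012 is a placeholder; use your own account ID and region.
    print(json.dumps(connector_policy("us-east-1", "123456789012", "taxi-bookings"), indent=2))
```

The printed JSON can be pasted into the JSON tab of the IAM policy editor. Scoping Resource to one stream ARN keeps the connector’s keys from reading any other stream in the account.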

Choose Next. Change Policy Name to DataCloudKinesisConnectorPolicy, review the permissions, and choose Create Policy.

In the IAM console, on the Users page, choose Create user. For User name, enter DataCloudKinesisConnectorUser, then click Next.

Now, choose Attach policies directly, and search by name for the policy that you created in the procedure above (DataCloudKinesisConnectorPolicy). Select the box to the left of the policy name, and then choose Next.
Review the details and summary, and then choose Create user.

Then navigate to the user you created above (DataCloudKinesisConnectorUser) and click Create access key.

Retrieving user access keys in AWS

Copy the Access key ID, and save it somewhere safe. Under Secret access key, choose Show, and store that key securely also.
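The policy, user, and access key steps can also be scripted. This is a sketch assuming boto3 and an identity that is allowed to manage IAM; the IAM client is passed in as a parameter so the function can be tried against a stub. The policy and user names match the ones used above.

```python
import json
from typing import Any, Tuple

POLICY_NAME = "DataCloudKinesisConnectorPolicy"
USER_NAME = "DataCloudKinesisConnectorUser"


def provision_connector_user(iam: Any, policy_document: dict) -> Tuple[str, str]:
    """Create the policy and user, attach them, and return (access key ID, secret).

    `iam` is expected to be a boto3 IAM client, for example boto3.client("iam").
    """
    policy = iam.create_policy(
        PolicyName=POLICY_NAME, PolicyDocument=json.dumps(policy_document)
    )
    iam.create_user(UserName=USER_NAME)
    iam.attach_user_policy(UserName=USER_NAME, PolicyArn=policy["Policy"]["Arn"])
    key = iam.create_access_key(UserName=USER_NAME)["AccessKey"]
    # AWS returns the secret only once, so store it securely right away.
    return key["AccessKeyId"], key["SecretAccessKey"]
```

Usage might look like `key_id, secret = provision_connector_user(boto3.client("iam"), policy_document)`, where policy_document is the policy you built in the previous step.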

Step 3: Stream sample records into Kinesis Data Stream

Our taxi use case example uses New York taxi trip data to produce events to send to our new Kinesis Data Stream. You’re welcome to stream any data you like, but the structure of the data for Data Cloud needs to be defined as a schema file using the OpenAPI Specification.

The details to implement a producer aren’t included in this post, but you can refer to the Amazon documentation for performing basic Kinesis Data Stream operations using the AWS CLI to create your own.
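For reference, a minimal producer might look like the sketch below. It assumes boto3, a stream named taxi-bookings, and credentials for an identity with kinesis:PutRecords (the connector user above only needs read access). The trip fields and values are hypothetical examples modeled on the public NYC taxi data, and using VendorID as the partition key is an arbitrary choice. Each record is a single flat JSON object, because the connector can’t read nested JSON.

```python
import json
from typing import Any, Dict, Iterable, List

# Hypothetical sample trips; field names are modeled on the NYC TLC yellow-taxi
# data, but the exact fields your schema defines may differ.
SAMPLE_TRIPS = [
    {"VendorID": 1, "tpep_pickup_datetime": "2023-01-01T00:32:10Z",
     "passenger_count": 1, "trip_distance": 0.97, "fare_amount": 9.3},
    {"VendorID": 2, "tpep_pickup_datetime": "2023-01-01T00:55:08Z",
     "passenger_count": 1, "trip_distance": 1.1, "fare_amount": 7.9},
]


def to_kinesis_records(trips: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Serialize each trip as one flat JSON object per Kinesis record."""
    return [
        {
            "Data": json.dumps(trip).encode("utf-8"),
            # The partition key determines which shard receives the record.
            "PartitionKey": str(trip["VendorID"]),
        }
        for trip in trips
    ]


def send_trips(trips: Iterable[Dict[str, Any]], stream_name: str = "taxi-bookings",
               region: str = "us-east-1") -> int:
    """Send one PutRecords batch and return how many records failed."""
    import boto3  # imported lazily so to_kinesis_records works without boto3

    client = boto3.client("kinesis", region_name=region)
    response = client.put_records(StreamName=stream_name, Records=to_kinesis_records(trips))
    return response["FailedRecordCount"]
```

PutRecords accepts at most 500 records per call, so a real producer would batch larger sets and resend any records counted in FailedRecordCount.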

Based on the New York taxi trip data, a schema can be defined to represent the data that we’ll send to the Kinesis Data Stream. The schema used to send that data is taxi-bookings.yaml. Save this file as we’ll use it when configuring the Data Cloud connector.
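As an illustration of what such a file can contain, the following sketch renders a flat, single-object schema in the OpenAPI 3.0.3 layout that Data Cloud uses for Ingestion API schema files; we’re assuming the Kinesis connector accepts the same shape. The object name taxi_bookings and the field list are hypothetical and should match whatever your producer sends.

```python
from typing import Dict

# Hypothetical fields and OpenAPI types for the taxi bookings object.
TAXI_FIELDS: Dict[str, Dict[str, str]] = {
    "VendorID": {"type": "number"},
    "tpep_pickup_datetime": {"type": "string", "format": "date-time"},
    "passenger_count": {"type": "number"},
    "trip_distance": {"type": "number"},
    "fare_amount": {"type": "number"},
}


def openapi_schema(object_name: str, fields: Dict[str, Dict[str, str]]) -> str:
    """Render one flat object (no nested properties) as OpenAPI 3.0.3 YAML text."""
    lines = [
        "openapi: 3.0.3",
        "components:",
        "  schemas:",
        f"    {object_name}:",
        "      type: object",
        "      properties:",
    ]
    for field, spec in fields.items():
        lines.append(f"        {field}:")
        for key, value in spec.items():
            lines.append(f"          {key}: {value}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(openapi_schema("taxi_bookings", TAXI_FIELDS), end="")
```

Redirecting the output to a file (for example, `python make_schema.py > taxi-bookings.yaml`, with a hypothetical script name) produces the schema to upload. The script deliberately emits a single object, since the connector supports only one object per schema.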

Once your producer has started to send data, you can check the progress by opening the Kinesis console and choosing the data stream you created (taxi-bookings). Click the Data Viewer sub-tab to view a sample of the data available.
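To check the stream from code rather than the console, a sketch like the following reads a few records from the start of each shard. It assumes a boto3 Kinesis client is passed in and that each record is a UTF-8 JSON object; the function name and the limit of 10 are illustrative.

```python
import json
from typing import Any, Dict, List


def sample_records(client: Any, stream_name: str = "taxi-bookings", limit: int = 10) -> List[Dict[str, Any]]:
    """Read up to `limit` records from the start of each shard and decode them as JSON.

    `client` is expected to be a boto3 Kinesis client, for example
    boto3.client("kinesis", region_name="us-east-1").
    """
    decoded: List[Dict[str, Any]] = []
    for shard in client.list_shards(StreamName=stream_name)["Shards"]:
        iterator = client.get_shard_iterator(
            StreamName=stream_name,
            ShardId=shard["ShardId"],
            ShardIteratorType="TRIM_HORIZON",  # start at the oldest retained record
        )["ShardIterator"]
        records = client.get_records(ShardIterator=iterator, Limit=limit)["Records"]
        decoded.extend(json.loads(r["Data"]) for r in records)
    return decoded
```

A single GetRecords call can return no records even when the shard has data further along, so treat an empty result as inconclusive. This sketch also ignores ListShards pagination, which matters only for streams with many shards.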

Viewing the data in your Amazon Kinesis data stream

Step 4: Create an Amazon Kinesis Connector in Data Cloud

Now we have all the information we need to connect Data Cloud.

Sign in to your Data Cloud instance and make sure you have Data Cloud Admin or Data Cloud Marketing Admin user permissions.

Open Setup and choose Data Cloud Setup. Choose Connectors and click New. Select the Amazon Kinesis connector and click Next.

Selecting the Amazon Kinesis connector in Data Cloud

Enter the following information on the connectors page:

  • Connection Name: Kinesis Taxi Bookings
  • Connection API Name: Kinesis_Taxi_Bookings
  • AWS Access key: Use the AWS Access Key saved from creating your IAM User above
  • AWS Secret access key: Use the AWS Secret saved from creating your IAM User above
  • Name of the Kinesis stream: taxi-bookings
  • AWS region: Use your AWS region
  • Amazon Kinesis data stream service endpoint: https://kinesis.<region>.amazonaws.com (region is the AWS region code, for example us-east-1)

Next, click Test Connection to verify there are no errors, then click Save. Open the connection you just created and click Upload Files in the schema section.

Specifying the schema to be used in your Amazon Kinesis connector

Select the file representing the taxi bookings schema from earlier in this post, preview the schema, and click Save.

Step 5: Ingest streaming data from Amazon Kinesis in Data Cloud

Next, navigate to Data Cloud and click the Data Streams tab. Click New, select the Amazon Kinesis data source, then click Next.

Creating a Data Cloud Data Stream using the Amazon Kinesis connector

On the next screen, you can select the object we created in our taxi-bookings schema.

Selecting the object from the schema definition

Since the data we sent to Amazon Kinesis for taxi trips has no clear unique identifier, we’ll create one by clicking New Formula Field.

Creating a formula field to use as a unique identifier

Create a new field called id of data type Text that uses the UUID() function to generate a unique 36-character identifier, then click Save.
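The Python below is only an analogy for the UUID() formula the post uses: a random UUID serialized as text is 36 characters (32 hex digits plus 4 hyphens). It also shows a swapped-in alternative that this post does not use: deriving a deterministic ID on the producer side with uuid5, so the same trip values always map to the same key. The namespace URL and the choice of key fields are hypothetical, and whether a stable key helps depends on your ingestion setup.

```python
import uuid
from typing import Any, Dict, Sequence

# Any fixed UUID works as a namespace; this one comes from a placeholder URL.
TRIP_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "https://example.com/taxi-bookings")


def random_trip_id() -> str:
    """Random ID, comparable in shape to a UUID() formula result."""
    return str(uuid.uuid4())


def stable_trip_id(trip: Dict[str, Any], key_fields: Sequence[str]) -> str:
    """Deterministic ID: identical values in key_fields always give the same UUID."""
    key = "|".join(str(trip[f]) for f in key_fields)
    return str(uuid.uuid5(TRIP_NAMESPACE, key))
```

The deterministic variant is only as unique as the fields you feed it, so pick fields that together identify a trip.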

Using a function to create a UUID

Next, change the Data Lake Object Label and API Name to Taxi Booking. Select Engagement as the category, then select tpep_pickup_datetime as the Event Time Field, and id as the Primary Key.

Specifying attributes for our Data Lake Object

Finally, click Deploy.

Your data stream will now retrieve new records from the connected Amazon source approximately every 15 minutes. Review the documentation to explore the guidelines and limits for APIs. As soon as your Last Run Status is Success, you can view the records ingested by looking at the data lake object in Data Cloud.

Using Data Explorer in Data Cloud, we can now explore the Taxi Booking data lake object that we created. It shows all the records ingested from the Amazon Kinesis data stream.

Using Data Explorer to view the data ingested from Amazon Kinesis

When working with the Amazon Kinesis connector, keep the following behaviors in mind:

  • The Amazon Kinesis connector can’t read nested JSON.
  • An Amazon Kinesis connector can have only a single object in its schema.
  • You can’t reuse an Amazon Kinesis source in multiple Amazon Kinesis connections in Data Cloud.
  • An Amazon Kinesis stream can contain only the type of data that’s selected when the data stream is created. Any variation in data that doesn’t match the schema causes the data stream to fail.
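Because nested JSON and schema mismatches cause problems on the Data Cloud side, it can help to check records in the producer before sending them. This is a minimal sketch under our own assumptions: the schema is flat, field types use OpenAPI names, and anything unexpected is flagged. It errs on the strict side (for example, a null value is reported as a type mismatch), and the function name is hypothetical.

```python
from typing import Any, Dict, List

# Assumed mapping from OpenAPI property types to the Python types we accept.
_ACCEPTED = {
    "number": (int, float),
    "integer": (int,),
    "string": (str,),
    "boolean": (bool,),
}


def schema_problems(record: Dict[str, Any], fields: Dict[str, str]) -> List[str]:
    """List reasons a decoded record doesn't fit a flat schema; [] means it fits.

    `fields` maps each field name to its OpenAPI type, e.g. {"fare_amount": "number"}.
    """
    problems: List[str] = []
    for name, value in record.items():
        if isinstance(value, (dict, list)):
            problems.append(f"{name}: nested value (connector can't read nested JSON)")
        elif name not in fields:
            problems.append(f"{name}: not in schema")
        elif isinstance(value, bool) and fields[name] in ("number", "integer"):
            # bool is a subclass of int in Python, so rule it out explicitly.
            problems.append(f"{name}: boolean where number expected")
        elif not isinstance(value, _ACCEPTED.get(fields[name], ())):
            problems.append(f"{name}: expected {fields[name]}")
    return problems
```

Running this on each record before PutRecords, and logging or skipping anything it flags, keeps records that don’t match the schema out of the stream.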

Conclusion

In this blog post, we covered how to consume data from Amazon Kinesis data streams in Data Cloud. To take it a step further and learn how to map your data lake object to data model objects in Data Cloud, watch our Mapping Data Streams video. We also encourage you to take two trails on Trailhead, Learn AWS Cloud Practitioner Essentials and AWS Cloud for Technical Professionals, to strengthen your AWS knowledge.


About the author

Dave Norris is a Developer Advocate at Salesforce. He’s passionate about making technical subjects broadly accessible to a diverse audience. Dave has been with Salesforce for over a decade, has over 35 Salesforce and MuleSoft certifications, and became a Salesforce Certified Technical Architect in 2013.
