Create an Amazon Kafka Data Stream

Create a data stream to start the flow of data from your Amazon Managed Streaming for Apache Kafka (Amazon MSK) instance to Data Cloud.

Complete this prerequisite:

  • Review the IP addresses to make sure the Amazon MSK connection has the necessary access.

  1. In Data Cloud, on the Data Streams tab, click New.

    You can also use App Launcher to find and select Data Streams.

  2. Under Other Sources, select the Amazon Managed Streaming for Apache Kafka connection source, and click Next.

  3. Select a connection and topic.

  4. Select the object that you want to import, and click Next.

    You can select a new object on the Available Objects tab or an existing object on the In Use Objects tab.

  5. Under Object Details, select a Category for the type of data in the data stream: Profile, Engagement, or Other.

    If you select Engagement, you must specify a value for the Event Time Field, which describes when the engagement occurred.

  6. For Primary Key, select a unique field to identify a record.

    If a primary key isn’t listed in the dropdown, you must create one using a formula field.

    1. To create a formula field, click New Formula Field.

    2. For Field Label, enter the data stream field’s display name.

    3. For Field API Name, enter the data stream field’s programmatic reference.

    4. For Formula Return Type, select Text.

    5. In the Transformation Formula text box, enter a UUID() formula. A minimal example appears after these steps.

    6. To validate the formula, click Test.

    7. Click Save.

    8. For Primary Key, select the UUID field that you created.

  7. (Optional) Select a record modified field.

    If data is received out of order, the record modified field provides a reference point that determines whether to update the record. The record with the most recent timestamp is loaded, as shown in the sketch after these steps.

  8. (Optional) For Organization Unit Identifier, select a business unit to use in a record’s data lineage. For more information, see Organization Unit Identifier.

  9. Click Next.

  10. For Data Space, if the default data space isn’t selected, assign the data stream to the appropriate data space.

  11. Click Deploy.
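
The transformation formula from step 6 can be as simple as the built-in UUID() function, which generates a universally unique identifier for each incoming record:

```
UUID()
```

Because UUID() produces a new value for every record, use it as the primary key only when the payload has no natural unique identifier.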
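
The last-writer-wins behavior of the record modified field (step 7) can be illustrated with a minimal sketch. The field names record_id, status, and modified_at are hypothetical, and the logic only models the resolution rule, not Data Cloud's internal implementation:

```python
from datetime import datetime, timezone

# Two versions of the same record arrive out of order.
incoming = [
    {"record_id": "A-1", "status": "shipped",
     "modified_at": datetime(2024, 5, 2, 12, 0, tzinfo=timezone.utc)},
    {"record_id": "A-1", "status": "pending",  # older update, delivered later
     "modified_at": datetime(2024, 5, 1, 9, 30, tzinfo=timezone.utc)},
]

store = {}
for record in incoming:
    current = store.get(record["record_id"])
    # Keep the version with the most recent record modified timestamp.
    if current is None or record["modified_at"] > current["modified_at"]:
        store[record["record_id"]] = record

print(store["A-1"]["status"])  # shipped: the later timestamp wins
```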

When the Last Run Status is successful, you can see how many records were processed and the total number of records that were loaded.

You can now map your data lake object to the semantic data model to use the data in segments, calculated insights, and other use cases.

When working with the Amazon Kafka connector, keep these behaviors in mind.

  • The relationship between a topic and a record type must be one-to-one: a topic can be associated with only one record type, and a record type can be associated with only one topic.
  • Only one data stream can be created per topic. A topic is effectively limited to containing records of a single record type.
  • Only one data stream can be created for a particular record type. For example, if you want to publish records of type Order to topic NorthAmerica and topic Europe, first create derived record types called OrderNorthAmerica and OrderEurope. Then publish records of type OrderNorthAmerica to the NorthAmerica topic and records of type OrderEurope to the Europe topic. (See the producer sketch after this list.)
  • Each message must not be larger than 200 KB.
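
As an illustration of these rules, here's a minimal producer-side sketch using the kafka-python library. It routes each derived record type to its own topic and rejects oversized messages before sending. The broker address, topic names, and payload fields are assumptions for illustration, not values from your environment:

```python
import json

from kafka import KafkaProducer  # pip install kafka-python

MAX_MESSAGE_BYTES = 200 * 1024  # stay under the 200-KB per-message limit

# Placeholder for your Amazon MSK bootstrap servers.
producer = KafkaProducer(
    bootstrap_servers="my-msk-cluster.example.com:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

# One topic per derived record type, as the one-to-one rule requires.
TOPIC_FOR_TYPE = {
    "OrderNorthAmerica": "NorthAmerica",
    "OrderEurope": "Europe",
}

def publish(record_type: str, payload: dict) -> None:
    """Route a record to its record type's topic, enforcing the size limit."""
    encoded = json.dumps(payload).encode("utf-8")
    if len(encoded) > MAX_MESSAGE_BYTES:
        raise ValueError(f"Message is {len(encoded)} bytes; the limit is {MAX_MESSAGE_BYTES}.")
    producer.send(TOPIC_FOR_TYPE[record_type], value=payload)

# Hypothetical order payload. event_time can back the Event Time Field for
# Engagement data, and modified_at can serve as the record modified field.
publish("OrderNorthAmerica", {
    "order_id": "A-1",
    "event_time": "2024-05-02T12:00:00Z",
    "modified_at": "2024-05-02T12:00:00Z",
})
producer.flush()
```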