Amazon SageMaker is a fully managed machine learning (ML) service where data scientists and developers can build and train ML models. Developers can then directly deploy their models into a production-ready hosted environment.

In this blog post, we’ll explain how to connect Salesforce Data Cloud to Amazon SageMaker, so that you can start using your Data Cloud data in your own machine learning models. Prior to following the steps below, please ensure that you have already configured SageMaker by setting up a SageMaker domain, domain user, and SageMaker execution role.

Important Note: You should not launch Amazon SageMaker Studio before configuring your secrets and lifecycle configurations, and attaching the lifecycle configurations to the domain (which we explain how to do later in this post). If you accidentally launch SageMaker Studio, you will need to delete the applications in Amazon SageMaker and relaunch for the configurations to take effect.

Connecting Salesforce Data Cloud and Amazon SageMaker

Let’s dive in on how to connect Salesforce Data Cloud to Amazon SageMaker.

Step 1: Create a connected app and generate a consumer key and secret

Connected apps are how we enable external applications like Amazon SageMaker to securely authenticate and authorize with Salesforce Data Cloud. In order to connect Amazon SageMaker to Salesforce Data Cloud, you’ll first need to create a connected app in Salesforce and enable OAuth settings.

For the callback URL, enter https://<domain-id>.studio.<region>.sagemaker.aws/jupyter/default/lab, and provide the domain ID that you captured while creating the SageMaker domain and the region of your SageMaker domain.
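If you manage several domains, it can help to build the callback URL programmatically rather than by hand. The following is a minimal sketch of the URL pattern above; the function name and the sample domain ID and region are hypothetical placeholders.

```python
def studio_callback_url(domain_id: str, region: str) -> str:
    """Build the connected app callback URL for a SageMaker Studio domain."""
    if not domain_id or not region:
        raise ValueError("domain_id and region are both required")
    return f"https://{domain_id}.studio.{region}.sagemaker.aws/jupyter/default/lab"


# Hypothetical placeholder values; substitute your own domain ID and region.
print(studio_callback_url("d-xxxxxxxxxxxx", "us-east-1"))
```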

Under selected OAuth scopes, choose the following:

    • Manage user data via APIs (api)
    • Perform requests at any time (refresh_token, offline_access)
    • Perform ANSI SQL queries on Salesforce Data Cloud data (cdp_query_api)
    • Manage Salesforce Customer Data Platform profile data (cdp_profile_api)
    • Access the identity URL service (id, profile, email, address, phone)
    • Access unique user identifiers (openid)

Note: Make sure that the connected apps that you create for this integration do not have “Require Proof Key for Code Exchange (PKCE) Extension for Supported Authorization Flows” selected. We do not yet support PKCE when connecting to Amazon SageMaker.

Finally, under Consumer details, you will need to generate the consumer key and secret.

Step 2: Store the consumer key and secret in AWS Secrets Manager

In AWS, navigate to Secrets Manager using global search, then click Store a new secret.

Screenshot of Secrets Manager with “Store a new secret” selected

Next, choose Other type of secret.

Screenshot showing “Choose another type of secret” selected

Then, enter the JSON shown below. Replace client_id with the consumer key that you generated, and replace client_secret with your consumer secret. Your issue_url will be your Salesforce domain. Please note: you must use an org-specific domain, and you cannot use login.salesforce.com as your domain. You will also need to add a tag with the key sagemaker:partner to this secret.
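As a rough illustration, the sketch below assembles a secret value from the three fields named above and rejects login.salesforce.com as the domain. It covers only client_id, client_secret, and issue_url; the actual secret may need additional fields, so treat this as a starting point and check the AWS documentation for the full JSON. The function name and the sample values are hypothetical.

```python
import json
from urllib.parse import urlparse


def build_secret_payload(client_id: str, client_secret: str, issue_url: str) -> str:
    """Return a JSON string containing the three fields named in the text."""
    host = urlparse(issue_url).hostname
    if host is None:
        raise ValueError("issue_url must be a full URL, including https://")
    if host.lower() == "login.salesforce.com":
        raise ValueError("use an org-specific domain, not login.salesforce.com")
    payload = {
        "client_id": client_id,          # consumer key from the connected app
        "client_secret": client_secret,  # consumer secret from the connected app
        "issue_url": issue_url,          # org-specific Salesforce domain
    }
    return json.dumps(payload, indent=2)


# Hypothetical placeholder values for illustration only.
print(build_secret_payload("<consumer-key>", "<consumer-secret>",
                           "https://example.my.salesforce.com"))
```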

You will also need to create a policy for your SageMaker execution role in the AWS IAM console to grant access to your secret. In this example, we are granting our SageMaker execution role access to all secrets in Secrets Manager. In practice, you should follow the principle of least privilege: copy the ARN of your secret and update the JSON to grant access only to that resource.
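A least-privilege version of such a policy could be generated along these lines. This is a sketch, assuming the execution role only needs to read the secret (secretsmanager:GetSecretValue); the integration may require further actions, so compare the result with the policy in the AWS documentation. The function name and the sample ARN are hypothetical.

```python
import json


def secret_read_policy(secret_arn: str) -> str:
    """Return an IAM policy document scoped to one Secrets Manager secret."""
    parts = secret_arn.split(":")
    # Expected shape: arn:<partition>:secretsmanager:<region>:<account>:secret:<name>
    if (len(parts) < 7 or parts[0] != "arn"
            or parts[2] != "secretsmanager" or parts[5] != "secret"):
        raise ValueError("expected a Secrets Manager secret ARN")
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # Assumption: read access is sufficient for this role.
                "Action": ["secretsmanager:GetSecretValue"],
                # A single ARN instead of a wildcard over all secrets.
                "Resource": secret_arn,
            }
        ],
    }
    return json.dumps(policy, indent=2)


# Hypothetical ARN for illustration only.
print(secret_read_policy(
    "arn:aws:secretsmanager:us-east-1:123456789012:secret:my-secret-AbCdEf"))
```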

On the IAM console, attach the following policies to their respective roles (these roles will be used by the SageMaker project for deployment):

    • Add the policy AmazonSageMakerPartnerServiceCatalogProductsCloudFormationServiceRolePolicy to the service role AmazonSageMakerServiceCatalogProductsCloudformationRole.
    • Add the policy AmazonSageMakerPartnerServiceCatalogProductsApiGatewayServiceRolePolicy to the service role AmazonSageMakerServiceCatalogProductsApiGatewayRole.
    • Add the policy AmazonSageMakerPartnerServiceCatalogProductsLambdaServiceRolePolicy to the service role AmazonSageMakerServiceCatalogProductsLambdaRole.

Step 3: Create a lifecycle configuration flow

SageMaker Studio lifecycle configurations are shell scripts that run when a Jupyter notebook is created or started. The lifecycle configuration will be used to retrieve the secret and import it to the SageMaker runtime.

  1. On the SageMaker console, choose Lifecycle configurations in the navigation pane.
  2. Choose Create configuration.
  3. Leave the default selection Jupyter Server App and choose Next.
  4. Give the configuration a name.
  5. Enter the following script in the editor, providing the ARN for the secret you created earlier.
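As a rough illustration of what such a script might contain, the sketch below renders a short shell script that writes the secret ARN into a JSON config file in the user's home directory. The file name ~/.sfgenie_identity_provider_oauth_config and the secret_arn key are assumptions here, as are the function name and the sample ARN; confirm the exact script against the AWS Data Wrangler documentation for Salesforce Data Cloud before using it.

```python
import json

# Assumption: the config file name the integration reads; verify in the AWS docs.
DEFAULT_CONFIG_PATH = "~/.sfgenie_identity_provider_oauth_config"


def render_lifecycle_script(secret_arn: str,
                            config_path: str = DEFAULT_CONFIG_PATH) -> str:
    """Render a shell script that writes the secret ARN to a JSON config file."""
    # Assumption: the config holds a single "secret_arn" key.
    config = json.dumps({"secret_arn": secret_arn}, indent=4)
    return (
        "#!/bin/bash\n"
        "set -eux\n"
        "\n"
        f"cat > {config_path} <<EOL\n"
        f"{config}\n"
        "EOL\n"
    )


# Hypothetical ARN for illustration only.
print(render_lifecycle_script(
    "arn:aws:secretsmanager:us-east-1:123456789012:secret:my-secret-AbCdEf"))
```

The printed text is what you would paste into the editor in step 5.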

Your lifecycle configuration should look like this:

Screenshot of the lifecycle configuration script

Step 4: Attach the lifecycle configuration to your SageMaker domain and set it as default

  1. Navigate to Amazon SageMaker, choose Domains in the left navigation pane, and select your domain.
  2. On the Environment tab, choose Attach to attach your lifecycle configuration.
  3. Choose the lifecycle configuration you created and choose Attach to domain.
  4. Set it as the default.

Screenshot of the Lifecycle configurations page

Connect Data Wrangler

Finally, we are ready to connect Salesforce Data Cloud to Data Wrangler, so that we can import our data from data lake objects and data model objects. In SageMaker Studio, on the File menu, choose New and then Data Wrangler flow. Choose Import data.

Screenshot of the Import data option in Data Wrangler

Please note: Opening a Data Wrangler flow immediately starts kernels in Amazon SageMaker Studio, and you will incur charges for them while they run. To learn how to shut down resources in Amazon SageMaker Studio so that they are not left running, refer to the AWS documentation.

Screenshot of kernels running in SageMaker Studio

Next, click Create connection to create a new connection.

Screenshot of Import data page in Data Wrangler.

Then, choose the Salesforce Data Cloud tile.

Screenshot of data sources that are available in Data Wrangler

Enter a name for your connection, input the domain of your Salesforce org in the Salesforce org URL field, and click through the screens to create your connection.

Screenshot of Salesforce Data Cloud’s Create connection page

You’ll receive a message that your Salesforce Data Cloud connection was set up successfully. Additionally, you’ll see the connected Salesforce Data Cloud org listed under Connections in Data Wrangler.

Screenshot of Connected orgs on the Connections landing page.

Conclusion

You now know how to connect your Salesforce Data Cloud org to Amazon SageMaker, so that you can start using your Data Cloud data in Amazon SageMaker models. In the next blog post in this series, we’ll cover how to import your data from Data Cloud into Amazon SageMaker and prepare it for use in an ML model.

Additional resources

About the author

Danielle Larregui is a Senior Developer Advocate at Salesforce focusing on the Data Cloud platform. She enjoys learning about cloud technologies, speaking at and attending tech conferences, and engaging with technical communities. You can follow her on X (Twitter).