The ability to efficiently access data wherever it resides is crucial when building visual data models, performing analytical operations, or building machine learning models. The Data Cloud Python Connector abstracts Data Cloud’s Query APIs to help developers quickly authenticate and access data within Data Cloud.

In this blog post, we’ll delve into the key features of the Python Connector for Data Cloud v1.0.15, and provide practical examples and code snippets to help you get started.

Prerequisites

Set up your Salesforce environment

For your Python code to authenticate with Data Cloud, you’ll need a connected app and a valid user in Salesforce. For our example, we’ll be using an OAuth 2.0 JWT Bearer flow. This is best suited to server-to-server communications since it doesn’t require someone to log in interactively.

Step 1: Create a certificate and private key

For our Python application to authenticate to Salesforce, we need to create a certificate. Certificates provide a secure way to authenticate applications to Salesforce. The private key ensures that only authorized applications can generate valid JWTs.

For detailed instructions, check out the Salesforce DX Developer Guide. The steps in the developer guide produce a server.crt certificate and a server.key private key that we’ll use later in this post, so keep them on hand.

Step 2: Create a connected app in Salesforce

The connected app provides a framework that enables an external application (in this case, our Python application) to integrate with Salesforce and Data Cloud using APIs and standard protocols, such as OAuth and OpenID Connect.

  • Log in to your Salesforce org and navigate to Setup → App Manager. Click New Connected App.
  • Select Create an External Client App, then Continue.

Creating an external client app in App Manager

  • Under the Basic Information section, enter the following:
      External Client App Name: Data Cloud Python App
      API Name: Data_Cloud_Python_App
      Contact Email: <your email address>
      Distribution State: Local
      Description: Connected application for Python
  • Under the API section, check the Enable OAuth checkbox.
  • Enter the value https://localhost.com for the Callback URL.
  • Select the following OAuth Scopes:
    • Manage user data via APIs (api)
    • Perform requests at any time (refresh_token, offline_access)
    • Manage Data Cloud profile data (cdp_profile_api)
    • Perform ANSI SQL queries on Data Cloud data (cdp_query_api)
  • Under the Flow Enablement section, select the Enable JWT Bearer Flow checkbox.
  • Use the Upload Files button to upload the server.crt self-signed certificate we created earlier.

Enabling JWT Bearer Flow and uploading your certificate

  • Under the Security section, de-select all options.
  • Click Create.
  • On the Policies sub-tab, click Edit.

Editing the external client app policies

  • Expand the OAuth Policies section.
  • Under the Plugin Policies section, modify Permitted Users to Admin approved users are pre-authorized.
  • Under Select Profiles, select System Administrator. Here you can add any profiles or permission sets for the user you’ll be using in your Python app.

Adding profiles and permission sets as pre-authorized users

  • Under the App Authorization section, modify the Refresh Token Policy to Refresh token is valid until revoked.
  • For IP Relaxation, select Relax IP restrictions.

Editing the App Authorization section for External Client Apps

  • Click Save.

Step 3: Retrieve the Consumer Key and Secret

Now that the Connected App is created, we can retrieve the consumer key.

  • On the Settings sub-tab, under OAuth Settings → App Settings, click Consumer Key and Secret.

Retrieving your consumer key and secret

  • On the page displayed, click Copy next to the Consumer Key and save the value for later.

The consumer details page for your external client app

Set up your Python Environment

Step 1: Install a Python interpreter

Along with the Python extension for VS Code, you need to install a Python interpreter. Which interpreter you use depends on your specific needs, but some guidance is provided in the Visual Studio Code documentation.

Step 2: Start VS Code in a workspace folder

  • Create a project folder called data-cloud-demo using your operating system, then open VS Code and use File > Open Folder to open it.

Step 3: Create a virtual environment

A best practice among Python developers is to use a project-specific virtual environment. Once you activate that environment, any packages you then install are isolated from other environments.

  • Open the Command Palette (⇧⌘P on macOS, Ctrl+Shift+P on Windows/Linux), start typing Python: Create Environment to search for the command, and then select it.

Creating a virtual environment using the Command Palette in VS Code

  • The command presents a list of environment types, Venv or Conda. For this example, select Venv, then select your interpreter.
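If you’d rather script this step than use the Command Palette, Python’s standard-library venv module can create the same kind of environment. Here’s a minimal sketch; the helper name create_environment and the .venv folder name are our own illustrative choices, not something VS Code or the connector requires.

```python
import venv
from pathlib import Path


def create_environment(project_dir: str, with_pip: bool = True) -> Path:
    """Create a virtual environment in <project_dir>/.venv and return its path."""
    env_dir = Path(project_dir) / ".venv"  # folder name is an assumption
    # with_pip=True bootstraps pip so packages can be installed into the env.
    venv.create(env_dir, with_pip=with_pip)
    return env_dir


# Example (run from the data-cloud-demo folder):
#   create_environment(".")
```

You still need to activate the environment, or select its interpreter in VS Code, before installing packages.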

Create your Python source code

Step 1: Add the Salesforce private key to your project folder

  • From the File Explorer toolbar, select the New File button on the data-cloud-demo folder.
  • Name the file salesforce.key and paste in the private key from the server.key file created earlier. Your private key can be used to access your Salesforce environment, so you must never share it. Add it to .gitignore (or equivalent) immediately, and for production use, store sensitive data in a secret manager in line with your company’s security policies.

Adding your private key to your Python project

Step 2: Install the Salesforce Data Cloud Connector and PyYAML

  • Install the CDP Python Connector from the PyPI (Python Package Index) repository using the following command: pip install salesforce-cdp-connector

Upon successful installation, you’ll see the following message: Successfully installed salesforce-cdp-connector-<version>.

  • Then install a YAML parser that we can use to read configuration files: pip install pyyaml

Upon successful installation, you’ll see the following message: Successfully installed pyyaml-<version>.

Step 3: Create a config file

We’ll store the parameters needed in a config file as a best practice to avoid hard-coding them later.

  • From the File Explorer toolbar, select the New File button on the data-cloud-demo folder.
  • Name the file config.yaml and add your Salesforce details.

Adding a configuration file to store environment variables
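As a rough sketch of what this file might hold and how to read it, the snippet below shows one possible layout in a comment and a small loader that fails early when a value is missing. The key names (in particular private_key_file) and the helper names missing_keys and load_config are assumptions made for illustration; use whatever names suit your project.

```python
from pathlib import Path

# Hypothetical layout for config.yaml (key names are assumptions):
#
#   login_url: https://login.salesforce.com
#   client_id: <consumer key from the external client app>
#   username: <Salesforce username>
#   private_key_file: salesforce.key

REQUIRED_KEYS = ("login_url", "client_id", "username", "private_key_file")


def missing_keys(config: dict) -> list:
    """Return the required keys that are absent or empty in the config."""
    return [key for key in REQUIRED_KEYS if not config.get(key)]


def load_config(path: str = "config.yaml") -> dict:
    """Read the YAML config file and raise if a required value is missing."""
    import yaml  # PyYAML, installed in Step 2

    config = yaml.safe_load(Path(path).read_text()) or {}
    missing = missing_keys(config)
    if missing:
        raise ValueError(f"{path} is missing: {', '.join(missing)}")
    return config
```

Checking the keys up front means a typo in config.yaml surfaces as a clear error rather than as a failed login later on.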

Step 4: Create a Python file

  • From the File Explorer toolbar, select the New File button on the data-cloud-demo folder.
  • Name the file data-cloud.py, and VS Code will automatically open it in the editor.

Creating a Python file in VS Code

Step 5: Create a Connection object

The Connection object handles the authentication to Data Cloud. It provides support for username and password flow, OAuth Web Server Flow, and OAuth JWT Bearer Flow. In this post, we’re using the JWT flow with the connected app that we created earlier. It takes the following parameters:

    login_url: The Salesforce org URL
    client_id: The consumer key copied from your connected app
    username: The username of the person to authenticate as
    private_key: The private key used when creating the connected app

The Connection object automatically creates a JWT and signs it with the private key. It then exchanges the Salesforce access token it receives for a Data Cloud token that can be used to invoke Data Cloud APIs. For details on the prerequisites required to access Data Cloud resources, check out the Data Cloud Reference Guide.
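To give a feel for what happens under the hood, here’s a rough, standard-library-only sketch of the unsigned part of a JWT bearer assertion for Salesforce: an RS256 header plus the iss, sub, aud, and exp claims. This is not the connector’s own code. The function names and the 180-second lifetime are our assumptions, and the RSA signature itself is left out because it needs a cryptography library.

```python
import base64
import json
import time


def b64url(data: bytes) -> str:
    """Base64url-encode without padding, as used in JWTs."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def build_signing_input(client_id: str, username: str, login_url: str,
                        lifetime_seconds: int = 180) -> str:
    """Return the unsigned '<header>.<claims>' portion of the assertion."""
    header = {"alg": "RS256"}
    claims = {
        "iss": client_id,  # consumer key of the connected app
        "sub": username,   # user to authenticate as
        "aud": login_url,  # the Salesforce login URL
        "exp": int(time.time()) + lifetime_seconds,  # short-lived by design
    }
    parts = []
    for obj in (header, claims):
        raw = json.dumps(obj, separators=(",", ":")).encode("utf-8")
        parts.append(b64url(raw))
    # A real assertion appends '.' plus the RS256 signature of this string,
    # computed with the private key from Step 1.
    return ".".join(parts)
```

The connector takes care of all of this for you, which is why the only inputs it needs are the four parameters listed above.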

  • In the data-cloud.py file add the following code:

Your code should look like this:

Python code showing how to create a connection to Data Cloud in VS Code
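A minimal sketch of this step is shown below, assuming the config.yaml keys login_url, client_id, username, and private_key_file (names we’ve chosen for illustration) and the SalesforceCDPConnection class exposed by the salesforce-cdp-connector package. The helper names connection_kwargs and connect are hypothetical.

```python
from pathlib import Path


def connection_kwargs(config: dict) -> dict:
    """Build the JWT bearer flow keyword arguments from the config values."""
    return {
        "client_id": config["client_id"],
        "username": config["username"],
        # private_key_file is an assumed config key pointing at salesforce.key
        "private_key": Path(config["private_key_file"]).read_text(),
    }


def connect(config_path: str = "config.yaml"):
    """Open a Data Cloud connection using the JWT bearer flow."""
    # Third-party imports: PyYAML and the Data Cloud connector from Step 2.
    import yaml
    from salesforcecdpconnector.connection import SalesforceCDPConnection

    config = yaml.safe_load(Path(config_path).read_text())
    return SalesforceCDPConnection(config["login_url"],
                                   **connection_kwargs(config))


# Example:
#   conn = connect()
```

Reading the key from a file keeps it out of your source code, which makes it easier to swap in a secret manager later.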

Step 6: Retrieve data

The Python Connector for Data Cloud has three ways to fetch data: fetchone(), fetchall(), and get_pandas_dataframe(). You can substitute the queries in the examples with data lake objects, data model objects, or calculated insight objects from your environment.

Let’s take a look at each of these.

fetchone()

  • Create a cursor object to execute queries. When a query is executed, the cursor passes that query to Data Cloud which fetches the results.

This method retrieves the first row of a query.
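As a sketch, the cursor steps can be wrapped in a small helper. Here conn is assumed to be the Connection object from Step 5, and fetch_first_row is a hypothetical name of our own.

```python
def fetch_first_row(conn, query: str):
    """Execute the query and return the first row of the result."""
    cursor = conn.cursor()   # cursor used to send the query to Data Cloud
    cursor.execute(query)    # run the SQL statement
    return cursor.fetchone() # retrieve a single row


# Example, using the Animal__dlm data model object from this post:
#   row = fetch_first_row(conn, "SELECT * FROM Animal__dlm")
#   print(row)
```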

fetchall()

  • Create a cursor object to execute queries. When a query is executed, the cursor passes that query to Data Cloud, which fetches the results.

This method retrieves all rows of a query.
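The same pattern works for fetchall(). In this sketch, conn is again assumed to be the Connection object from Step 5, and fetch_all_rows is a hypothetical helper name.

```python
def fetch_all_rows(conn, query: str):
    """Execute the query and return every row of the result."""
    cursor = conn.cursor()
    cursor.execute(query)
    return cursor.fetchall()


# Example:
#   for row in fetch_all_rows(conn, "SELECT * FROM Animal__dlm"):
#       print(row)
```

Because fetchall() loads the whole result at once, it’s best suited to queries that you know return a manageable number of rows.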

get_pandas_dataframe()

Pandas is a powerful Python library designed specifically for data manipulation and analysis. It provides high-performance, flexible data structures and a wide range of tools for data cleaning, transformation, and analysis. It’s widely adopted by data scientists since it integrates well with other libraries like NumPy and Matplotlib, making it easier to perform statistical analysis, data visualization, and machine learning tasks.

A DataFrame is a fundamental data structure in Pandas, and it is essentially a two-dimensional structure with columns that can hold different data types.

The get_pandas_dataframe() method allows developers to retrieve results from Data Cloud into this structure directly.

Let’s update the code to execute a SQL query against a data model object called Animal__dlm and load the results straight into a Pandas DataFrame.

  • In the data-cloud.py file add the following code:
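A minimal sketch of that code follows, assuming conn is the Connection object from Step 5. The helper name preview_query is our own, and the query selects every column because the fields on Animal__dlm will differ in your environment.

```python
def preview_query(conn, query: str, rows: int = 5):
    """Load the query result into a DataFrame and return its first rows."""
    df = conn.get_pandas_dataframe(query)  # result arrives as a DataFrame
    return df.head(rows)                   # first `rows` rows, with column names


# Example:
#   print(preview_query(conn, "SELECT * FROM Animal__dlm"))
```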

Here we’re using the Pandas DataFrame head() method. This returns a specified number of rows from the top of the DataFrame. The head() method returns the first five rows if a number is not specified. Note: The column names will also be returned in addition to the specified rows.

Run your Python code

Step 1: Run Python file

  • To run your Python project, click the play button in the top-right of your VS Code editor. The button opens a terminal window and runs data-cloud.py.

Complete Python code connecting to Data Cloud and fetching data into a Pandas DataFrame

  • Alternatively, you can run your code using the following command:

python3 data-cloud.py (macOS/Linux) or python data-cloud.py (Windows)

Step 2: Verify output

After you run your Python file, you can see the output from the query.

Final Python output showing successful retrieval of data

With only a few lines of code, we have successfully connected to Data Cloud and queried key records for use in our application.

Conclusion

The Python Connector for Data Cloud is a powerful tool for interacting with Data Cloud APIs from Python applications. It handles authentication with Data Cloud for you and provides simple methods to retrieve data.

With the ability to easily fetch key data from your data model objects, data lake objects, and calculated insights, you can create visual data models, perform analytical operations, and build machine learning models.

About the author

Dave Norris is a Developer Advocate at Salesforce. He’s passionate about making technical subjects broadly accessible to a diverse audience. Dave has been with Salesforce for over a decade, has over 35 Salesforce and MuleSoft certifications, and became a Salesforce Certified Technical Architect in 2013.
