How to Automate Data Extraction from Salesforce Using Python | Salesforce Developers Blog

Analysts frequently come across the need to automate the data extraction process from Salesforce objects in order to conduct data analysis or build dashboards locally. Prior to automation, the analysts in my organization had to take the following steps to create KPI reports:

  • Log in through the Salesforce site
  • Manually update the fields for multiple reports
  • Download each report as a CSV file
  • Perform calculations, manipulate the data, and create an aggregated dataset
  • Finally, paste it in a Google Sheet to be consumed for reporting purposes

This process was manual, time-consuming, and error-prone. However, it doesn’t have to be this way. You can automate the entire process with the help of a simple Python script and Salesforce API.

Retrieving Salesforce data using Python

To get started, install an open source package called Simple Salesforce. This is a basic Salesforce REST API client that allows users to download the pre-built report directly from Salesforce or generate a new dataset by querying the Salesforce API using SOQL (Salesforce Object Query Language).

Set up your Python package and API access

1. The first step is to install the Python package, which allows you to import all of the required packages into your Python file.

pip install simple_salesforce
from simple_salesforce import Salesforce
import requests
import pandas as pd
from io import StringIO

2. The Salesforce function allows you to connect to the API (you will need API access and your Salesforce credentials). If you need help connecting:

a. Reach out to your Salesforce admin for API access.

b. If you don’t have a security token, log into your Salesforce account and go to the settings page. Then click on Reset My Security Token under My Personal Information, which will send you the details via email. If you don’t see this option, contact your Salesforce admin.

Note: Be sure to protect your passwords and keys; you can set environment variables for these credentials within your script.

sf = Salesforce(username='',password='', security_token='')

Access your Salesforce data

There are two different options for getting data from Salesforce into a DataFrame in Python: one via downloading a Salesforce report and the other through querying the data using SOQL.

Option 1: download a Salesforce report
Download the pre-built custom Salesforce report by using the Python Requests get() method and providing your Salesforce Instance URL and the ID of the report to be downloaded. Additionally, you can read the .csv file as a DataFrame for further data manipulation.

sf_instance = 'https://oneappexchange.lightning.force.com/' #Your Salesforce Instance URL
reportId = '' # add report id
export = '?isdtp=p1&export=1&enc=UTF-8&xf=csv'
sfUrl = sf_instance + reportId + export
response = requests.get(sfUrl, headers=sf.headers, cookies={'sid': sf.session_id})
download_report = response.content.decode('utf-8')
df1 = pd.read_csv(StringIO(download_report))

Option 2: query the data using SOQL
Query the Salesforce data directly by using SOQL, which is similar to Structured Query Language(SQL) but applies only to Salesforce data. For additional details about SOQL and its limitations, refer to the SOQL documentation.

Finding Metadata

  • Find the object and field API names by navigating to the following: Org Instance URL -> Setup -> Object Manager -> Field & Relationships.


  • Run the following command to check all the field names of the Salesforce object using Python. The Salesforce object API name is: UserInstall__c.
descri=sf.UserInstall__c.describe()
[field['name'] for field in descri['fields']]

Fetching the data with SOQL

Simple Salesforce provides the method query_all to fetch data using SOQL. You can query more than one object at a time using relationship queries. This is helpful in cases when you need to merge data from different Salesforce objects into a single dataset. Listing is another object, and you can pull the field from this object using the lookup field by specifying the dot notation Listing__r.ProviderName__c.

results=sf.query_all("""
    Select 
    CreatedDate,
    Listing__r.RecordTypeSnapshot__c,
    Name,
    Listing__r.ProviderName__c
    from UserInstall__c
    where CreatedDate=LAST_N_DAYS:7 
    """)

Constructing a DataFrame from a dictionary

The results of the SOQL query are in an ordered dictionary format. You can iterate through the nested data and generate a DataFrame (as shown below).

records = [dict(CreatedDate=rec['CreatedDate'], 
                Record_Type=rec['Listing__r']['RecordTypeSnapshot__c'],
                Name=rec['Name'],
                ProviderName=rec['Listing__r']['ProviderName__c']) 
                for rec in results['records']]
df=pd.DataFrame(records)

Automating data processing and reporting

With your data as a DataFrame in Python, you can perform both data wrangling and aggregate metrics together as desired. You can also choose how to output the data. For example, you may want to export it to Google Sheets using the Google Sheet API and leveraging the pygsheets Python package or ingest it into a database table. Furthermore, you can schedule the Python script on a platform like Airflow to automatically run at a scheduled time and build a dashboard based on this aggregated dataset.

Conclusion

Hopefully, you find this post helpful! This simple script automates the data extraction and reporting process, saving our team hours of manual work and reducing the risk of typing errors. To learn more, refer to our sample code on GitHub.

About the author

Rutuja Patil is an Analytics Manager on the Platform Analytics team at Salesforce, where she focuses on business analytics, automation, and reporting using BI Tools. Rutuja has an MS in Information Systems and is certified in Tableau.

Stay up to date with the latest news from the Salesforce Developers Blog

Subscribe