For the past two years, Salesforce Data Cloud and Snowflake have partnered to help businesses make better decisions by combining their data with the analytics and machine learning capabilities of the Snowflake platform. We are happy to announce that our zero-ETL data-sharing capability became generally available on September 1, 2023. This new capability allows customers to share data from Salesforce Data Cloud to Snowflake seamlessly, with just a few clicks and no ETL involved!
Developers will be able to leverage their data from Salesforce Data Cloud in Snowpark (Snowflake's machine learning and data science framework) to train and deploy their models in Snowflake. In this blog post, we'll provide an overview and walkthrough of the new data share object and data share target capabilities built into Salesforce Data Cloud.
What is Snowflake?
Snowflake is a cloud data warehouse platform that can store your data records at scale. It runs on public clouds such as Amazon Web Services (AWS), Google Cloud Platform (GCP), and Azure. Snowflake offers native support for many types of data, including structured and semi-structured data. It also eliminates the need for enterprise customers to maintain separate data storage for each of their functional units. Customers can move data to Snowflake using any ETL tool supported by the platform, and data analysts and data scientists can then run queries to answer the questions raised by the business.
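To make this concrete, here is a minimal sketch of what querying Snowflake from Python might look like, using the snowflake-connector-python package. The account identifier, credentials, warehouse, database, and table names below are placeholder assumptions, not values from an actual environment.

```python
# Minimal sketch: connect to Snowflake and run an analytical query.
# All identifiers below (account, user, warehouse, database, table) are
# hypothetical placeholders -- substitute your own environment's values.
import snowflake.connector

conn = snowflake.connector.connect(
    account="my_org-my_account",  # hypothetical account identifier
    user="analyst_user",
    password="********",
    warehouse="ANALYTICS_WH",
    database="SALES_DB",
    schema="PUBLIC",
)

try:
    cur = conn.cursor()
    # Aggregate orders by region from an existing Snowflake table
    cur.execute(
        "SELECT REGION, COUNT(*) AS ORDER_COUNT "
        "FROM ORDERS "
        "GROUP BY REGION "
        "ORDER BY ORDER_COUNT DESC"
    )
    for region, order_count in cur.fetchall():
        print(region, order_count)
finally:
    conn.close()
```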
Data integration by conventional methods
Custom-built solutions sometimes lack security and don't follow proven data protection practices. Batch processes designed around ETL are slow and expensive, which keeps businesses from acting on their data in a timely manner. Additionally, ETL carries high maintenance costs, and the data needs to be cleansed before it reaches your data warehouse. Salesforce's vision is to help customers make data-driven decisions faster and with greater confidence, so Salesforce and Snowflake have built a secure zero-ETL data-sharing capability to solve these problems.
Introducing the zero-ETL data-sharing capability
Zero-ETL data sharing is a direct integration between Salesforce Data Cloud and Snowflake that does not require users to move or copy data from Salesforce to Snowflake. This empowers users in Snowflake to query live data from Salesforce for many analytics use cases. The zero-ETL data-sharing capability eliminates the data pipelines between Salesforce Data Cloud and Snowflake, along with the work of managing those complex ETL pipelines and scaling them for optimal performance. The new capability is built into Salesforce Data Cloud and accessible through the Data Cloud UI.
How does the zero-ETL data-sharing capability work?
As part of the zero-ETL data-sharing capability, we introduced the concept of a "data share," which allows users to assemble a set of data objects from Salesforce Data Cloud and link them with the data share target of their choice. Once the data share objects and data share targets are linked, a bridge between Salesforce and Snowflake is established, and the objects natively appear in Snowflake.
Once a data share is created in Salesforce Data Cloud and linked to the target, the integration provides near real-time updates on objects and reflects the latest data in Snowflake. In other words, if the data is in Salesforce Data Cloud, it can stay there, because Salesforce Data Cloud can now serve as the host of a rich, unified, supplemental dataset for analysis done in Snowflake.
Using Data Cloud's no-code and low-code tools, let's look at an example of how to create a data share, how to create a data share target, and how to link a data share object with a data share target in order to generate BI visualizations, build ML models in Snowflake, and consume the data for downstream processing.
Our example scenario
Let's say a brand has its own first-party data, containing attributes about its customers and its associated sales SKUs, and that data resides in Salesforce Data Cloud. The brand wants to advertise to find new customers with the same attributes and to combine those attributes with other characteristics from Snowflake to drive upsell opportunities. Brands can now securely join first-party data from Salesforce Data Cloud with Snowflake data, all without exposing IDs, since the data is not physically moved from Salesforce Data Cloud to Snowflake and the integration uses only the metadata store.
Creating meaningful and actionable insights in Snowflake relies on rich data from Salesforce. The integration with Salesforce Data Cloud datasets provides considerable benefits to customers: data availability with minimal latency, no data copies, and none of the cost and complexity of building custom integrations.
In this simple example, we are going to share data from three Profile-category data model objects, "Unified Individual," "Unified Contact Point Phone," and "Unified Contact Point Email," from Salesforce Data Cloud to Snowflake. Once these objects are shared with Snowflake, we'll see how they appear as secure views and preview the sample data. These unified profiles are created in Data Cloud by combining data from various sources into a single profile, based on the identity resolution rules that users define in Data Cloud.
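As a preview of what the analyst experience looks like once the share is live, here is a hedged sketch of inspecting these shared objects from Python. The database name and the exact view names are illustrative assumptions; in practice, the names follow your share and the Data Cloud data model objects.

```python
# Hedged sketch: preview the secure views shared from Data Cloud.
# The database and view names below are illustrative assumptions; the real
# names come from your share and the Data Cloud data model object names.
import snowflake.connector

conn = snowflake.connector.connect(
    account="my_org-my_account",      # hypothetical account identifier
    user="analyst_user",
    password="********",
    warehouse="ANALYTICS_WH",
    database="DATA_CLOUD_SHARE_DB",   # hypothetical database created from the share
    schema="PUBLIC",
)

cur = conn.cursor()
for view in (
    "UNIFIED_INDIVIDUAL",             # assumed view name for Unified Individual
    "UNIFIED_CONTACT_POINT_PHONE",    # assumed view name for Unified Contact Point Phone
    "UNIFIED_CONTACT_POINT_EMAIL",    # assumed view name for Unified Contact Point Email
):
    cur.execute(f"SELECT * FROM {view} LIMIT 10")            # preview a few rows of each secure view
    print(view, "->", [col[0] for col in cur.description])   # print the column names
conn.close()
```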
How to create, link, and view data using the new data share capability
Prerequisite
The data is ingested, harmonized, modeled, and prepared in Data Cloud for sharing.
Step 1: Select the objects
We are first going to share three data model objects within the Profile category: Unified Individual, Unified Contact Point Phone, and Unified Contact Point Email.
Step 2: Create a data share target
Next, we will create a data share target, which is a Snowflake account. This is where we want to share our Unified Individual data so that analysts can build reports on it.
Step 3: Link the data share object to the data share target
After creating the data share object and data share target, let's take a quick look at how we can link the two together. This is where the power of Salesforce Data Cloud becomes evident: once the data share object is linked with the Snowflake account, the objects within that category will appear in Snowflake with minimal latency.
Step 4: Viewing objects in Snowflake
Once we have linked the data share object with the target, Salesforce Data Cloud automatically creates secure views in Snowflake for those objects. Accepting the share in Snowflake creates a database for the data share objects. Once this database is created, all of the objects appear as views, and you are ready to query the data and join it with other Snowflake tables. As an example, you can see below how the views would appear in Snowflake.
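If you prefer to script this step, here is a hedged sketch of accepting the inbound share and listing the resulting views from Python. The provider, share, database, role, and warehouse names are placeholders, not the actual identifiers your Data Cloud share will use.

```python
# Hedged sketch: create a database from the inbound share and list its views.
# The provider, share, database, role, and warehouse names are placeholders.
import snowflake.connector

conn = snowflake.connector.connect(
    account="my_org-my_account",   # hypothetical account identifier
    user="admin_user",             # a user with privileges to import shares
    password="********",
    role="ACCOUNTADMIN",
    warehouse="ANALYTICS_WH",
)

cur = conn.cursor()
# List inbound shares, including the one coming from Salesforce Data Cloud
cur.execute("SHOW SHARES")

# Create a database on top of the inbound share (placeholder provider and share names)
cur.execute("CREATE DATABASE DATA_CLOUD_SHARE_DB FROM SHARE SFDC_PROVIDER.DATA_CLOUD_SHARE")

# Allow analysts to query the shared secure views
cur.execute("GRANT IMPORTED PRIVILEGES ON DATABASE DATA_CLOUD_SHARE_DB TO ROLE ANALYST_ROLE")

# The shared data model objects now appear as secure views in the new database
cur.execute("SHOW VIEWS IN DATABASE DATA_CLOUD_SHARE_DB")
for row in cur.fetchall():
    print(row)  # each row describes one shared secure view
conn.close()
```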
Closing words
With our zero-ETL data-sharing capability, it is possible to query live Salesforce data from Snowflake and be sure that changes to the Salesforce objects will be reflected in Snowflake. There is no need to worry about scheduling jobs or pulling metadata and data changes into Snowflake. This empowers developers and data scientists to build machine learning models and AI-powered applications on top of the Snowflake platform by joining Salesforce and Snowflake data.
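To illustrate that last point, here is a hedged Snowpark sketch of joining a shared Data Cloud view with a native Snowflake table and training a simple model on the result. The connection parameters, the table and view names, and the columns (INDIVIDUAL_ID, LIFETIME_VALUE, ORDER_COUNT, UPSOLD) are assumptions for illustration, not the actual share schema.

```python
# Hedged sketch: join shared Data Cloud data with native Snowflake data in
# Snowpark and train a simple upsell-propensity model. All identifiers and
# column names are illustrative assumptions, not the actual share schema.
from snowflake.snowpark import Session
from sklearn.linear_model import LogisticRegression

connection_parameters = {
    "account": "my_org-my_account",   # hypothetical account identifier
    "user": "data_scientist",
    "password": "********",
    "role": "ANALYST_ROLE",
    "warehouse": "ANALYTICS_WH",
    "database": "DATA_CLOUD_SHARE_DB",
    "schema": "PUBLIC",
}
session = Session.builder.configs(connection_parameters).create()

# Shared secure view from Data Cloud (assumed name) and a native Snowflake table
profiles = session.table("UNIFIED_INDIVIDUAL")
purchases = session.table("SALES_DB.PUBLIC.PURCHASE_HISTORY")

# Join on an assumed shared key and pull the features into pandas
training_df = (
    profiles.join(purchases, profiles["INDIVIDUAL_ID"] == purchases["INDIVIDUAL_ID"])
    .select(profiles["LIFETIME_VALUE"], purchases["ORDER_COUNT"], purchases["UPSOLD"])
    .to_pandas()
)

# Train a simple model that predicts upsell propensity from the joined features
features = training_df[["LIFETIME_VALUE", "ORDER_COUNT"]]
labels = training_df["UPSOLD"]
model = LogisticRegression().fit(features, labels)
print("Training accuracy:", model.score(features, labels))

session.close()
```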
Resources
About the author
Sriram Sethuraman is a Senior Manager in Salesforce Data Cloud product management. He has been building products for 9+ years using big data technologies. In his current role at Salesforce, Sriram works with major public cloud providers, such as Google, AWS, and Azure, to build stronger data integration solutions.