
Salesforce Data Cloud puts AI into the hands of developers with easy-to-use tools. It allows you to create your own predictive AI models, bring in predictions from models in hyperscalers, and connect to generative AI large language models (LLMs) from OpenAI and Azure OpenAI using Einstein Studio.

This blog post describes the steps needed to create a predictive AI model using clicks, not code, to help you understand the basics of how AI can help your company make proactive decisions. We’ll build the model and then look at the insights it provides.

The use case

In this post, we’ll use a dataset that includes a list of animals at a fictitious rescue center. The data provides attributes of each animal housed at the rescue center, including a Boolean value (Yes/No) indicating whether the animal has been adopted.

We want to be able to predict the likelihood of adoption for new animals arriving at the rescue center. This will help identify the top predictors that could help rehome animals faster.

This is where predictive AI can shine. By using the existing dataset with known values for adoption, we can create a predictive model using Einstein Studio that can provide us with a value indicating the likelihood of adoption. We can then explore the key variables of the dataset that drive this value.

The sample data for our animal rescue center looks like this:

Sample list of animals ingested into Data Cloud

In an earlier blog post, we described how this data can be imported from your enterprise sources. Our previous post used Microsoft Azure Blob Storage, but given the breadth of Data Cloud connectors available, the data can be ingested from anywhere.

We want to maximize the chance of the Adopted column having the value Yes.

This is where Einstein Studio helps. We’ll create a predictive model from scratch using clicks, not code.

Building the model

Open Einstein Studio in Data Cloud and click Add Predictive Model.

Building a new predictive model in Einstein Studio

Then select Create a model from scratch and click Next.

Note: We’re taking a clicks, not code approach in this post, but you can also bring your own model. If you want complete flexibility in how the model is trained and refined, you can use Amazon SageMaker, Google Vertex AI, or Databricks.

Creating a predictive model with clicks

Select the data source for the model. In our case, we’ll select the Animal data model object we previously ingested from Microsoft Azure Blob Storage. Then click Next.

Selecting the data source for a predictive model

Here, we can select our training dataset. This can be all the records in the data model object, or, for large datasets, you can filter the data to train on a subset. For our use case, we’ll select All Records, and then click Next.

Filtering the data source for a predictive model

Next, we can set a goal. Select the Adopted field as the value to predict. Then, select Maximize for an Adopted value of Yes, which indicates that an animal was successfully adopted. Then click Next.

Setting the goal for a predictive model

Now, we can select the attributes of the data model object to use for training our model. We only want to select attributes that contain information relevant to the prediction task. We’ll remove Data Source and Data Source Object, since these fields carry no predictive information; removing them helps the model focus on the features that matter.

Selecting the variables for a predictive model

Einstein now allows us to select the algorithm to use for our predictive model. Different algorithms suit different problems. Typically, data scientists make a choice based on the problem type (classification, regression), data characteristics (size, structure), and desired model properties (accuracy, interpretability, efficiency). Einstein can select the best algorithm to use based on our data. We’ll let Einstein pick for us, which in this case is XGBoost, and click Next.
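To build intuition for what a tree-based algorithm like XGBoost does, here’s a minimal, stdlib-only sketch of a single decision stump — one threshold split on one numeric feature. This is not Einstein Studio’s implementation (XGBoost boosts many such splits into an ensemble), and the records below are hypothetical illustration data:

```python
# A minimal decision-stump sketch (NOT Einstein Studio's implementation).
# XGBoost combines many such threshold splits into a boosted ensemble;
# here we find the single best split on one numeric feature.
# These records are hypothetical illustration data.

records = [
    {"age_months": 3,  "adopted": 1},
    {"age_months": 6,  "adopted": 1},
    {"age_months": 12, "adopted": 1},
    {"age_months": 24, "adopted": 0},
    {"age_months": 48, "adopted": 0},
    {"age_months": 60, "adopted": 0},
]

def best_stump(rows, feature, label):
    """Pick the threshold that classifies the most rows correctly."""
    best_threshold, best_accuracy = None, 0.0
    for r in rows:
        t = r[feature]
        # Predict "adopted" when the feature value is at or below the threshold
        correct = sum((row[feature] <= t) == bool(row[label]) for row in rows)
        accuracy = correct / len(rows)
        if accuracy > best_accuracy:
            best_threshold, best_accuracy = t, accuracy
    return best_threshold, best_accuracy

threshold, accuracy = best_stump(records, "age_months", "adopted")
print(threshold, accuracy)  # On this toy data: 12 1.0
```

On this toy data a single split on age separates the classes perfectly; real data is messier, which is exactly why XGBoost layers many weak splits on top of each other.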

Selecting the algorithm for a predictive model

The summary screen recaps the selections we made. Click Save.

Reviewing the summary for a predictive model

We’ll name our model Predicted Animal Adoption and click Save and Train.

Selecting the name for a predictive model

The model is now being trained on the data, and this can take some time depending on the dataset used.

Evaluating the model

When the training is finished, Einstein provides metrics that we can view to determine the effectiveness of our model. To access the training metrics, navigate to your model and click View Training Metrics. In our example, we’re using a binary classification algorithm.

Viewing training metrics for a predictive model

AUC score

The Area Under the Curve (AUC) score is a metric used to evaluate a model’s performance in distinguishing between positive and negative classes. It considers all possible classification thresholds and provides a single numerical score between 0 and 1. A perfect model that flawlessly separates the positive and negative classes would have an AUC of 1. Conversely, a random guessing model would have an AUC of 0.5.

Our model performs well, with an AUC of 0.853. Based on the training data, it correctly predicts that an animal will be adopted 84% of the time, and it is correct 70% of the time when an animal isn’t adopted.
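The AUC also has a handy probabilistic interpretation: it equals the probability that a randomly chosen positive example is scored higher than a randomly chosen negative one. Here’s a short, stdlib-only sketch of that calculation — the scores and labels are made-up illustration values, not our model’s actual outputs:

```python
# AUC as a rank statistic: the fraction of (positive, negative) pairs
# where the positive example receives the higher score (ties count 0.5).
# Scores and labels are hypothetical, not from the trained model.

def auc(scores, labels):
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]
labels = [1,   1,   0,   1,   0,   0]
print(auc(scores, labels))  # 8/9 ≈ 0.89: good, but not perfect, separation
```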

Evaluating the model accuracy

ROC curve

The Receiver Operating Characteristic (ROC) curve is a graphical tool used to evaluate the performance of binary classification models.

  • X-axis (False Positive Rate – FPR): This represents the proportion of negative cases that the model incorrectly classified as positive. A higher FPR indicates that the model is mistakenly identifying too many negative cases as positive.
  • Y-axis (True Positive Rate – TPR): This represents the proportion of positive cases that the model correctly classified as positive. A higher TPR indicates the model is successfully identifying true positive cases.

A ROC curve closer to the top-left corner generally indicates better model performance. A random guessing model would have a diagonal ROC curve, indicating no relation between the model’s predictions and the actual classes.
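The curve itself is traced by sweeping the classification threshold and computing the FPR and TPR at each step. A stdlib-only sketch, again using hypothetical scores rather than our model’s outputs:

```python
# Trace ROC points by sweeping the decision threshold.
# At each threshold t, predict positive when score >= t.
# Scores and labels are hypothetical illustration values.

def roc_points(scores, labels):
    pos = labels.count(1)
    neg = labels.count(0)
    points = []
    for t in sorted(set(scores), reverse=True):
        preds = [s >= t for s in scores]
        tp = sum(p and y == 1 for p, y in zip(preds, labels))
        fp = sum(p and y == 0 for p, y in zip(preds, labels))
        points.append((fp / neg, tp / pos))  # (FPR, TPR)
    return points

scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]
labels = [1,   1,   0,   1,   0,   0]
for fpr, tpr in roc_points(scores, labels):
    print(f"FPR={fpr:.2f}  TPR={tpr:.2f}")
```

Plotting these (FPR, TPR) pairs produces the curve; the closer the points hug the top-left corner, the better the model ranks positives above negatives.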

Evaluating the model performance using the Receiver Operating Characteristic (ROC) curve

Top predictors

Top predictors are the features or variables that have the most significant impact on the model’s predictions. These are the features the model relies on most heavily to make accurate classifications.

For animal adoption, we can see that age, sterilization status, breed, and the number of photos have a significant impact on the adoption score.
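Einstein derives predictor importance from the trained model itself. As a crude intuition builder only (not Einstein’s method), you can get a first feel for a numeric feature’s separating power by comparing its average between the two outcome groups. The records below are hypothetical illustration data:

```python
# A crude importance heuristic (NOT Einstein's actual method): compare
# each numeric feature's mean between the adopted and non-adopted groups,
# normalized by the feature's range. A bigger gap suggests more
# separating power. These records are hypothetical illustration data.

records = [
    {"age_months": 3,  "num_photos": 5, "adopted": 1},
    {"age_months": 6,  "num_photos": 4, "adopted": 1},
    {"age_months": 12, "num_photos": 4, "adopted": 1},
    {"age_months": 24, "num_photos": 2, "adopted": 0},
    {"age_months": 48, "num_photos": 1, "adopted": 0},
    {"age_months": 60, "num_photos": 3, "adopted": 0},
]

def mean(xs):
    return sum(xs) / len(xs)

def importance(rows, feature, label="adopted"):
    pos = [r[feature] for r in rows if r[label] == 1]
    neg = [r[feature] for r in rows if r[label] == 0]
    values = [r[feature] for r in rows]
    spread = max(values) - min(values)
    return abs(mean(pos) - mean(neg)) / spread

ranking = sorted(
    ["age_months", "num_photos"],
    key=lambda f: importance(records, f),
    reverse=True,
)
print(ranking)  # On this toy data: ['age_months', 'num_photos']
```

Real importance measures (like the gain-based scores tree ensembles report) also capture interactions between features, which a one-feature-at-a-time heuristic like this cannot.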

Evaluating the most important variables when making predictions

Conclusion

You don’t have to be a data scientist to start looking for predictive insights with your data. AI in Data Cloud provides the ability to create a sophisticated model with a few clicks.

In this blog post, we used a sample dataset to predict the likelihood of an animal being adopted. The resulting score can be used to determine the top predictors contributing to it. With this data, we can more confidently drive business decisions.

  • Sterilization status: This is #2 on the top predictor list and a candidate for further investigation. We can’t assume that adoption rates improve only if an animal is sterilized. We should look at how age impacts this predictor and why this might be the case based on who ultimately adopted the animal.
  • Number of photos: This is #4 on the top predictor list. Providing guidance to front-line staff to include more photos of each animal would improve the likelihood of adoption and is easy to implement.
  • Fee: This is surprisingly low on the predictor list at #12. We might consider adding a small fee to each adoption knowing it wouldn’t lessen the chance of adoption and would help cover animal rescue costs.

This is just the start — we could improve the model further with more data. For example, including the types of advertising conducted by the animal center for each animal would help us predict the best types of marketing campaigns to run for each animal.

Data Cloud helps you gain insights into your data that were previously unattainable, which can make a material difference to your business. Einstein Studio prediction jobs or Flow Builder actions can consume predictions from your AI models, helping to make these predictions available to everyone in your business.


About the author

Dave Norris is a Developer Advocate at Salesforce. He’s passionate about making technical subjects broadly accessible to a diverse audience. Dave has been with Salesforce for over a decade, has over 35 Salesforce and MuleSoft certifications, and became a Salesforce Certified Technical Architect in 2013.
