Machine Learning and Random Forest Classification

There has been a lot of hype about generative artificial intelligence (generative AI) and large language models (LLMs). However, other types of AI, like machine learning (ML), are well suited to putting your data to work: they use historical data stored in Salesforce, Commerce Cloud, Marketing Cloud, or another system to make predictions.

As you may already be aware, we are currently working on a low/no-code Model Builder in Einstein Copilot Studio, which will allow you to choose the best AI model for your data and business use case. To make that choice well, it’s important to understand the types of AI models that exist, and that means learning about more than just generative AI and LLMs.

In this blog post, we’ll give you an overview of machine learning, two popular types of ML, and random forest classification, a popular ML model used by data scientists.

What is machine learning?

Machine learning is a type of artificial intelligence that allows you to make predictions using historical data. Predictive insights from your ML models can help your business make better decisions and offer better recommendations to your customers. For example, you can use your customers’ purchase history, stored in Salesforce, Commerce Cloud, or Data Cloud, to recommend other products that may interest them. You can then use an ML model on that data and make predictions in Einstein Copilot Studio in Data Cloud. In the future, you will be able to use Einstein Copilot Studio’s Model Builder to make predictions with little or no code.

You have likely already encountered machine learning algorithms in your everyday life. For example, if you use Netflix, you might notice that there are sometimes categories that say: “Because you watched this, we think you may be interested in this.” Or, after watching a YouTube video, you receive recommendations of other videos and channels that YouTube thinks you may like based on your past watch history.

Types of machine learning

There are two very popular types of machine learning: supervised and unsupervised. Supervised machine learning means having a full set of labeled data while training an algorithm.

A diagram of how supervised learning works in machine learning

Fully labeled means that each example in the training dataset is tagged with the answer that the algorithm should come up with on its own. For example, a labeled dataset of pictures showing the colors red, green, and blue would tell the model which color each picture shows. When shown a new image, the model compares it to the training examples to predict the correct label (in this example, the correct color).
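To make the idea of comparing a new image to labeled training examples more concrete, here is a minimal sketch. It assumes each picture has already been reduced to a single average (R, G, B) value, and it uses a simple nearest-neighbor lookup, which is only one of many possible supervised techniques. The training values and the predict_color function are hypothetical and exist only for illustration.

```python
import math

# Hypothetical labeled training set: each picture is summarized as an
# average (R, G, B) value and tagged with the correct color.
training_examples = [
    ((250, 10, 15), "red"),
    ((200, 40, 30), "red"),
    ((20, 240, 25), "green"),
    ((60, 180, 50), "green"),
    ((10, 20, 245), "blue"),
    ((40, 60, 200), "blue"),
]


def predict_color(rgb):
    """Return the label of the training example closest to rgb."""
    nearest = min(training_examples, key=lambda example: math.dist(example[0], rgb))
    return nearest[1]


# A new, unlabeled picture gets the label of its closest training example.
print(predict_color((230, 30, 40)))
print(predict_color((15, 30, 220)))
```

The labels are what make this supervised: without the "red", "green", and "blue" tags, the function would have nothing to return.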

In unsupervised learning, an ML model is handed a dataset without explicit instructions on what to do with it. The training dataset instead contains a collection of examples without a specific desired outcome or correct answer. The ML model then attempts to find structure in the data automatically by extracting useful features and analyzing how the examples relate to one another. Clustering, which groups similar examples together, is one common form of unsupervised learning.
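As a rough illustration of finding structure in unlabeled data, the sketch below groups points with k-means, one well-known clustering technique. The data points, the choice of two clusters, and the starting centroids are all assumptions made for this example. Nothing here is specific to Salesforce or Einstein Copilot Studio.

```python
import math


def k_means(points, centroids, iterations=10):
    """Assign each point to its nearest centroid, then move each centroid
    to the mean of the points assigned to it. Repeat a fixed number of times."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for point in points:
            nearest = min(
                range(len(centroids)),
                key=lambda i: math.dist(point, centroids[i]),
            )
            clusters[nearest].append(point)
        # Keep the old centroid if a cluster ends up empty.
        centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster)) if cluster else centroid
            for cluster, centroid in zip(clusters, centroids)
        ]
    return centroids, clusters


# Hypothetical unlabeled data: (orders per year, average order value).
points = [(1, 20), (2, 25), (3, 22), (20, 200), (22, 210), (19, 190)]

# Start from two of the points as initial guesses for the cluster centers.
centroids, clusters = k_means(points, centroids=[points[0], points[3]])
print(clusters)
```

No labels are supplied anywhere. The two groups emerge only from how close the points are to each other.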

Random forest classification

Random forest classification is a popular ML model that combines multiple decision trees to reach a single outcome. Random forest has been widely adopted for solving machine learning problems because of its versatility in handling both classification and regression problems. Random forest classifiers are also considered effective tools for estimating missing values, as they can maintain accuracy when a portion of the data is missing.

Imagine that you have a complex problem to solve or a complex question that you need an answer to. You decide to gather a group of experts from different fields to share their opinions on the answer to your question. Each expert provides their individual opinion on the answer based on their expertise and experience. In the end, the experts take a vote to arrive at a final decision.

In random forest classification, multiple decision trees are created using different random subsets of data and features. Each decision tree acts like an expert, providing its own opinion on how to classify the data. Predictions are then made by calculating the individual prediction of each decision tree and taking the most popular result. Note that for regression problems, a random forest averages the trees’ predictions instead.
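The sampling and voting steps described above can be sketched in a few lines of plain Python. This is a simplified illustration under stated assumptions, not how any particular library implements random forests. The helper names (bootstrap_sample, random_feature_subset, majority_vote, average_prediction) and the per-tree predictions are hypothetical, and the training of the trees themselves is left out.

```python
import random
from collections import Counter
from statistics import mean


def bootstrap_sample(rows, rng):
    """Draw len(rows) rows at random, with replacement, for one tree."""
    return [rng.choice(rows) for _ in rows]


def random_feature_subset(feature_names, k, rng):
    """Pick k distinct features for one tree to consider."""
    return rng.sample(feature_names, k)


def majority_vote(predictions):
    """Classification: return the label predicted by the most trees."""
    return Counter(predictions).most_common(1)[0][0]


def average_prediction(predictions):
    """Regression: return the mean of the trees' numeric predictions."""
    return mean(predictions)


rng = random.Random(42)
rows = list(range(10))  # stand-in for row indices of a training set
features = ["gender", "state", "campaign_participation", "pages_visited"]

# Each tree would be trained on its own random rows and features.
tree_rows = bootstrap_sample(rows, rng)
tree_features = random_feature_subset(features, 2, rng)

# Hypothetical predictions from five already-trained trees.
print(majority_vote(["shoes", "jacket", "shoes", "shoes", "jacket"]))
print(average_prediction([10.0, 12.0, 14.0]))
```

One caveat: scikit-learn's RandomForestClassifier combines its trees by averaging their predicted class probabilities rather than counting hard votes, so its output can occasionally differ from a strict majority vote.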

Random forest classification is used across a wide variety of industries, such as:

Finance: It is often preferred over other algorithms because it reduces the time spent on data management and pre-processing tasks. It can be used to evaluate customers with high credit risk and to detect fraud.
Healthcare: Doctors have used random forest algorithms to estimate how patients will respond to specific medications.
E-commerce: It can be used for recommendation engines for cross-sell purposes.

Random forest classification is used in the example scenario in this AWS blog. In that scenario, demographic data, such as gender and state of residence, and behavior data, such as campaign participation and pages visited, are fed into a random forest classification ML model to determine which products should be recommended to a customer. Below is an excerpt of the code used in the product recommendation ML model from that blog, showing a practical example of how to use random forest classification from scikit-learn in your code.

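import joblib
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# df (a pandas DataFrame) and print_option (a boolean) are assumed to be defined earlier in the full example.
# Separate the feature columns from the target column.
X = df.drop(["product_purchased__c"], axis=1)
y = df["product_purchased__c"]
# Hold out 20% of the rows for testing.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

# Train the classifier and measure its accuracy on the held-out rows.
model = RandomForestClassifier()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
if print_option:
    print(accuracy)

# Save the trained model to disk.
with open('model.joblib', 'wb') as f:
    joblib.dump(model, f)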
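Once a model has been saved with joblib.dump, as in the excerpt above, it can be loaded again later to score new records. The sketch below is a self-contained illustration of that round trip, not part of the AWS example. The toy data, the two feature columns (age and pages visited), the product labels, and the temporary file path are all assumptions made for this example.

```python
import os
import tempfile

import joblib
from sklearn.ensemble import RandomForestClassifier

# Hypothetical toy data: [age, pages_visited] -> product purchased.
X = [[25, 3], [32, 10], [47, 1], [51, 8], [23, 12], [60, 2]]
y = ["shoes", "jacket", "shoes", "jacket", "jacket", "shoes"]

# random_state is fixed only so that repeated runs build the same forest.
model = RandomForestClassifier(n_estimators=50, random_state=0)
model.fit(X, y)

# Save the trained model to a temporary location, then load it back.
path = os.path.join(tempfile.mkdtemp(), "model.joblib")
joblib.dump(model, path)
loaded = joblib.load(path)

# Score customers the model has not seen before.
new_customers = [[30, 9], [55, 1]]
predictions = loaded.predict(new_customers)
probabilities = loaded.predict_proba(new_customers)

for customer, label, proba in zip(new_customers, predictions, probabilities):
    print(customer, label, dict(zip(loaded.classes_, proba)))
```

The predict_proba output can be useful for recommendations, because it lets you rank products by estimated likelihood instead of returning only the top label.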

Closing words

In this blog post, you learned about machine learning and some types of ML models. You also learned about random forest classification, a very popular ML model used by data scientists across a variety of industries. You can even apply it to Salesforce industry clouds, such as Financial Services Cloud, Health Cloud, or Commerce Cloud, to make predictions for a variety of problems. You now have a better understanding of what machine learning is and how you can use it with the power of the Einstein 1 Platform. Now it’s time to take your learning even further by working with some ML models on your own and integrating them with Einstein Copilot Studio!

Resources

Trailhead: Artificial Intelligence Fundamentals
Trailhead: Data Fundamentals for AI
Trailhead: Machine Learning Predictions: Quick Look

About the author

Danielle Larregui is a Senior Developer Advocate at Salesforce focusing on the Data Cloud platform. She enjoys learning about cloud technologies, speaking at and attending tech conferences, and engaging with technical communities. You can follow her on X (Twitter).