Editor’s note: This feature has since been retired in Summer ’23. For more details, read the announcement here.

Today we’re announcing the availability of our newest Einstein Platform Services offering: Einstein Object Detection, now in beta. This blog post explains how it compares to Einstein Image Classification and how to get started.

Einstein Image Classification vs. Einstein Object Detection

The Einstein Vision family started with the introduction of Einstein Image Classification. This service lets you train a model to classify an image into one or more categories. Einstein Object Detection takes a different approach: it tells you where one or more objects are within an image, which gives you specific counts and locations of objects.

Let’s see how it works in the Einstein Playground.

To sum it up:

  • Einstein Image Classification (single-label) tells you which category or class an image belongs in, like “trailhead-characters”.
  • Einstein Image Classification (multi-label) tells you one or more categories that the image belongs in, like “trailhead-astro” and “trailhead-codey”.
  • Einstein Object Detection tells you that there are two Astros and one Codey bear and also provides the x/y coordinates of these objects within the image.

You can read more about Einstein Image Classification in the linked blog post at the bottom of this page. For now, let’s focus on Einstein Object Detection.

How to prepare your training data

First, collect example images that contain the types of objects you want to detect and that are representative of the images you will want the model to analyze. Be sure to collect a good-sized sample of training images: the more training data you provide, the better your model’s detection will be. Unlike Einstein Image Classification, prepping your training data doesn’t end with gathering a good sample.

To train an Einstein Object Detection model, you must provide the coordinates of the bounding boxes around the objects you want detected. So what does “coordinates of the bounding boxes” mean? Glad you asked.

Draw a box around the object you want detected in the image; this is called a bounding box. For training you need the x/y pixel coordinates of the top-left corner of that box, as well as the pixel width and height. A single example image can contain multiple objects.

Once you’ve collected all of your example images and the object data, you create a file with the name annotations.csv. This file is a key part of the Einstein Object Detection training. It tells the Einstein Platform where in your sample data it will find the objects it needs to pay attention to during training.

This is an example screenshot of how you would set up the file.

In the CSV, include the exact image file names and the corresponding coordinates of the object(s) labeled in each image. Then place the annotations.csv file beside the example images in a .zip file, which you upload to Einstein Platform Services.
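Since the screenshot isn’t reproduced here, a minimal annotations.csv for two example images might look like this. The file names, labels, and coordinates are illustrative; each boxN column holds a JSON object with the label, the top-left x/y coordinates, and the width and height in pixels:

```csv
image_file,box0,box1
astro-codey-1.jpg,"{""label"": ""trailhead-astro"", ""x"": 23, ""y"": 40, ""width"": 110, ""height"": 130}","{""label"": ""trailhead-codey"", ""x"": 180, ""y"": 35, ""width"": 95, ""height"": 120}"
astro-2.jpg,"{""label"": ""trailhead-astro"", ""x"": 12, ""y"": 90, ""width"": 140, ""height"": 160}"
```

Note that images can have different numbers of boxes: rows simply leave trailing box columns empty when an image contains fewer objects.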

Create your dataset and train your model

All Einstein Platform Services follow the same pattern:

  • Collect and classify training data (images or text)
  • Upload the data to the respective Einstein Platform Service
  • Train custom models based on your training data
  • Predict/classify/detect data based on the model of interest

This is how you upload a dataset and train a model for Einstein Object Detection, using the open-source wrapper on GitHub.
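The original snippet appears as an image in the post; the following Apex sketch reconstructs the three lines described in the code explanation below. The dataset URL is a placeholder, and the detection type enum and the trainDataset parameters are assumptions about the wrapper’s API rather than verbatim from its docs:

```apex
Einstein_PredictionService service = new Einstein_PredictionService(Einstein_PredictionService.Types.IMAGE_DETECTION);
Einstein_Dataset dataset = service.createDatasetFromUrlSync('https://example.com/trailhead-objects.zip');
Einstein_Model model = service.trainDataset(dataset.id, 'Trailhead Object Model', 0, 0, null);
```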

Code explanation:

  • Line 1: This creates a new Einstein_PredictionService object that handles all requests.
  • Line 2: The Einstein Platform server downloads the .zip file from the remote URL and stores it. The createDatasetFromUrlSync() method is only recommended for small files; for larger files, use createDatasetFromUrlAsync() instead.
  • Line 3: A new model is trained based on the dataset. You use this model to get detections.

Please note that for a real-world implementation you should check the status of the dataset creation with dataset.getStatusMsg() == 'SUCCEEDED' before starting the model training.

Once the model training is finished (hint: check it in a similar way to the dataset status), you can start with Einstein Object Detection.
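Such a status check might look like the following Apex sketch. The getModel call and the status field are assumptions about the wrapper’s API, not confirmed names:

```apex
// Re-query the model and only proceed once training reports SUCCEEDED
// (method and field names assumed, not taken from the wrapper's docs)
Einstein_Model trained = service.getModel(model.modelId);
if (trained.status == 'SUCCEEDED') {
    // the model is now ready for detection calls
}
```

In a real implementation you would typically schedule this check (for example via Queueable Apex) rather than block, since training can take a while for larger datasets.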

Detecting Objects within an Image

When predicting, or in this case detecting, Einstein Platform Services always return a list of probabilities. Each entry combines an object label (based on your training data) with a probability score for that label.

With this data you can, for example, use Einstein Object Detection to count specific objects in an image.
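As above, the post’s code sample is an image; this Apex sketch mirrors the four lines walked through in the code explanation below. The model ID and image URL are placeholders, detectImageUrl is an assumed method name, and filterByLabelAndScore is a hypothetical name for the wrapper’s built-in convenience filter:

```apex
Einstein_PredictionService service = new Einstein_PredictionService(Einstein_PredictionService.Types.IMAGE_DETECTION);
Einstein_PredictionResult result = service.detectImageUrl('YOUR_MODEL_ID', 'https://example.com/baseballs.jpg');

List<Einstein_Probability> baseballs = Einstein_Probability.filterByLabelAndScore(result.probabilities, 'baseball', 0.9);
```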

Code explanation:

  • Line 1: This creates a new Einstein_PredictionService object that handles all requests.
  • Line 2: Based on the given model, the service detects the objects within the image.
  • Line 4: With a built-in convenience method of the wrapper, we filter the returned probabilities. The filter criteria are the label baseball and a probability score greater than or equal to 0.9 (i.e., 90%). The size of the resulting array gives you the count of that specific object in the image.

The prediction results of Einstein Object Detection also include additional values. Remember the bounding box that was mentioned earlier? This is the kind of data that Einstein Object Detection adds to standard prediction results.
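A detection response might look roughly like this; the field names follow the Einstein Vision REST API, while the labels and values are illustrative:

```json
{
  "probabilities": [
    {
      "label": "trailhead-astro",
      "probability": 0.9945,
      "boundingBox": { "minX": 39, "minY": 64, "maxX": 196, "maxY": 280 }
    },
    {
      "label": "trailhead-codey",
      "probability": 0.9681,
      "boundingBox": { "minX": 210, "minY": 58, "maxX": 330, "maxY": 275 }
    }
  ]
}
```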

This data represents the outer x and y coordinates of the bounding box for the detected object, based on the pixel size of the original image. Use cases include validating whether certain objects are aligned at the same height, for example on a shelf, or determining where objects are located in relation to other objects.
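The shelf-alignment idea can be sketched with a hypothetical Apex helper (not part of the wrapper): two detections count as aligned when the top edges of their bounding boxes are within a small pixel tolerance.

```apex
public class ShelfCheck {
    // Hypothetical helper: are two detected objects aligned at the same height?
    // minY values come from the boundingBox of each detection result.
    public static Boolean isSameShelf(Integer minYFirst, Integer minYSecond, Integer tolerancePx) {
        return Math.abs(minYFirst - minYSecond) <= tolerancePx;
    }
}
```

With the sample values above, ShelfCheck.isSameShelf(64, 58, 10) would report the two objects as sitting on the same shelf.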

Next steps

Einstein Platform Services offer a set of powerful APIs that help augment your business processes. Check out the full Einstein Platform documentation. On Trailhead we provide several modules and projects, like Einstein Vision QuickStart and Muenzpraeger’s Home for Wayward Cats. Earn those badges now! To start directly with Einstein Object Detection, sign up for an account on Einstein.AI and install the wrapper library from GitHub.

About the author

René Winkelmeyer works as a Senior Developer Evangelist at Salesforce. He focuses on enterprise integrations, mobile, and security with the Salesforce Platform. You can follow him on Twitter at @muenzpraeger.
