Determine the Model Type You Need

You can use Einstein Vision to create different types of models depending on what you want the model to identify in images.

The type of model you need depends on the images you’re sending into the model and what you want the model to identify in those images. The model type is defined by the dataset type and the data in the dataset from which the model is created.

Einstein Vision supports these model types:

  • Classification—predicts the single class into which an image falls.
  • Multi-label classification—predicts multiple classes into which an image falls.
  • Object detection—identifies objects within an image.

The purpose of an image classification model is to predict the class into which an image most likely falls. In the Create a Custom Classifier Scenario, you create a beaches and mountains dataset and then train that dataset to create a model. The goal of that model is to determine which of those classes an image falls into: was the photo taken at the beach or in the mountains?

The prediction response from this model type returns the top five classes sorted by probability in descending order. The probabilities in that response add up to 1. So if you send an image of a beach into a beaches and mountains model, the results look like this JSON.
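As a minimal sketch of working with a classification response, the following Python parses a response body, picks the most likely class, and checks that the probabilities add up to 1. The sample body is illustrative only: the `probabilities`, `label`, and `probability` field names and the numeric values are assumptions for this sketch, not output captured from the service.

```python
import json

# Illustrative response body only. The field names and values are
# assumptions for this sketch, not real output from the service.
SAMPLE_RESPONSE = """
{
  "probabilities": [
    {"label": "beach", "probability": 0.98},
    {"label": "mountain", "probability": 0.02}
  ]
}
"""


def top_prediction(response_text):
    """Return (label, probability) for the most likely class."""
    predictions = json.loads(response_text)["probabilities"]
    best = max(predictions, key=lambda p: p["probability"])
    return best["label"], best["probability"]


def probabilities_sum_to_one(response_text, tolerance=1e-6):
    """Check that the class probabilities in the response add up to 1."""
    predictions = json.loads(response_text)["probabilities"]
    total = sum(p["probability"] for p in predictions)
    return abs(total - 1.0) <= tolerance


if __name__ == "__main__":
    label, probability = top_prediction(SAMPLE_RESPONSE)
    print(f"Most likely class: {label} ({probability:.2f})")
```

Because the probabilities add up to 1, a high score for one class necessarily means low scores for the others, which is what makes this model type suited to "which one is it" questions.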

Use the standard classification model when you want the model to return a response that tells you that an image is a specific thing. For example, in the case of cars, you might have a model with different car brands. When an image is sent to the model, the response tells you the likelihood that the image is a specific car brand.

To create a classification model, you first specify image in the type request parameter of the API call to create a dataset. Then ensure that the image data that you add to that dataset supports the kind of predictions you expect to see from the model. When you train the dataset, the resulting model has a modelType of image.

The purpose of a multi-label model is to predict multiple classes into which an image most likely falls. This type of model predicts probabilities for multiple classes based on what’s in an image. The prediction response returns all the labels in the model sorted by probability in descending order.

In a multi-label model, the prediction response returns labels and probabilities, but those probabilities don’t add up to 1. For example, if you had a multi-label model for sports equipment, and an image that contains a baseball glove and bat is sent in, the response looks like this JSON.
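Because each label is scored independently, a common way to consume a multi-label response is to keep every label whose probability clears a cutoff. The sketch below assumes the same hypothetical `probabilities` / `label` / `probability` field names as a classification response. The labels, values, and the 0.5 threshold are illustrative choices, not values from the service.

```python
import json

# Illustrative response body only. Labels and values are made up for
# this sketch; note that the probabilities do not add up to 1.
SAMPLE_RESPONSE = """
{
  "probabilities": [
    {"label": "baseball-bat", "probability": 0.97},
    {"label": "baseball-glove", "probability": 0.94},
    {"label": "tennis-racket", "probability": 0.03},
    {"label": "golf-club", "probability": 0.01}
  ]
}
"""


def labels_above(response_text, threshold=0.5):
    """Return the labels whose probability is at least `threshold`.

    The threshold is an application-level choice (hypothetical here),
    not something the response itself defines.
    """
    predictions = json.loads(response_text)["probabilities"]
    return [p["label"] for p in predictions if p["probability"] >= threshold]


def total_probability(response_text):
    """Sum all probabilities; for multi-label this is usually not 1."""
    predictions = json.loads(response_text)["probabilities"]
    return sum(p["probability"] for p in predictions)


if __name__ == "__main__":
    print(labels_above(SAMPLE_RESPONSE))
```

The right threshold depends on how costly a missed label is compared with a false one, so treat it as a parameter to tune for your scenario.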

Use a multi-label model when your scenario requires an image to be classified in multiple classes. For example, let’s say you’re a developer who works for a company that makes clothing accessories like shoes, hats, scarves, and backpacks. Your job is to implement functionality that lets someone upload a photo and find out what products are in the photo.

An image can contain multiple products, so the model must return a prediction that identifies each product in the image. In this case, you create a multi-label model that can identify multiple products in a single image.

The process for building a multi-label model is the same as for an image classification model: you gather the data, create the dataset, and then train the dataset to create the model. The difference is the response that comes back when an image is classified against a multi-label model.

To create a multi-label model, you first specify image-multi-label in the type request parameter of the API call to create a dataset. Then ensure that the image data that you add to that dataset supports the kind of predictions you expect to see from the model. When you train the dataset, the resulting model has a modelType of image-multi-label.
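The choice between a classification model and a multi-label model is therefore made once, when the dataset is created. As a rough sketch, the helper below builds the form fields for a create-dataset request and validates the `type` value against the two dataset types described above. The `path` field name and the example URL are assumptions for illustration. The helper only builds the fields and does not send a request.

```python
# Dataset types named in the text above. The resulting model's
# modelType matches the dataset type it was trained from.
SUPPORTED_DATASET_TYPES = {"image", "image-multi-label"}


def dataset_form_fields(dataset_type, zip_url):
    """Build form fields for a create-dataset request.

    `type` is the request parameter described in the text. The `path`
    field name is an assumption made for this sketch.
    """
    if dataset_type not in SUPPORTED_DATASET_TYPES:
        raise ValueError(
            f"Unsupported dataset type {dataset_type!r}; "
            f"expected one of {sorted(SUPPORTED_DATASET_TYPES)}"
        )
    return {"type": dataset_type, "path": zip_url}


def expected_model_type(dataset_type):
    """Return the modelType a trained model should report."""
    if dataset_type not in SUPPORTED_DATASET_TYPES:
        raise ValueError(f"Unsupported dataset type {dataset_type!r}")
    return dataset_type


if __name__ == "__main__":
    fields = dataset_form_fields(
        "image-multi-label", "https://example.com/accessories.zip"
    )
    print(fields, expected_model_type(fields["type"]))
```

Validating the type before the call catches a mistyped value early, which matters because the dataset type determines the model type.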

The Einstein Object Detection API (Beta) identifies objects within an image. For each object identified, the API returns the coordinates for a bounding box around the object in the image, a class label, and a probability that the object in the bounding box matches the class label.

Some scenarios for using the Object Detection API include locating product logos in images or counting products on shelves. An object detection model is different from a multi-label model. A multi-label model returns the probability that particular objects are in an image. In contrast, an object detection model identifies the location of specific objects within an image.

[Image: GroceryShelfObjectDetection]

Let’s say that Alpine is using the Object Detection API to detect products on store shelves. After you send an image in for prediction, you receive a response that looks like this JSON. The labels that you see vary depending on the labels in your model.
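For the shelf-counting scenario, a minimal sketch of consuming a detection response might count the detected objects per label and read each bounding box. Everything in the sample body is illustrative: the `boundingBox`, `minX`, `minY`, `maxX`, and `maxY` field names are assumptions, and the labels and numbers are hypothetical rather than output from the service.

```python
import json
from collections import Counter

# Illustrative response body only. Field names, labels, and values
# are assumptions for this sketch, not real output from the service.
SAMPLE_RESPONSE = """
{
  "probabilities": [
    {"label": "oat-cereal", "probability": 0.98,
     "boundingBox": {"minX": 10, "minY": 20, "maxX": 110, "maxY": 70}},
    {"label": "oat-cereal", "probability": 0.96,
     "boundingBox": {"minX": 120, "minY": 20, "maxX": 220, "maxY": 70}},
    {"label": "corn-flakes", "probability": 0.91,
     "boundingBox": {"minX": 230, "minY": 20, "maxX": 330, "maxY": 70}},
    {"label": "corn-flakes", "probability": 0.30,
     "boundingBox": {"minX": 340, "minY": 20, "maxX": 400, "maxY": 70}}
  ]
}
"""


def count_objects(response_text, min_probability=0.5):
    """Count detections per label, ignoring low-confidence boxes."""
    counts = Counter()
    for detection in json.loads(response_text)["probabilities"]:
        if detection["probability"] >= min_probability:
            counts[detection["label"]] += 1
    return counts


def box_area(box):
    """Return the area of a bounding box in square pixels."""
    return (box["maxX"] - box["minX"]) * (box["maxY"] - box["minY"])


if __name__ == "__main__":
    for label, count in count_objects(SAMPLE_RESPONSE).items():
        print(f"{label}: {count}")
```

Unlike a multi-label response, the same label can appear more than once here, once per detected object, which is what makes counting possible.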