Object Detection Images and Labeling

In the Object Detection Quick Start, the .zip file with the images and the annotations file is provided for you. To create your own model, you first need to gather and label the training data. Here are some best practices when gathering your own data and labeling your images.

The first step to implementing Einstein Object Detection is deciding which objects you want to identify. After you decide that, it's time to gather training data (images) to create the dataset. Use images that are representative of the images that the model will receive in production.

Training Image Considerations

  • Objects in the images are visible and recognizable.
  • Images are forward-facing and not at an angle.
  • Images are neither too dark nor too bright.
  • Images contain at least 100-200 occurrences (counted across all images) of each object you want the model to identify. The more occurrences of an object you have, the better the model performs.
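
One way to check the occurrence guideline before training is to tally labels across the whole dataset. The following is a minimal sketch, not part of the Einstein API: the `labels_by_image` structure, the file names, and the label names are hypothetical, and the threshold is the lower end of the 100-200 guideline.

```python
from collections import Counter

# Hypothetical in-memory labels: image file name -> labels of the objects boxed in it.
labels_by_image = {
    "shelf_01.jpg": ["Oat Cereal", "Oat Cereal", "Corn Flakes"],
    "shelf_02.jpg": ["Oat Cereal", "Bran Cereal"],
}

MIN_OCCURRENCES = 100  # lower end of the 100-200 guideline


def count_occurrences(labels_by_image):
    """Count how many times each label appears across all images."""
    counts = Counter()
    for labels in labels_by_image.values():
        counts.update(labels)
    return counts


def underrepresented(counts, minimum=MIN_OCCURRENCES):
    """Return the labels with fewer than `minimum` occurrences, sorted by name."""
    return sorted(label for label, n in counts.items() if n < minimum)


counts = count_occurrences(labels_by_image)
for label in underrepresented(counts):
    print(f"{label}: {counts[label]} occurrences, below {MIN_OCCURRENCES}")
```

Labels reported by this check are candidates for gathering more images before you create the dataset.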

After you collect training images, you label objects in those images and specify a bounding box around each object. There are a few different options for image labeling.


Use CrowdFlower's human-in-the-loop platform to create high-quality training datasets of annotated images. The platform lets you select and manage the human labelers you need (including your own employees) to meet your quality and cost requirements. Email salesforce_einstein@crowdflower.com to discuss your labeling project.


Use the SharinPix managed package available on the AppExchange to label your images. Their labeling tool offers team management functionality for self-labeling using your own team or assisted labeling with help from SharinPix labelers. Email Jean-Michel Mougeolle at jmmougeolle@sharinpix.com to discuss your labeling project.


You can also label the images yourself, as long as the annotations meet the required format.

No matter which method you use to label your images, the labels are stored in a comma-separated values (CSV) file named annotations.csv. For each image, the annotations file contains the image file name, followed by the label and bounding box coordinates (in JSON format) of each object in that image. See the Annotations.csv File Format section of Create a Dataset From a Zip File Asynchronously. Here are the first four lines from the annotations.csv file contained in alpine.zip.

After you create the labels in the annotations.csv file, package that file and the images into a single .zip file. The API call that creates an object detection dataset uses this .zip file to upload the images and labels. See the Object Detection Datasets section of Create a Dataset From a Zip File Asynchronously.
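
If you label the images yourself, the two steps above (writing annotations.csv, then packaging it with the images) can be scripted. The following is a minimal sketch that uses only the Python standard library. The column names (`image_file`, `box0`, `box1`, ...), the JSON keys (`label`, `x`, `y`, `width`, `height`), the padding of short rows with empty cells, the flat .zip layout, and the file names are all assumptions made for illustration; confirm the exact layout against the Annotations.csv File Format section before uploading.

```python
import csv
import json
import tempfile
import zipfile
from pathlib import Path

# Hypothetical labels: image file name -> bounding boxes (pixel coordinates).
boxes_by_image = {
    "shelf_01.jpg": [
        {"label": "Oat Cereal", "x": 10, "y": 20, "width": 100, "height": 200},
        {"label": "Corn Flakes", "x": 150, "y": 25, "width": 90, "height": 190},
    ],
    "shelf_02.jpg": [
        {"label": "Oat Cereal", "x": 5, "y": 8, "width": 110, "height": 210},
    ],
}


def write_annotations(csv_path, boxes_by_image):
    """Write one row per image: file name, then one JSON string per box."""
    max_boxes = max((len(b) for b in boxes_by_image.values()), default=0)
    header = ["image_file"] + [f"box{i}" for i in range(max_boxes)]  # assumed names
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f, quoting=csv.QUOTE_ALL)
        writer.writerow(header)
        for image_file, boxes in boxes_by_image.items():
            row = [image_file] + [json.dumps(box) for box in boxes]
            row += [""] * (len(header) - len(row))  # assumption: pad short rows
            writer.writerow(row)


def build_zip(zip_path, image_dir, csv_path):
    """Package annotations.csv and every image it lists into one .zip file."""
    image_dir = Path(image_dir)
    with open(csv_path, newline="", encoding="utf-8") as f:
        rows = list(csv.reader(f))
    image_files = [row[0] for row in rows[1:]]  # skip the header row
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.write(csv_path, arcname="annotations.csv")
        for name in image_files:
            # Raises FileNotFoundError if a listed image is missing.
            zf.write(image_dir / name, arcname=name)
    return image_files


# Demo in a temporary directory, with empty placeholder files standing in for images.
with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)
    for name in boxes_by_image:
        (tmp / name).touch()
    csv_path = tmp / "annotations.csv"
    write_annotations(csv_path, boxes_by_image)
    with open(csv_path, newline="", encoding="utf-8") as f:
        rows = list(csv.reader(f))
    zip_path = tmp / "dataset.zip"
    packaged = build_zip(zip_path, tmp, csv_path)
    with zipfile.ZipFile(zip_path) as zf:
        names_in_zip = sorted(zf.namelist())

print(names_in_zip)
```

Building the .zip from the file names listed in annotations.csv, rather than from whatever is in the image folder, means a missing image fails early on your machine instead of during the upload.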