Image Orientation and Einstein Vision
Image orientation (landscape or portrait) can affect model accuracy. This topic contains orientation best practices for image training data and images sent for prediction.
When you take a photo with a mobile phone, the photo is stored in landscape orientation regardless of how you hold the phone. The phone senses whether you’re holding it vertically or horizontally and records that information in the image's EXIF metadata. EXIF metadata stores information about the photo such as compression type, where the photo was taken (geolocation), and image orientation.
Since all images are stored in landscape, the image orientation data specifies whether an image needs to be rotated and how much. Applications read the EXIF data to render the image to the user as they expect it to look.
Let’s look at an example. Here’s an image of store shelves. The image was taken in portrait mode, holding the phone vertically. When you view the image file in Windows Explorer, the application reads the EXIF data and displays the image in portrait. The EXIF orientation data for this image is: Orientation Rotate 90 CW.
If you navigate to the image on a mobile phone, the image displays in landscape because that's how the image is stored on the phone. Images taken in landscape mode always appear in landscape and the image orientation EXIF data is: Orientation Horizontal (normal).
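The two orientation strings above correspond to numeric values of the EXIF orientation tag (tag 0x0112). The following is a minimal sketch of that mapping, not part of Einstein Vision. The labels follow the wording that tools such as ExifTool report, and the helper name describe_orientation is hypothetical. Treating a missing tag as "normal" is an assumption that many viewers make.

```python
# EXIF orientation tag (0x0112) values and their commonly reported labels.
EXIF_ORIENTATIONS = {
    1: "Horizontal (normal)",
    2: "Mirror horizontal",
    3: "Rotate 180",
    4: "Mirror vertical",
    5: "Mirror horizontal and rotate 270 CW",
    6: "Rotate 90 CW",
    7: "Mirror horizontal and rotate 90 CW",
    8: "Rotate 270 CW",
}

# Clockwise rotation a viewer applies to the stored pixels before display.
# Mirrored orientations (2, 4, 5, 7) are left out because they also need a flip.
CLOCKWISE_DEGREES = {1: 0, 3: 180, 6: 90, 8: 270}


def describe_orientation(value):
    """Return (label, clockwise_degrees) for an EXIF orientation value.

    clockwise_degrees is None for mirrored or unknown values.
    """
    if value is None:
        value = 1  # assumption: a missing tag means no rotation is needed
    label = EXIF_ORIENTATIONS.get(value, "Unknown")
    return label, CLOCKWISE_DEGREES.get(value)


print(describe_orientation(6))
print(describe_orientation(1))
```

A photo taken while holding the phone vertically typically carries the value 6, and a photo taken in landscape carries the value 1.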
If your Einstein Vision model doesn’t have the accuracy you want, it could be due to either the orientation of the images that the model is trained on or the orientation of images sent in for prediction. See this blog post for more information about computer vision and image orientation.
In Einstein Vision, you first upload your image data in a dataset. The dataset type specifies whether the model created from the dataset performs image classification or object detection.
When you create a dataset, the upload process doesn’t use the image EXIF data, so a portrait image is processed as a landscape image (rotated 90 degrees) by Einstein Vision.
Image classification models are fairly robust to differences in orientation, but they perform best when the orientation of the training data matches the orientation of the prediction data. Expect a modest decrease in accuracy if the training and prediction data are in different orientations (landscape vs. portrait).
- If a model is trained on only landscape images of cats, it will do a fair job at identifying cats in portrait images and a better job at identifying cats in landscape images.
- If a model is trained with many examples of both landscape and portrait images of cats, it will do a good job at identifying cats in both landscape and portrait images.
Object detection datasets pose additional complexity because you create annotation information for objects within an image. Both the images and the annotation information are added to the dataset. Be sure that you rotate images to the correct orientation before you annotate them. This way, the coordinates of the bounding boxes match the objects in the image.
For example, let’s say you have an object detection model that detects bottles on a shelf. When you annotate the source images, be sure that the image orientation is such that the bottles are vertical. One way to ensure that you are annotating your images in the correct orientation is to make sure that the software you use for visualizing images ignores EXIF data.
When you send an image in for prediction, keep in mind that Einstein Vision doesn’t process the image EXIF data. If the image is taken as landscape, the image is processed by Einstein Vision as landscape. If the model is an object detection model, the predicted bounding boxes correspond to this orientation (bounding box coordinates are measured from the top-left corner of the image).
If the image sent for prediction is taken as portrait, the image is processed as rotated 90 degrees clockwise (landscape). If the model is an object detection model, the predicted bounding boxes are in reference to that rotated orientation (bounding box coordinates are measured from the top-left corner of the landscape image).
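If your application displays images with EXIF rotation applied, bounding boxes predicted against the stored pixels need to be converted before you draw them. The following is a minimal sketch of that coordinate conversion. It assumes boxes are given as (min_x, min_y, max_x, max_y) in pixels measured from the top-left corner of the stored image. The function name and tuple layout are hypothetical, not part of the Einstein Vision API, and mirrored orientations are not handled.

```python
def stored_box_to_display(box, stored_width, stored_height, orientation):
    """Map a box from stored-pixel coordinates to EXIF-rotated display coordinates.

    box: (min_x, min_y, max_x, max_y), origin at the top-left of the stored image.
    orientation: EXIF orientation value (1, 3, 6, or 8).
    """
    min_x, min_y, max_x, max_y = box
    if orientation == 1:
        # Horizontal (normal): display matches storage.
        return (min_x, min_y, max_x, max_y)
    if orientation == 6:
        # Rotate 90 CW for display: stored (x, y) lands at (stored_height - y, x).
        return (stored_height - max_y, min_x, stored_height - min_y, max_x)
    if orientation == 3:
        # Rotate 180: stored (x, y) lands at (stored_width - x, stored_height - y).
        return (stored_width - max_x, stored_height - max_y,
                stored_width - min_x, stored_height - min_y)
    if orientation == 8:
        # Rotate 270 CW for display: stored (x, y) lands at (y, stored_width - x).
        return (min_y, stored_width - max_x, max_y, stored_width - min_x)
    raise ValueError(f"Unsupported EXIF orientation: {orientation}")


# A box predicted on a 4000 x 3000 stored image whose EXIF says "Rotate 90 CW".
print(stored_box_to_display((100, 200, 500, 800), 4000, 3000, 6))
```

The displayed image in this example is 3000 pixels wide and 4000 pixels tall, so the converted box is expressed against those dimensions.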
One way to ensure that you’re processing your predictions correctly is to make sure that the software you use for processing and visualizing prediction results ignores EXIF data.
Another way to ensure more accurate predictions is to rotate any portrait images (which you can identify from the EXIF orientation data) so that the stored pixels themselves are in portrait orientation before you send the images for prediction. You can modify the image orientation programmatically, depending on what language your client application uses.
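As one illustration of rotating images programmatically, the following sketch uses the Pillow library in Python, which is an assumption about the client environment rather than anything Einstein Vision requires. ImageOps.exif_transpose returns a copy of the image with the EXIF orientation applied to the pixels. The function name normalize_orientation and the file paths are hypothetical.

```python
from PIL import Image, ImageOps


def normalize_orientation(src_path, dst_path):
    """Apply the EXIF orientation to the pixels and save the upright image.

    Returns the (width, height) of the saved image.
    """
    with Image.open(src_path) as image:
        # Rotates or flips the pixels as the EXIF orientation tag indicates.
        # Images without an orientation tag come back unchanged.
        upright = ImageOps.exif_transpose(image)
        upright.save(dst_path)
        return upright.size
```

For example, calling normalize_orientation("shelf.jpg", "shelf_upright.jpg") on a portrait photo from a phone would write a file whose stored pixels are taller than they are wide, which you can then send for prediction or annotate.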