Add Image Recognition Intelligence to Your .NET Apps with Einstein Vision

We get it: You’re a .NET developer, not a data scientist. But you still want to be able to use machine learning to make your apps more intelligent. This is where Einstein Vision comes in! Einstein Vision is a service that enables developers to quickly implement image recognition in their apps without needing a data science degree. This post is for .NET developers who want to build image recognition intelligence into their apps.

In this post, I provide an overview of Einstein Vision: what it is and how you can use it to build visual smarts into your apps. My goals are to give you an understanding of the Einstein Vision API and a starting point for calling the Einstein Vision API from .NET.

Our scenario is simple: The .NET app displays an image of a tree frog from the web. The user clicks the Predict button, and the app calls the API to send the image to the general image classifier, which is one of the prebuilt models. The response JSON contains predictions as to what the image is: a list of labels, each with a probability.

In this scenario, to keep it simple, you use one of the prebuilt classifiers (models). The list of classes in the general image model is here. The overall process looks like this:

Sign up.
Generate a token.
Call the API and send in an image.
Get a prediction back.

In my next post, I show you how to use the API to create a dataset and then train that dataset to create your own model.

Get the code

The sample app is a C# WinForms app. Old school, I know, but it means that you can run it with minimal setup and intervention. To get the code, clone the dotnet-vision-predict repo from the command line by running git clone https://github.com/dsiebold/dotnet-vision-predict.

If you don’t have a GitHub account, you can download the code in a .zip file. From your browser, navigate to https://github.com/dsiebold/dotnet-vision-predict and click Clone or download and select Download ZIP. Then extract the .zip file on your local drive.

Sign up

The first step to use the API is to sign up for an Einstein Platform account. When you sign up, the service creates an account associated with your email address and you download your key as a .pem file. If you already signed up and have your key, you can move on to the Generate a token section.

Go to the sign-up page to sign up using Salesforce. Be sure you’re logged out of Salesforce before you go to the sign-up page. When you sign up using Salesforce, you log in with your Salesforce credentials. If you don’t have a Salesforce org, you can get a free Developer Edition org here.

The final step of the sign-up process is to download your key. The key is contained in a file named einstein_platform.pem. If you’re using an older browser and Download Key doesn’t work, then cut and paste the key contents into a text file and save it as einstein_platform.pem.

Generate a token

Each time you call the Einstein Vision API, you must pass a valid access token in the header. In production, you write code to generate the token. But for now, we use the Einstein Platform token generation page to generate an access token. This page provides an easy way to get a token for testing or playing around with the API. You need a token to run the code, so do that now.

From your browser, navigate to the token generation page.
Enter your email address. Be sure to enter your email address and not your Salesforce username.
Upload your key by navigating to the location where you saved einstein_platform.pem.
Set the token time to 60 minutes or greater, which gives you enough time to run the code sample without having to go back and refresh the token.
Click Get Token.

Run the app

Now it’s time to see the API in action. Open Visual Studio 2017. This app was created in the free Community Edition. Open the PredictImage.sln solution to load it up.

The sample app loads an image from the web at http://metamind.io/images/generalimage.jpg. When you click Predict, the code calls the Einstein Vision API and passes in the image URL. The response from that call contains the predictions.

The app displays the JSON and also shows the predictions in a graphical format. The API returns the top five predictions sorted by probability in descending order (most likely to least likely).

Before you can call the API, first paste the token you generated into the authToken variable.

After you copy in your own token, you’re ready to run the app. Click Start, and the form opens and loads the image of the tree frog. Click Predict and voilà! You see the response loaded in the text box and those same results displayed in a graph.

Take a code journey

Now that you know what the app does, let’s look at some of the key pieces of code. All the magic happens in the Predict button’s click event, which does the following:

Sets the variables for the image URL, the access token, the model ID, and the API endpoint.
Creates the web request, adds the authorization header, and specifies that it’s a POST.
Builds the request and specifies the multipart/form-data fields and their values.
Makes the call and gets the response.
Parses the response that contains the predictions, and shows the response data in the UI.

Before I started coding, I made sure I was able to successfully make the call in cURL. Here’s what the cURL command looks like:

Create the web request

I used HttpWebRequest because inevitably I run into limitations when I use any of the simpler options like the WebClient class or the HttpClient class. This code creates the request and adds the authorization header with your access token.
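
Here’s a minimal sketch of that step, not the exact code from the sample app. The PredictRequestFactory class and its Create method are hypothetical names, and I’m assuming the token is sent as a Bearer token in the Authorization header. The endpoint is passed in as a parameter, so take the real /predict URL from the API documentation.

```csharp
using System;
using System.Net;

public static class PredictRequestFactory
{
    // Creates a POST request for the given endpoint and attaches the access token.
    // Nothing is sent over the network until GetRequestStream or GetResponse is called.
    public static HttpWebRequest Create(string endpoint, string authToken)
    {
        if (string.IsNullOrWhiteSpace(authToken))
        {
            throw new ArgumentException("An access token is required.", nameof(authToken));
        }

        var request = (HttpWebRequest)WebRequest.Create(endpoint);
        request.Method = "POST";

        // Assumption: the API expects "Authorization: Bearer <token>".
        request.Headers.Add("Authorization", "Bearer " + authToken);
        return request;
    }
}
```

In the click event, you would call this once with the endpoint and the authToken variable, and then write the form fields to the request stream.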

Build the request stream

Creating the request stream was the trickiest part because the request must be formatted exactly how the API expects it. The Einstein Vision API uses the multipart/form-data format, so I sent the sampleLocation and modelId parameters in as form fields in the request.

The API gives you three ways to send an image to the /predict resource:

Pass a string that contains the image converted to a base64 string.
Pass the image URL.
Upload a local file.

In this case, the app passes in the image URL string in the sampleLocation field. You don’t need to pass in a file, so that makes the call much simpler.

I didn’t have any existing code for multipart/form-data requests from .NET, so like any good developer, I started with Google. I found an illuminating blog post by Travis Illig. His blog post gave me not only a comprehensive understanding of the multipart/form-data request format but also a handy code snippet to format the fields in the request, which I used in this code sample. Thanks, Travis!

This code adds the two form fields to a dictionary and passes the dictionary into the WriteMultipartFormData method.
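
To make the format concrete, here’s a rough sketch of what a method like WriteMultipartFormData could look like. This is my own simplified version for text-only fields, not the snippet from Travis’s post or the exact code in the repo. The MultipartFormWriter class name, the method signature, and the boundary parameter are assumptions.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class MultipartFormWriter
{
    // Writes each dictionary entry as one form-data part, followed by the closing boundary.
    public static void WriteMultipartFormData(
        IDictionary<string, string> fields, Stream stream, string boundary)
    {
        var body = new StringBuilder();
        foreach (KeyValuePair<string, string> field in fields)
        {
            // Each part starts with "--" plus the boundary, then its headers,
            // a blank line, and the field value. Lines end with CRLF.
            body.Append("--").Append(boundary).Append("\r\n");
            body.Append("Content-Disposition: form-data; name=\"")
                .Append(field.Key).Append("\"\r\n\r\n");
            body.Append(field.Value).Append("\r\n");
        }

        // The final boundary carries a trailing "--" to mark the end of the body.
        body.Append("--").Append(boundary).Append("--\r\n");

        byte[] bytes = Encoding.UTF8.GetBytes(body.ToString());
        stream.Write(bytes, 0, bytes.Length);
    }
}
```

To use it, set request.ContentType to "multipart/form-data; boundary=" plus the same boundary string, fill a dictionary with the sampleLocation and modelId fields, and pass in the stream returned by request.GetRequestStream. Pick a boundary that can’t appear in the field values, for example one built from a GUID.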

Call and response

After getting everything set up, you call request.GetResponse to make the API call. The response JSON contains the top five labels that the model predicts for the specified image. Each label represents a class in the model. The results are sorted by the probability field in descending order.

The results are first displayed in a text box so you can see what’s coming back from the API. The solution uses the open source JSON framework Json.NET by Newtonsoft. As you would expect, this framework gives you an object model for working with JSON in .NET. The following code formats the JSON, saves it to an array, and then uses that array to add the data to a graph. With a more comprehensive graph control than the out-of-the-box one, you could do some really cool visualizations of the data.
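
As a rough illustration of the parsing step (not the exact code from the sample), the sketch below uses Json.NET to pretty-print the response and to load the predictions into an array. It assumes the Newtonsoft.Json package is referenced, as it is in the sample solution, and that the predictions arrive in an array property named probabilities whose items have label and probability fields. The Prediction and PredictionParser names are hypothetical.

```csharp
using System;
using System.Linq;
using Newtonsoft.Json;
using Newtonsoft.Json.Linq;

public class Prediction
{
    public string Label { get; set; }
    public double Probability { get; set; }
}

public static class PredictionParser
{
    // Returns the response JSON indented for display in a text box.
    public static string Format(string json)
    {
        return JToken.Parse(json).ToString(Formatting.Indented);
    }

    // Reads the predictions into an array, most likely label first.
    public static Prediction[] Parse(string json)
    {
        JObject root = JObject.Parse(json);

        // Assumption: the predictions live in a "probabilities" array.
        JArray items = root["probabilities"] as JArray ?? new JArray();

        return items
            .Select(item => new Prediction
            {
                Label = (string)item["label"],
                Probability = (double)item["probability"]
            })
            .OrderByDescending(p => p.Probability)
            .ToArray();
    }
}
```

The API already returns the results sorted, so the OrderByDescending call is only a safeguard. From there, each array element maps to one point on the graph: the label on one axis and the probability on the other.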

Next steps

One way to enhance the functionality of this app would be to enable drag and drop so that users can drag an image onto the app and then click Predict. In the code, you would:

Convert the image to a base64 string.
Pass the base64 string to the API in the sampleBase64Content parameter.

The maximum file size you can send into the /predict resource is 1 MB. Keep this limit in mind if you create functionality that uses photos taken with a mobile phone.
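
Here’s one way the conversion could look, as a sketch rather than tested production code. The ImageEncoder class is a hypothetical helper, and I’m assuming the 1 MB limit means 1,048,576 bytes measured on the file itself, so check the API documentation for the exact rule.

```csharp
using System;
using System.IO;

public static class ImageEncoder
{
    // Assumption: 1 MB is interpreted as 1024 * 1024 bytes of file content.
    public const long MaxImageBytes = 1024 * 1024;

    // Reads a local image file and returns its contents as a base64 string.
    public static string ToBase64(string imagePath)
    {
        long size = new FileInfo(imagePath).Length;
        if (size > MaxImageBytes)
        {
            throw new ArgumentException(
                "The image is " + size + " bytes, which is over the 1 MB limit.",
                nameof(imagePath));
        }

        return Convert.ToBase64String(File.ReadAllBytes(imagePath));
    }
}
```

The returned string goes into the sampleBase64Content form field in place of sampleLocation. Checking the size before the call lets you show the user a friendly message instead of waiting for the API to reject the request.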

If this post has achieved its goals, you now have a better understanding of the Einstein Vision API and how you can use it to implement image recognition. The code sample also gives you a launching point so you can start writing your own code.

About the author

Dianne Siebold is a principal technical writer on the platform doc team at Salesforce.