Learn MOAR in Summer ’20 with New Einstein Vision & Language

Discover Summer ’20 Release features! We are sharing five release highlights for Developers and Admins, curated and published by our evangelists as part of Learn MOAR. Complete the Learn MOAR trailmix by July 31, 2020 to get a special community badge, and unlock a $10 contribution to Libraries Without Borders (Bibliothèques Sans Frontières).

Einstein Vision & Language allows you to quickly build AI-powered apps by making the power of image recognition and natural language processing accessible to anyone.

The Summer ’20 Release adds new capabilities to Einstein Vision & Language that include the following:

  • General availability of Einstein Optical Character Recognition (OCR).
  • Einstein Vision’s newly optimized algorithm that efficiently identifies retail products on shelves.
  • Multi-language support for Einstein Intent.
  • Intent API enhancements that allow for the creation of Einstein Intent Models that support out-of-domain text.

Prerequisites – Einstein Vision & Language APIs

To work with the Einstein Vision & Language REST APIs, you will first need a private key. You can get a private key by signing up here. To understand what you need to call the API, refer to the documentation.

You can reduce your implementation time by installing the Einstein Vision and Language Model Builder on AppExchange. The application is natively built on the Salesforce Customer 360 platform.

Once installed, you can quickly:

  1. Create and train your datasets using clicks
  2. Create image and text predictions with minimal code
  3. Use global and invocable Apex utilities to simplify application development

Check out the documentation here.

If you are building applications and solutions outside of a Salesforce org, check out our Einstein Platform Developer Center for useful resources.

Einstein Vision Summer ’20 Release Updates

Einstein Vision enables you to tap into the power of AI and train deep learning models to recognize and classify images at scale. You can use pre-trained classifiers or train your custom classifiers to solve unique use cases by leveraging Einstein Image Classification and Einstein Object Detection.

Einstein Optical Character Recognition (OCR) is generally available

In the Summer ’20 Release, we are making Einstein Optical Character Recognition (OCR) APIs generally available for developers. OCR leverages computer vision to analyze documents and extract relevant information, making repetitive tasks like data entry more efficient.

To learn more about capabilities and available APIs, check them out here.

Some examples where an Optical Character Recognition model can be helpful include:

  • License verification using a driver’s license photo ID
  • Scanning serial numbers on products
  • Scanning readings on medical devices
  • Reading business card data for lead/contact capture
  • Reading data from price sheets or schedule templates

Note that for all the above use cases, OCR reduces the need for manual data entry.

License verification using Einstein OCR

Data entry in situations such as drive-through testing centers or location check-in can be time-consuming and require direct physical contact with individuals and their identification documents. A “location check-in app” can be built using OCR capabilities. The Einstein OCR app enables the user to extract textual data from a picture of an ID. This provides benefits including:

  • Reducing time required for data entry
  • Increasing accuracy
  • Providing a data trail
  • Eliminating physical contact with documents

With Einstein OCR, a user can take a photo of the ID using the Salesforce Mobile app. Optical Character Recognition then extracts the text. The scanned information can be used to quickly find existing visitor information. The digitized data is then used to create a new record of the individual in the system if the person is a new visitor.
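
The find-or-create flow described above can be sketched in a few lines. This is an illustrative mock only: the in-memory `visitors` store and field names like `license_number` are stand-ins for a real Salesforce object and whatever fields your OCR extraction produces, not part of the Einstein APIs.

```python
# Hypothetical sketch of the check-in flow: OCR-extracted ID fields are
# used to find an existing visitor record, or to create one if the person
# is new. The record store and field names here are illustrative.

visitors = {}  # keyed by license number, standing in for a real database


def check_in(ocr_fields):
    """Find or create a visitor record from OCR-extracted ID fields."""
    license_no = ocr_fields["license_number"]
    if license_no in visitors:
        record = visitors[license_no]
        record["visits"] += 1  # existing visitor: just log the visit
    else:
        record = {  # new visitor: create a record from the scanned data
            "name": ocr_fields["name"],
            "address": ocr_fields.get("address", ""),
            "visits": 1,
        }
        visitors[license_no] = record
    return record
```

In a real app, the lookup and insert would be SOQL and DML against a visitor object; the shape of the logic stays the same.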

Scanning product serial numbers

With Einstein OCR, cases can be automatically classified and assigned to specific teams or service agents depending on the serial number of a product, which can be extracted from a picture of the product uploaded by the customer. This helps support teams save time, work efficiently, and improve CSAT (Customer Satisfaction) scores.

Scanning readings on medical devices

Healthcare professionals can capture a photo of a patient monitor or chart using the Salesforce Mobile app and have Einstein OCR detect the text and automatically update the patient record in Salesforce. This helps nurses and doctors save time and work with more efficiency.

Implementing Einstein OCR

Below is a sample image of Codey’s driver’s license photo ID that highlights various data attributes that can be extracted using Einstein OCR. We will demonstrate how to use Einstein OCR to extract details.

Codey’s driver’s license photo ID. Data attributes can be extracted using OCR

A sample curl command to invoke the Einstein OCR API is shown below. Note that the command assumes you have a valid access token. It extracts contact and license details from the driver’s license shown in the image above.

curl -X POST -H "Authorization: Bearer <TOKEN>" \
     -F sampleLocation="https://res.cloudinary.com/cloudyworlds/image/upload/w_1000,ar_16:9,c_fill,g_auto,e_sharpen/v1591060674/Screen_Shot_2020-06-01_at_9.16.12_PM_kablw1.png" \
     -F task="contact" \
     -F modelId="OCRModel" https://api.einstein.ai/v2/vision/ocr
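
The call above returns JSON that your app then has to walk through. The sketch below shows one way to do that in Python. The response shape here is an assumption modeled on the fields the Apex example later in this post relies on (`probabilities`, `label`, `attributes.tag`, `boundingBox`); check the API reference for the authoritative schema, and note that the sample values are invented.

```python
import json

# Hypothetical OCR response for the "contact" task; real responses may
# carry more fields (e.g. probability scores per entity).
sample_response = json.loads("""
{
  "task": "contact",
  "probabilities": [
    {"label": "Codey Bear", "probability": 0.98,
     "attributes": {"tag": "PERSON"},
     "boundingBox": {"minX": 40, "minY": 60, "maxX": 220, "maxY": 90}},
    {"label": "DOB 01/01/2015", "probability": 0.95,
     "attributes": {"tag": "OTHER"},
     "boundingBox": {"minX": 40, "minY": 120, "maxX": 260, "maxY": 150}}
  ]
}
""")


def extract_fields(response):
    """Collect tagged entities; fall back to keyword matching for OTHER."""
    fields = {}
    for p in response["probabilities"]:
        tag = p["attributes"]["tag"]
        if tag != "OTHER":
            fields[tag] = p["label"]
        elif p["label"].startswith("DOB"):
            # strip the "DOB" prefix and surrounding whitespace
            fields["DOB"] = p["label"][len("DOB"):].strip()
    return fields
```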

In the Einstein Vision and Language Model Builder managed package, Einstein OCR APIs are surfaced through global Apex methods. Leveraging the app minimizes the amount of custom code and effort required to build apps using OCR.

You can achieve the extraction of data from the image shown above in a few lines of code, as shown below. Note that the image is uploaded and accessed via a public URL. The API also allows you to send the image data as a base64-encoded string in case you do not want to expose the image via a public URL.

Map<String, String> licenseCardInfo = new Map<String, String> (); // Map to hold information from License card
String LICENSE_CARD_IMAGE_URL = 'https://res.cloudinary.com/cloudyworlds/image/upload/w_1000,ar_16:9,c_fill,g_auto,e_sharpen/v1591060674/Screen_Shot_2020-06-01_at_9.16.12_PM_kablw1.png'; // The public image URL
String MODELID = 'OCRModel'; // This is out of box model for OCR
String TASK = 'CONTACT'; // This can be Contact, Table or Text.
String SAMPLEID = ''; // String that you can pass in to tag the prediction. Optional. Can be any value, and is returned in the response.

try {

    einsteinplay.Einstein_PredictionService einsteinService = new einsteinplay.Einstein_PredictionService(
        einsteinplay.Einstein_PredictionService.Types.OCR
    );
    // call and obtain response from the Einstein Prediction Service
    einsteinplay.Einstein_PredictionResult response = einsteinService.predictOcrUrl(MODELID,
        LICENSE_CARD_IMAGE_URL,
        TASK,
        SAMPLEID
    );

    for (einsteinplay.Einstein_Probability probability: response.probabilities) {
        // Populate a Map to hold known Person Details from License Card
        // depending on text structure may require some more text extraction logic
        if (probability.attributes.tag != 'OTHER') {
            licenseCardInfo.put(probability.attributes.tag, probability.label);
        } else {
            // Extract a few fields based on well-known entities
            if (probability.label.contains('DOB')) {
                licenseCardInfo.put('DOB', probability.label.remove('DOB').trim());
            }
            if (probability.label.contains('HGT')) {
                licenseCardInfo.put('Height', probability.label.remove('HGT').trim());
            }
            if (probability.label.contains('WGT')) {
                licenseCardInfo.put('Weight', probability.label.remove('WGT').trim());
            }
            // Use XY coordinates for the location of the character string within the image (also called a bounding box).
            // Based on XY approximate locations one can extract text information of entities that are not well known
            System.debug(probability.label);
            System.debug(probability.boundingBox);  
        }
    }

    // Extract information from the License card
    system.debug('Name of the person on License ' + licenseCardInfo.get('PERSON'));
    system.debug('Phone number on License ' + licenseCardInfo.get('PHONE'));
    system.debug('Address on the License Card ' + licenseCardInfo.get('ADDRESS'));
    system.debug('Date of Birth ' + licenseCardInfo.get('DOB'));
    system.debug('Height of the Person ' + licenseCardInfo.get('Height'));
    system.debug('Weight of the Person ' + licenseCardInfo.get('Weight'));
} catch (Exception e) {
    // handle error
}

For the above code to work, it is assumed that you have configured the Einstein Vision and Language Model Builder app as per these instructions in the user guide.

Below are some of the key highlights from the above code.

  • Notice that we are using the global Apex class named Einstein_PredictionService. This provides utility methods to invoke Einstein API Services.
  • The complete documentation about the global methods can be found here, in the Global Apex Methods and Invocable Methods sections.
  • The package updates provide new methods like predictOcrUrl and predictOcrBase64 to allow for Einstein OCR API calls.

Head over to our documentation to learn more about Einstein OCR.
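
As noted earlier, the image can also be sent as a base64-encoded string instead of a public URL. Here is a minimal sketch of preparing such a payload; the `sampleBase64` form field name matches the Einstein Vision base64 parameter, but verify it against the OCR endpoint documentation before relying on it.

```python
import base64


def build_base64_payload(image_bytes, model_id="OCRModel", task="contact"):
    """Encode raw image bytes for a multipart form POST to the OCR endpoint.

    The field names (modelId, task, sampleBase64) mirror the curl example
    earlier in this post; double-check them against the API reference.
    """
    return {
        "modelId": model_id,
        "task": task,
        "sampleBase64": base64.b64encode(image_bytes).decode("ascii"),
    }
```

In the managed package, the equivalent call is the global Apex method `predictOcrBase64` mentioned above.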

Detect products on shelves with an optimized algorithm

Use the retail execution algorithm to create a model that’s optimized to detect items displayed on retail shelves. This algorithm creates a model with the same functionality as one created with the standard object detection algorithm. However, this model’s detection accuracy is typically better for retail use case scenarios.

To learn more about this, refer to the documentation here.

Einstein Language updates

Einstein Language APIs allow developers to build natural language processing (NLP) into their apps. Developers can create NLP models to classify the intent of text or to classify the sentiment of text as either positive, negative, or neutral.

Some use cases might include:

  • Case routing to the correct agents by predicting text intent using case description.
  • Automated matching of resumes with job titles by predicting intent from the resume text.
  • Any potential use case for Einstein Bots, as they incorporate the Einstein Intent API.

In Summer ’20, we have added multi-language support for the Intent API and support for out-of-domain text in Einstein Intent models.

Multi-language support for Einstein Intent

With Summer ’20, Intent APIs are now GA in English (US), English (UK), French, German, Italian, Portuguese, and Spanish. Chinese Traditional, Chinese Simplified, and Japanese are in beta.

Using the Einstein Vision and Language Builder managed package, you can build models in supported languages using simple clicks.

Below is a screenshot of the User Interface from the app that shows how users can now create and train models in multiple languages using clicks.

If you are building your own application leveraging the Einstein Intent API, enable multiple languages by following the steps below.

  1. When creating datasets via the API, pass in the language parameter. Note that the data has to be in the same language as indicated in the language parameter.

An example curl command for uploading a dataset in French is shown below. The command assumes that you have a valid access token. In this use case, French case description text is classified with an intent label, which is then used to route new cases to the correct agent.

curl -X POST -H "Authorization: Bearer <TOKEN>" \
     -H "Cache-Control: no-cache" \
     -H "Content-Type: multipart/form-data" \
     -F "name=Dataset de Routage de Cas" \
     -F "path=http://einstein.ai/text/case_routing_intent_fr.csv" \
     -F "language=fr" \
     -F "type=text-intent"  https://api.einstein.ai/v2/language/datasets/upload
  2. When training the model, set the algorithm parameter to multilingual-intent.

An example curl command for training a model on the French dataset is shown below. Note that it assumes you have already uploaded the dataset and have the dataset ID, and that you have a valid access token.

curl -X POST -H "Authorization: Bearer <TOKEN>" \
     -H "Cache-Control: no-cache" \
     -F "name=Modèle de Routage de Cas" \
     -F "algorithm=multilingual-intent" \
     -F "datasetId=<DATASET_ID>" https://api.einstein.ai/v2/language/train

Check out the documentation to learn more about Einstein Intent APIs.

Einstein Intent models now support out-of-domain text

Out-of-domain text is text that doesn’t fall into any of the labels in a model; you can think of it as the “other” category. With Summer ’20, developers can create models capable of handling out-of-domain text by default.

When you train an intent dataset, pass the algorithm parameter with a value of “multilingual-intent-ood” if you think your prediction can return out-of-domain text. We recommend using this by default if you are unsure.

The image below is a screenshot of the user interface from the Einstein Vision and Language Model Builder app. The latest app updates allow users to select a “multilingual-intent-ood” algorithm when training a model.

If you are building your own application leveraging the Einstein Intent API, the curl command below shows how to train the model using the “multilingual-intent-ood” algorithm. Note that the command assumes you have already uploaded the dataset and have the dataset ID.

curl -X POST -H "Authorization: Bearer <TOKEN>" \
    -H "Cache-Control: no-cache" \
    -H "Content-Type: multipart/form-data" \
    -F "name=Case Routing Model" \
    -F "algorithm=multilingual-intent-ood" \
    -F "datasetId=<DATASET_ID>" https://api.einstein.ai/v2/language/train

In the above scenario, if the intent of the text doesn’t match any of the existing labels, the prediction response returns an empty array.
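
Your app should handle that empty array explicitly. The sketch below shows one way to do so; the response shape is illustrative, and the fallback queue name is a hypothetical routing target, not anything the API defines.

```python
FALLBACK_QUEUE = "General Triage"  # hypothetical routing target


def route_case(response, default=FALLBACK_QUEUE):
    """Pick the top predicted intent, or fall back for out-of-domain text.

    Per the behavior described above, an empty probabilities array means
    none of the trained intent labels matched the input text.
    """
    probabilities = response.get("probabilities", [])
    if not probabilities:
        return default  # out-of-domain: no label matched
    top = max(probabilities, key=lambda p: p["probability"])
    return top["label"]
```

Routing out-of-domain text to a human-reviewed queue rather than forcing a best-guess label is usually the safer design choice.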

Check out the Salesforce Summer ’20 Release Notes for Einstein Vision and Language and documentation to learn more.

Summer ’20 brings with it a lot of exciting updates. Now it’s your turn to make your apps smarter and more capable. To get started with Einstein Vision and Language services, sign up here.

If you are new to Salesforce Einstein, check out the below Trailhead modules and projects.


About the Authors

Surabhi Ravishankar is a Developer and Technical Architect passionate about Machine Learning. She has worked on the Salesforce Platform for over three years and has contributed towards building and maintaining the Einstein Vision and Language Model Builder. Surabhi works with Salesforce customers to understand where cutting-edge and future technologies fit into their strategic roadmap, and assists them by presenting forward-looking Salesforce solutions. You can find Surabhi on LinkedIn.

Dennis Schultz is a Master Technical Architect who has worked at Salesforce as a platform specialist for over four years. Through the Emerging Technologies team, his passion is collaborating with Salesforce customers on innovative applications of advanced technologies such as AI/ML services like Vision, Language, and Voice. You can find Dennis on LinkedIn.

Chris De Gour is a Master Technical Architect who has worked at Salesforce for the last ten years. In addition to building on the Customer 360 platform, his primary job is working with customers to draft and solve complicated requirements and technical challenges. He has published several quick examples and draft tools through various open-source initiatives outside the company. You can find Chris on LinkedIn.

Mohith Shrivastava works as a Lead Developer Evangelist at Salesforce. He is currently focusing on Einstein Vision and Language Services, Salesforce CLI, Platform Services, Communities, and Lightning Web Components. You can follow him on Twitter here.