Many consider speech recognition to be the next ‘killer app’ for mobile. Apple famously (or infamously, depending on how much you’ve used it) added Siri in iOS 5 and Android responded with Google Now. So how do you add speech recognition to an enterprise mobile application built on the Salesforce Touch Platform? I thought you’d never ask!
We recently hosted a webinar introducing the AT&T Toolkit for Salesforce Platform and I demoed a simple mobile application built using Visualforce and the Salesforce Mobile SDK that uses the AT&T toolkit to search for Case records in Salesforce based on a user’s voice input. You can jump to this point in the webinar recording to watch a demo of the application. The full code base for that application is also available on GitHub. Let’s break down the application architecture and code.
The figure below describes the high-level architecture for the application.
The app is built using Visualforce and jQuery Mobile and displays all Cases assigned to the currently logged-in user. I then used the Salesforce Hybrid Mobile SDK to create a hybrid version of the app to install on an iPhone or Android device. When the user taps the voice search button, the app starts capturing the microphone input from the device. The binary recording is then sent to the Apex controller for the page. In the controller we use the AT&T Toolkit to invoke the AT&T Speech-to-Text API. AT&T converts the voice input into text and returns the results to the controller. Lastly, we perform a SOQL search based on the converted text and return any matching Case records to the mobile app, where they are displayed to the user.
What is the AT&T Toolkit for Salesforce?
AT&T has an extensive library of public APIs that developers can use to build enterprise apps and solutions. Developers can now access those APIs natively from the Force.com platform with the AT&T Toolkit for Salesforce Platform. The toolkit provides strongly-typed Apex wrappers for RESTful AT&T APIs like speech-to-text, SMS, location, payment and more.
Will this Speech-to-text App only work for AT&T subscribers?
The short answer – no. Here’s the longer version. As mentioned earlier, the application uses the AT&T Toolkit to perform the speech-to-text conversion. However, the AT&T Speech-to-text API is carrier agnostic. An app does NOT have to run on an AT&T device in order to invoke the API. In that sense, the AT&T Speech-to-text API is no different from, say, the Nuance API and can be invoked from any mobile device, no matter the underlying OS (Android, iOS, etc.) or carrier.
Developing the app
Let’s now review the key components of the app and the step-by-step process of creating it.
Installing the AT&T toolkit
The first step is to install and configure the AT&T Toolkit in your Developer Edition (DE) or Sandbox org. Since the toolkit is available as an unmanaged package, this step should take no more than a few minutes. Next, you have to create a free AT&T Developer account and configure a couple of things on the AT&T and Salesforce sides.
Building the Visualforce app
Building a Hybrid mobile app
This article walks you through the steps for creating an iOS hybrid app from a Visualforce page using the Mobile SDK. During the webinar I demoed a hybrid iOS version of the application built that way. However, one of the advantages of hybrid mobile development is that you can also create an Android version from the same Visualforce page. You can refer to this blog post for how to create an Android hybrid application using the Mobile SDK.
Recording audio using PhoneGap/Cordova
Now that we’ve included the Cordova JS library, let’s see how the application captures the microphone input. The snippet below from CaseDemo.page shows how we use the Cordova Media API to record the user’s voice input.
A simple call to the startRecord function of the Cordova Media object (line 17) starts recording the voice input. Once the user is done speaking, they press the ‘Stop Recording’ button on the page and the following JS function is invoked.
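As a rough illustration of the start/stop flow described here (a minimal sketch, not the listing from CaseDemo.page), the code below wraps the Cordova Media plugin's `Media` constructor and its `startRecord()` and `stopRecord()` methods. The `createRecorder` helper, the `state` object and the file name in the usage note are assumptions made for this example; the constructor is passed in as a parameter only so that the logic can be exercised off-device.

```javascript
// Sketch only: MediaCtor stands in for Cordova's global `Media` constructor.
// createRecorder and its `state` object are hypothetical names for this example.
function createRecorder(MediaCtor, fileName) {
  var state = { recording: false, error: null };

  // Media(src, successCallback, errorCallback) per the Cordova Media plugin.
  var media = new MediaCtor(
    fileName,
    function onSuccess() { state.recording = false; },
    function onError(err) { state.recording = false; state.error = err; }
  );

  return {
    state: state,
    // Wire this to the voice search button.
    start: function () {
      media.startRecord();
      state.recording = true;
    },
    // Wire this to the 'Stop Recording' button.
    stop: function () {
      media.stopRecord();
      state.recording = false;
    }
  };
}
```

On a device this might be used as `var recorder = createRecorder(Media, 'voicesearch.wav'); recorder.start();`, with `recorder.stop()` called from the 'Stop Recording' handler.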
After invoking the stopRecord() function, the recording is saved as a binary file on the mobile device (WAV format in the case of iOS and AMR format in the case of Android). We then convert the binary recording into a Base64 encoded string on line 6. At this point the user can play back the recording to review and confirm it. Here is the JS function that gets invoked when the user presses the ‘Play back’ button on the page.
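The Base64 conversion and playback steps can be sketched in a similar way. This is one possible approach under stated assumptions, not the code from the demo app: it assumes the recording is already available as a File object (for example via the Cordova File plugin) and reads it with the standard `FileReader.readAsDataURL()`, then strips the data URL prefix to leave the Base64 payload. The helper names `base64FromDataUrl`, `readAsBase64` and `playRecording` are hypothetical, and the constructors are passed in so the logic can be run without a device.

```javascript
// Return the Base64 payload of a data URL such as "data:audio/wav;base64,AAAA",
// or null when the input is not a Base64-encoded data URL.
function base64FromDataUrl(dataUrl) {
  if (typeof dataUrl !== 'string' || dataUrl.indexOf('data:') !== 0) {
    return null;
  }
  var marker = ';base64,';
  var idx = dataUrl.indexOf(marker);
  return idx === -1 ? null : dataUrl.substring(idx + marker.length);
}

// Read a recorded file and hand its Base64 string to `callback`.
// FileReaderCtor stands in for the global FileReader (assumption for testability).
function readAsBase64(file, FileReaderCtor, callback) {
  var reader = new FileReaderCtor();
  reader.onloadend = function () {
    callback(base64FromDataUrl(reader.result));
  };
  reader.readAsDataURL(file);
}

// Play back a saved recording; MediaCtor stands in for Cordova's `Media`.
function playRecording(MediaCtor, fileName) {
  var media = new MediaCtor(fileName, function () {}, function () {});
  media.play();
  return media;
}
```

On a device, `readAsBase64(file, FileReader, function (b64) { ... })` would yield the string to send to the controller, and `playRecording(Media, fileName)` could back the 'Play back' button.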
Speech-to-text using the AT&T toolkit
Finally, let’s review what happens when the user presses the ‘Search’ button to perform a search for matching Case records in Salesforce.
Line 9 shows the first use of the AT&T toolkit. As mentioned earlier, the toolkit provides wrapper Apex classes for invoking AT&T APIs like Speech-to-text, SMS and more. Developers don’t have to worry about the underlying plumbing of creating and parsing JSON messages, invoking the AT&T RESTful APIs, handling authentication and so on; the toolkit abstracts all that away. The AttSpeech class, for example, is the wrapper class for invoking the AT&T Speech-to-Text API. Lines 12-14 set the various inputs required to invoke the API, not least of which is the binary recording received from the mobile device. We then call the convert() method of the AttSpeech class (line 16) to invoke the API, and the converted text is returned as an AttSpeechResult object. Finally, we perform a simple SOQL query to find any Case records whose parent Account name matches the converted text and return the result to the Visualforce page for display to the user.
Hopefully this blog post has gotten your creative juices flowing about what’s possible when building mobile apps on the Salesforce Touch Platform and then enhancing them with AT&T mobility services like Speech-to-Text. Happy coding.