Speech to Text Action

Converts speech in an audio file to text using AI-powered speech recognition.

This action is available in API version 66.0 and later.

Supported REST HTTP Methods

URI: /services/data/v66.0/actions/standard/speechToText

Formats: JSON, XML

HTTP Methods: POST

Authentication: Authorization: Bearer token

Inputs

Input Type Description
contentDocumentId string Required. The ID of the audio file stored in Salesforce Files. This value is the ID of the ContentDocument record that represents the file to be transcribed.
transcriptionModel string Optional. The transcription model used to convert speech to text. Valid values are whisper-v3-turbo and elevenlabs-scribe-v2. If you don't specify a value, whisper-v3-turbo is used.
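
As an illustration of the inputs described above, the following Python sketch assembles a request body and checks the model name on the client side before anything is sent. The helper name build_payload and the client-side validation are assumptions made for this example; they are not part of the action itself.

```python
import json

# Model names taken from the Inputs table above.
VALID_MODELS = ("whisper-v3-turbo", "elevenlabs-scribe-v2")


def build_payload(content_document_id, transcription_model=None):
    """Return the request body as a dict (hypothetical helper, not part of the API)."""
    if not content_document_id:
        raise ValueError("contentDocumentId is required")
    item = {"contentDocumentId": content_document_id}
    # Leave transcriptionModel out entirely when it is not given,
    # so the documented default (whisper-v3-turbo) applies.
    if transcription_model is not None:
        if transcription_model not in VALID_MODELS:
            raise ValueError(f"unsupported transcriptionModel: {transcription_model}")
        item["transcriptionModel"] = transcription_model
    return {"inputs": [item]}


if __name__ == "__main__":
    body = build_payload("069xx000004WhFoAAK", "elevenlabs-scribe-v2")
    print(json.dumps(body, indent=4))
```

Omitting the optional field rather than sending an empty value keeps the request minimal and leaves the choice of default to the service.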

Outputs

Output Type Description
convertedText string The transcript of the audio file, returned as plain text in the detected language.

Usage

Sample Input

This sample uses elevenlabs-scribe-v2. You can also set transcriptionModel to whisper-v3-turbo, which is the default if the value isn't specified.

{
    "inputs": [
        {
            "contentDocumentId": "069xx000004WhFoAAK",
            "transcriptionModel": "elevenlabs-scribe-v2"
        }
    ]
}
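
One way to send a body like this is sketched below in Python, using only the standard library. The instance URL and access token are placeholders that the caller must supply, the helper names build_request and send are hypothetical, and the Content-Type header is an assumption based on the JSON format listed for this action.

```python
import json
import urllib.request

# URI from the "Supported REST HTTP Methods" section.
ACTION_PATH = "/services/data/v66.0/actions/standard/speechToText"


def build_request(instance_url, access_token, body):
    """Build (but do not send) the POST request for the action.

    instance_url and access_token are caller-supplied placeholders.
    """
    return urllib.request.Request(
        url=instance_url.rstrip("/") + ACTION_PATH,
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",  # assumption: JSON request body
        },
    )


def send(request):
    """Send the request and return the decoded JSON response."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)


if __name__ == "__main__":
    body = {
        "inputs": [
            {
                "contentDocumentId": "069xx000004WhFoAAK",
                "transcriptionModel": "elevenlabs-scribe-v2",
            }
        ]
    }
    request = build_request("https://example.my.salesforce.com", "ACCESS_TOKEN", body)
    print(request.get_method(), request.full_url)
```

Building the request separately from sending it makes it possible to inspect the URL and headers without a network call; send(request) performs the actual POST and raises urllib.error.HTTPError on a non-success status.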

Sample Output

{
    "outputs": [
        {
            "convertedText": "Thank you for contacting support. How can I help you today?"
        }
    ]
}
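
A minimal Python sketch for reading the transcript out of a response follows. It assumes the response body has exactly the shape of the sample output above (an outputs array whose items carry convertedText); the helper name extract_transcripts is hypothetical, and a real response may include additional fields that this sketch ignores.

```python
import json


def extract_transcripts(response_text):
    """Return every convertedText value from a response shaped like the sample output."""
    data = json.loads(response_text)
    return [
        item["convertedText"]
        for item in data.get("outputs", [])
        if "convertedText" in item
    ]


if __name__ == "__main__":
    sample = """
    {
        "outputs": [
            {
                "convertedText": "Thank you for contacting support. How can I help you today?"
            }
        ]
    }
    """
    for transcript in extract_transcripts(sample):
        print(transcript)
```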