Text to Speech Action (Beta)

Converts input text to spoken audio and returns the generated Base64-encoded audio output.

This action is available in API version 66.0 and later.

Supported REST HTTP Methods

URI
/services/data/v66.0/actions/standard/textToSpeech
Formats
JSON, XML
HTTP Methods
POST
Authentication
Authorization: Bearer token
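
As a minimal sketch of how the URI, HTTP method, and authentication above fit together, the following Python builds the POST request without sending it. The instance URL, the access token value, and the `build_request` helper name are placeholders for illustration, not part of this reference. The JSON `Content-Type` header is an assumption based on the JSON format listed above.

```python
import json
import urllib.request

# Resource path as documented for API version 66.0.
ACTION_PATH = "/services/data/v66.0/actions/standard/textToSpeech"


def build_request(instance_url: str, access_token: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request for the Text to Speech action.

    build_request is a hypothetical helper; instance_url and access_token
    must come from your own org and authentication flow.
    """
    return urllib.request.Request(
        url=instance_url.rstrip("/") + ACTION_PATH,
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {access_token}",
            # Assumption: the request body is sent as JSON.
            "Content-Type": "application/json",
        },
    )


# Placeholder values; substitute your own instance URL and token.
request = build_request(
    "https://example.my.salesforce.com",
    "ACCESS_TOKEN",
    {"inputs": [{"inputText": "Hello! How are you?"}]},
)
```

Passing `request` to `urllib.request.urlopen` would send it. That call is left out here so the sketch runs without network access.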

Inputs

Input Details
inputText
Type
string
Description
Required. The text to convert to voice.
voiceSpeed
Type
string
Description
Optional. Specifies the speed at which the generated speech is delivered. This parameter increases or decreases the playback speed of the spoken audio output.
voiceStability
Type
string
Description
Optional. Specifies the stability of the generated speech output. This parameter controls the consistency and variation in speech delivery. Higher values produce more uniform speech, while lower values result in greater expressive variation.
voiceId
Type
string
Description
Optional. Specifies the identifier of the voice used to generate spoken audio. This parameter controls the tone and characteristics of the generated speech output. To retrieve available voice IDs, send a GET request to the Text to Speech REST endpoint.
fileOutput
Type
boolean
Description
Optional. Specifies whether the response returns an audio file output instead of Base64-encoded audio. The default is false.
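
The inputs above can be assembled into a request body along these lines. This is a sketch, not a definitive client: `build_inputs` is a hypothetical helper, and it only checks that `inputText` is present because this reference does not state valid ranges for the other parameters. Optional parameters are omitted from the body when they are not supplied.

```python
from typing import Optional


def build_inputs(
    input_text: str,
    voice_speed: Optional[str] = None,
    voice_stability: Optional[str] = None,
    voice_id: Optional[str] = None,
    file_output: Optional[bool] = None,
) -> dict:
    """Return a request body for the action, skipping unset optional inputs."""
    if not input_text:
        # inputText is the only required input.
        raise ValueError("inputText is required")

    entry: dict = {"inputText": input_text}
    # voiceSpeed, voiceStability, and voiceId are documented as strings.
    if voice_speed is not None:
        entry["voiceSpeed"] = voice_speed
    if voice_stability is not None:
        entry["voiceStability"] = voice_stability
    if voice_id is not None:
        entry["voiceId"] = voice_id
    # fileOutput is documented as a boolean.
    if file_output is not None:
        entry["fileOutput"] = file_output
    return {"inputs": [entry]}


body = build_inputs("Hello! How are you?", voice_speed="1", voice_stability="0.5")
```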

Outputs

Output Details
convertedAudio
Description
The generated audio output returned in Base64-encoded format based on the provided input text and voice settings.
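
Decoding the `convertedAudio` value into raw audio bytes might look like the following sketch. It assumes the value is standard Base64 text. The `decode_converted_audio` helper is hypothetical, and it takes the object that holds the `convertedAudio` key, because the position of that key in the full response is not shown here. The sample bytes are stand-in data, not real audio.

```python
import base64


def decode_converted_audio(output: dict) -> bytes:
    """Decode the Base64 text stored under convertedAudio into raw bytes."""
    encoded = output["convertedAudio"]
    # validate=True rejects characters outside the Base64 alphabet.
    return base64.b64decode(encoded, validate=True)


# Stand-in data for illustration only.
sample_output = {
    "convertedAudio": base64.b64encode(b"fake-audio-bytes").decode("ascii")
}
audio_bytes = decode_converted_audio(sample_output)
```

The resulting bytes can then be written to a file opened in binary mode.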

Usage

Sample Input

This sample converts text input to speech using the Text to Speech action. Because fileOutput is set to true, the response returns an audio file output instead of Base64-encoded audio.

{
  "inputs": [
    {
      "inputText": "Hello! How are you?",
      "voiceSpeed": "1",
      "voiceStability": "0.5",
      "voiceId": "Jbte7ht1CqapnZvc4KpK",
      "fileOutput": true
    }
  ]
}

If fileOutput is set to false or not specified, the response returns Base64-encoded audio output.

Sample Output

This sample response corresponds to a request where fileOutput is set to true, so the generated spoken audio is returned as an audio file output.

{
  "outputs": [
    {
      "audioFile": "<audio file output>",
      "contentType": "audio/mpeg"
    }
  ]
}

The response returns generated spoken audio as a file output when fileOutput is set to true.