Synthesizer

Parameters

vendor

string (required)

Name of the TTS vendor, or ‘default’ to use the application default.

voice

string (required)

Name of the voice to use for the TTS, or ‘default’ to use the application default.

engine

string

(AWS Polly specific) may be standard, neural, generative, or long-form.

gender

string

(Google specific) may be MALE, FEMALE, or NEUTRAL.

label

string

Label associated with the TTS vendor.

language

string

Language code for the TTS, or ‘default’ to use the application default.

options

object

Vendor-specific options for the TTS; see below for supported properties.
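As a minimal sketch of how these parameters fit together, the Python snippet below builds a synthesizer object as a plain dictionary and serializes it to JSON. The helper name make_synthesizer is a hypothetical illustration, not part of any SDK; only the property names come from the list above.

```python
import json

# Property names taken from the parameter list above.
REQUIRED = ("vendor", "voice")
OPTIONAL = ("engine", "gender", "label", "language", "options")


def make_synthesizer(**params):
    """Build a synthesizer dict, rejecting missing or unknown properties.

    Hypothetical helper for illustration only.
    """
    missing = [name for name in REQUIRED if name not in params]
    if missing:
        raise ValueError(f"missing required properties: {missing}")
    unknown = [name for name in params if name not in REQUIRED + OPTIONAL]
    if unknown:
        raise ValueError(f"unknown properties: {unknown}")
    return dict(params)


# 'default' defers to the application default for that property.
synthesizer = make_synthesizer(vendor="default", voice="default", language="default")
print(json.dumps(synthesizer, indent=2))
```

Checking the required properties up front keeps a missing vendor or voice from surfacing later as a harder-to-trace error.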

Vendor-specific options

cartesia

voice_mode

string

Either embedding or id (see Cartesia docs)

embedding

string

A voice embedding (see Cartesia docs)

emotion

string

Specifies the emotion (see Cartesia docs)

speed

number

A number or a named specifier, e.g. “slow” (see Cartesia docs)

elevenlabs

stability

number

Defines the stability for voice settings. (see ElevenLabs docs)

similarity_boost

number

Defines the similarity boost for voice settings. (see ElevenLabs docs)

use_speaker_boost

boolean

Enables or disables speaker boost in the voice settings. This parameter is available on V2+ models. (see ElevenLabs docs)

style

number

Defines the style for voice settings. This parameter is available on V2+ models. (see ElevenLabs docs)

model_id

string

Identifier of the model that will be used. (see ElevenLabs docs)
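For illustration, an ElevenLabs configuration might look like the sketch below. It assumes the vendor-specific properties sit directly inside options, which the listing above does not confirm; the voice name, model identifier, and numeric values are placeholders rather than recommended settings.

```python
import json

# Assumption: vendor-specific properties are placed directly inside "options".
# The voice name, model_id and numeric values below are placeholders.
synthesizer = {
    "vendor": "elevenlabs",
    "voice": "example-voice",
    "options": {
        "model_id": "example-model-id",
        "stability": 0.5,
        "similarity_boost": 0.75,
        "style": 0.0,
        "use_speaker_boost": True,
    },
}

print(json.dumps(synthesizer, indent=2))
```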

playht

voice_engine

string

The voice engine used to synthesize the voice. (see PlayHT docs)

quality

string

One of draft, low, medium, high, or premium. (see PlayHT docs)

seed

number

An integer greater than or equal to 0. If null or not provided, a random seed is used. Useful for controlling the reproducibility of the generated audio: assuming no other properties change, a fixed seed should always generate the exact same audio file. (see PlayHT docs)

temperature

number

A floating-point number between 0 and 2, inclusive. If null or not provided, the model’s default temperature is used. The temperature controls variance: lower temperatures give more predictable results, while higher temperatures allow each run to vary more, so the voice may sound less like the baseline voice. (see PlayHT docs)

emotion

string

An emotion to be applied to the speech. Only supported when voice_engine is set to Play3.0-mini, PlayHT2.0, or PlayHT2.0-turbo, and the voice uses that engine. (see PlayHT docs)

voice_guidance

number

A number between 1 and 6. Lower numbers reduce how unique your chosen voice is compared to other voices; higher numbers maximize its individuality. Only supported when voice_engine is set to Play3.0-mini, PlayHT2.0, or PlayHT2.0-turbo, and the voice uses that engine. (see PlayHT docs)

style_guidance

number

A number between 1 and 30. Lower numbers reduce how strong your chosen emotion is; higher numbers create a very emotional performance. Only supported when voice_engine is set to Play3.0-mini, PlayHT2.0, or PlayHT2.0-turbo, and the voice uses that engine. (see PlayHT docs)

text_guidance

number

A number between 1 and 2 that influences how closely the generated speech adheres to the input text. Lower values create more fluid speech, but with a higher chance of deviating from the input text. Higher values keep the spoken words closely aligned with the provided text. Only supported when voice_engine is set to Play3.0-mini or PlayHT2.0, and the voice uses that engine. (see PlayHT docs)
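Because several of the PlayHT options carry numeric ranges and engine restrictions, a small pre-flight check can catch mistakes before a request is sent. The sketch below encodes only the constraints stated above; the helper check_playht_options is hypothetical, and it assumes every "between" range is inclusive at both ends.

```python
# Ranges and engine lists copied from the PlayHT option descriptions above.
# Assumption: every "between X and Y" range is inclusive at both ends.
EMOTION_ENGINES = {"Play3.0-mini", "PlayHT2.0", "PlayHT2.0-turbo"}
TEXT_GUIDANCE_ENGINES = {"Play3.0-mini", "PlayHT2.0"}
RANGES = {
    "temperature": (0, 2),
    "voice_guidance": (1, 6),
    "style_guidance": (1, 30),
    "text_guidance": (1, 2),
}
ENGINE_GATED = {
    "emotion": EMOTION_ENGINES,
    "voice_guidance": EMOTION_ENGINES,
    "style_guidance": EMOTION_ENGINES,
    "text_guidance": TEXT_GUIDANCE_ENGINES,
}


def check_playht_options(options):
    """Return a list of problems found in a PlayHT options dict (hypothetical helper)."""
    problems = []
    engine = options.get("voice_engine")
    for name, value in options.items():
        if value is None:
            continue  # None is treated here the same as "not provided"
        if name == "seed" and (not isinstance(value, int) or value < 0):
            problems.append("seed must be an integer >= 0")
        if name in RANGES:
            low, high = RANGES[name]
            if not low <= value <= high:
                problems.append(f"{name} must be between {low} and {high}")
        if name in ENGINE_GATED and engine not in ENGINE_GATED[name]:
            problems.append(f"{name} is not supported with voice_engine {engine!r}")
    return problems


for problem in check_playht_options(
    {"voice_engine": "PlayHT2.0-turbo", "temperature": 2.5, "text_guidance": 1.5}
):
    print(problem)
```

The example input trips two rules: the temperature is out of range, and text_guidance is not listed as supported for PlayHT2.0-turbo.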

rimelabs

pauseBetweenBrackets

boolean

When set to true, inserts a pause wherever a number appears in angle brackets. The number inside the brackets specifies the pause duration in milliseconds. Example: “Hi. <200> I’d love to have a conversation with you.” adds a 200ms pause between the first and second sentences. (see Rimelabs docs)

phonemizeBetweenBrackets

boolean

When set to true, you can specify the phonemes for a word enclosed in curly brackets. (see Rimelabs docs)

inlineSpeedAlpha

string

Comma-separated list of speed values applied to words in square brackets. Values < 1.0 speed up speech; values > 1.0 slow it down. Example: “This sentence is [really] [fast]” with inlineSpeedAlpha “0.5, 3” will make “really” fast and “fast” slow. (see Rimelabs docs)

speedAlpha

number

Adjusts the speed of speech. Lower than 1.0 is faster than default. Higher than 1.0 is slower than default. (see Rimelabs docs)

reduceLatency

boolean

Reduces the latency of response, at the cost of some possible mispronunciation of digits and abbreviations. (see Rimelabs docs)
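To make the inlineSpeedAlpha pairing concrete, the sketch below matches each bracketed word with a speed value and labels the effect using the rule stated above (values below 1.0 are faster, values above 1.0 are slower). It runs locally and does not call Rimelabs; the helper describe_inline_speeds is hypothetical, and pairing values with bracketed words in order of appearance is an assumption.

```python
import re


def describe_inline_speeds(text, inline_speed_alpha):
    """Pair each [bracketed] word with a speed value (illustrative helper).

    Assumption: values apply to bracketed words in order of appearance.
    """
    words = re.findall(r"\[([^\]]+)\]", text)
    values = [float(v) for v in inline_speed_alpha.split(",")]
    result = []
    for word, value in zip(words, values):
        if value < 1.0:
            effect = "faster"
        elif value > 1.0:
            effect = "slower"
        else:
            effect = "default"
        result.append((word, value, effect))
    return result


for word, value, effect in describe_inline_speeds(
    "This sentence is [really] [fast]", "0.5, 3"
):
    print(f"{word}: {value} ({effect})")
```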

verbio

engine_version

string

The engine version to use. (see Verbio docs)

whisper

model_id

string

TTS model to use. (see Whisper docs)