Synthesizer

A property that can be used in a say verb to override the application default TTS settings.

Parameters

vendor
string (required)

Name of the TTS vendor, or ‘default’ to use the application default.

voice
string (required)

Name of the voice to use for the TTS, or ‘default’ to use the application default.

engine
string

(AWS Polly specific) may be standard, neural, generative, or long-form.

gender
string

(Google specific) may be MALE, FEMALE, or NEUTRAL.

label
string

Label associated with the TTS vendor.

language
string

Language code for the TTS, or ‘default’ to use the application default.

options
object

Vendor-specific options for the TTS; see below for supported properties.
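
As a rough illustration of how these parameters fit together, the sketch below assembles a say verb payload with a synthesizer override. The property names inside `synthesizer` come from the list above; the helper name `build_say_verb`, the `verb` and `text` keys, and the example vendor and voice values are assumptions made for illustration only.

```python
import json

# Optional synthesizer properties listed above (vendor and voice are required).
OPTIONAL_PROPERTIES = {"engine", "gender", "label", "language", "options"}


def build_say_verb(text, vendor="default", voice="default", **optional):
    """Build a say verb payload whose synthesizer overrides the app defaults.

    Hypothetical helper: not part of any SDK, shown only to illustrate the shape.
    """
    synthesizer = {"vendor": vendor, "voice": voice}
    for name, value in optional.items():
        if name not in OPTIONAL_PROPERTIES:
            raise ValueError(f"unknown synthesizer property: {name}")
        synthesizer[name] = value
    # The "verb" and "text" keys are assumed here; check the say verb reference.
    return {"verb": "say", "text": text, "synthesizer": synthesizer}


say = build_say_verb(
    "Hello, thanks for calling.",
    vendor="google",            # illustrative vendor name
    voice="en-US-Wavenet-C",    # illustrative voice name
    language="en-US",
    gender="FEMALE",
)
print(json.dumps(say, indent=2))
```

Passing `'default'` for `vendor` and `voice` (the helper's defaults) leaves the application settings in place while still allowing the other properties to be set.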

Vendor-specific options

voice_mode
string

embedding or id (see Cartesia docs)

embedding
string

a voice embedding (see Cartesia docs)

emotion
string

specifies emotion (see Cartesia docs)

speed
number

A number or named specifier (e.g., “slow”) (see Cartesia docs)

stability
number

Defines the stability for voice settings (see Elevenlabs docs)

similarity_boost
number

Defines the similarity boost for voice settings. (see Elevenlabs docs)

use_speaker_boost
boolean

Enables or disables speaker boost in the voice settings. This parameter is available on V2+ models. (see Elevenlabs docs)

style
number

Defines the style for voice settings. This parameter is available on V2+ models. (see Elevenlabs docs)

model_id
string

Identifier of the model that will be used (see Elevenlabs docs)
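
The ElevenLabs entries above could be combined into an `options` object along the following lines. This is a minimal sketch: the option names and types are taken from the list above, while the vendor identifier `"elevenlabs"`, the sample values, and the helper `mistyped_options` are assumptions for illustration. Consult the ElevenLabs docs for valid ranges and model identifiers.

```python
# Expected types, taken from the ElevenLabs entries in the list above.
ELEVENLABS_OPTION_TYPES = {
    "stability": "number",
    "similarity_boost": "number",
    "use_speaker_boost": "boolean",
    "style": "number",
    "model_id": "string",
}


def is_type(value, type_name):
    """Check a Python value against one of the documented type names."""
    if type_name == "boolean":
        return isinstance(value, bool)
    if type_name == "number":
        # bool is a subclass of int in Python, so exclude it explicitly.
        return isinstance(value, (int, float)) and not isinstance(value, bool)
    if type_name == "string":
        return isinstance(value, str)
    return False


def mistyped_options(options):
    """Return the names of options that are unknown or have the wrong type."""
    return [
        name
        for name, value in options.items()
        if name not in ELEVENLABS_OPTION_TYPES
        or not is_type(value, ELEVENLABS_OPTION_TYPES[name])
    ]


synthesizer = {
    "vendor": "elevenlabs",  # assumed vendor identifier
    "voice": "default",
    "options": {
        "model_id": "eleven_multilingual_v2",  # illustrative value
        "stability": 0.5,                      # illustrative value
        "similarity_boost": 0.75,              # illustrative value
        "use_speaker_boost": True,
        "style": 0.0,
    },
}
print(mistyped_options(synthesizer["options"]))  # prints []
```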

voice_engine
string

The voice engine used to synthesize the voice. (see Playht docs)

quality
string

draft, low, medium, high, premium (see Playht docs)

seed
number

An integer number greater than or equal to 0. If equal to null or not provided, a random seed will be used. Useful to control the reproducibility of the generated audio. Assuming all other properties didn’t change, a fixed seed should always generate the exact same audio file (see Playht docs)

temperature
number

A floating point number between 0 and 2, inclusive. If equal to null or not provided, the model’s default temperature will be used. The temperature parameter controls variance: lower temperatures give more predictable results, while higher temperatures allow each run to vary more, so the voice may sound less like the baseline voice. (see Playht docs)

emotion
string

An emotion to be applied to the speech. Only supported when voice_engine is set to Play3.0-mini, PlayHT2.0 or PlayHT2.0-turbo, and voice uses that engine. (see Playht docs)

voice_guidance
number

A number between 1 and 6. Use lower numbers to reduce how unique your chosen voice will be compared to other voices. Higher numbers will maximize its individuality. Only supported when voice_engine is set to Play3.0-mini, PlayHT2.0 or PlayHT2.0-turbo, and voice uses that engine. (see Playht docs)

style_guidance
number

A number between 1 and 30. Use lower numbers to reduce how strong your chosen emotion will be. Higher numbers will create a very emotional performance. Only supported when voice_engine is set to Play3.0-mini, PlayHT2.0 or PlayHT2.0-turbo, and voice uses that engine. (see Playht docs)

text_guidance
number

A number between 1 and 2. This number influences how closely the generated speech adheres to the input text. Use lower values to create more fluid speech, but with a higher chance of deviating from the input text. Higher numbers will make the generated speech more accurate to the input text, ensuring that the words spoken align closely with the provided text. Only supported when voice_engine is set to Play3.0-mini or PlayHT2.0, and voice uses that engine. (see Playht docs)
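
Because several PlayHT options have documented numeric ranges, it may be useful to check them before sending a request. The sketch below encodes only the ranges and quality values stated above; the function name `validate_playht_options` and the error wording are hypothetical, and the check does not cover the engine-specific restrictions on `emotion` and the guidance options.

```python
# Ranges and allowed values as stated in the PlayHT entries above.
PLAYHT_QUALITIES = {"draft", "low", "medium", "high", "premium"}
PLAYHT_RANGES = {
    "temperature": (0, 2),
    "voice_guidance": (1, 6),
    "style_guidance": (1, 30),
    "text_guidance": (1, 2),
}


def is_number(value):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def validate_playht_options(options):
    """Return a list of human-readable problems; an empty list means no issue found."""
    errors = []

    seed = options.get("seed")
    if seed is not None and (
        not isinstance(seed, int) or isinstance(seed, bool) or seed < 0
    ):
        errors.append("seed must be an integer >= 0")

    quality = options.get("quality")
    if quality is not None and quality not in PLAYHT_QUALITIES:
        errors.append("quality must be one of draft, low, medium, high, premium")

    for name, (low, high) in PLAYHT_RANGES.items():
        value = options.get(name)
        if value is None:
            continue  # null or missing means the vendor default is used
        if not is_number(value) or not low <= value <= high:
            errors.append(f"{name} must be a number between {low} and {high}")

    return errors


problems = validate_playht_options(
    {
        "voice_engine": "PlayHT2.0",
        "quality": "medium",
        "seed": 42,
        "temperature": 2.5,  # out of range on purpose
        "style_guidance": 10,
    }
)
print(problems)  # prints ['temperature must be a number between 0 and 2']
```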

pauseBetweenBrackets
boolean

When set to true, inserts a pause wherever a number appears in angle brackets. The number inside the brackets specifies the pause duration in milliseconds. Example: “Hi. <200> I’d love to have a conversation with you.” adds a 200ms pause between the first and second sentences. (see Rimelabs docs)

phonemizeBetweenBrackets
boolean

When set to true, you can specify the phonemes for a word enclosed in curly brackets. (see Rimelabs docs)

inlineSpeedAlpha
string

Comma-separated list of speed values applied, in order, to words in square brackets. Values < 1.0 speed up speech; values > 1.0 slow it down. Example: “This sentence is [really] [fast]” with inlineSpeedAlpha “0.5, 3” will make “really” faster and “fast” slower. (see Rimelabs docs)

speedAlpha
number

Adjusts the speed of speech. Lower than 1.0 is faster than default. Higher than 1.0 is slower than default. (see Rimelabs docs)

reduceLatency
boolean

Reduces the latency of response, at the cost of some possible mispronunciation of digits and abbreviations. (see Rimelabs docs)
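
To make the pairing between bracketed words and inlineSpeedAlpha values concrete, the following sketch lines up each square-bracketed word with the speed value in the same position. The helper `pair_inline_speeds` is hypothetical and runs purely on the client side; it assumes values are matched to bracketed words in order and that the counts must agree, which should be confirmed against the Rimelabs docs.

```python
import re


def pair_inline_speeds(text, inline_speed_alpha):
    """Pair each [bracketed] word in text with its inlineSpeedAlpha value."""
    # Capture the content of every [...] group, in order of appearance.
    words = re.findall(r"\[([^\[\]]+)\]", text)
    # Split the comma-separated string and convert each entry to a float.
    speeds = [float(part) for part in inline_speed_alpha.split(",") if part.strip()]
    if len(words) != len(speeds):
        # Assumption: one speed value per bracketed word.
        raise ValueError(
            f"{len(words)} bracketed word(s) but {len(speeds)} speed value(s)"
        )
    return list(zip(words, speeds))


pairs = pair_inline_speeds("This sentence is [really] [fast]", "0.5, 3")
print(pairs)  # prints [('really', 0.5), ('fast', 3.0)]
```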

engine_version
string

The engine version to use. (see Verbio docs)

model_id
string

TTS model to use. (see Whisper docs)