Recognizer

A property that can be used in the gather, transcribe, or other verbs to override the application's default speech recognizer settings.
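
For example, a gather verb can supply a recognizer to override the application defaults. A minimal sketch; the vendor, language, hint, and webhook values are illustrative:

{
  "verb": "gather",
  "input": ["speech"],
  "actionHook": "/transcript",
  "recognizer": {
    "vendor": "google",
    "language": "en-US",
    "altLanguages": ["es-US"],
    "hints": ["sales", "support"]
  }
}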

Parameters

vendor
string (required)

Speech vendor to use (see the list below, along with any others you add via the
custom speech API). Note: this field is case-sensitive; all built-in vendor names are lower case, e.g. aws, not AWS.

altLanguages
array

(Google, Microsoft) An array of alternative languages that the speaker may be using.

asrDtmfTerminationDigit
string

DTMF key that terminates the continuous ASR feature.

asrTimeout
number

Timeout value for the continuous ASR feature.
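
As a sketch, a recognizer fragment enabling continuous ASR might look like this (values are illustrative; consult the gather/transcribe verb documentation for the units of asrTimeout):

"recognizer": {
  "vendor": "google",
  "language": "en-US",
  "asrTimeout": 2,
  "asrDtmfTerminationDigit": "#"
}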

azureServiceEndpoint
string

Custom service endpoint to connect to instead of hosted Microsoft regional endpoints.

diarization
boolean

(Google) Enable speaker diarization.

diarizationMaxSpeakers
number

(Google) Set the maximum speaker count.

diarizationMinSpeakers
number

(Google) Set the minimum speaker count.
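
A sketch of a Google recognizer with speaker diarization enabled, using the fields above (speaker counts are illustrative):

"recognizer": {
  "vendor": "google",
  "language": "en-US",
  "diarization": true,
  "diarizationMinSpeakers": 2,
  "diarizationMaxSpeakers": 4
}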

enhancedModel
boolean

(Google) Use an enhanced model.

filterMethod
string

(AWS) The method to use when filtering speech: remove, mask, or tag.

hints
array

(Google, Microsoft, Deepgram, Nvidia, Soniox) Array of words or phrases to assist speech detection.
See examples below.

hintsBoost
number

(Google, Nvidia) Number indicating the strength to assign to the configured hints.
See examples below.

identifyChannels
boolean

(AWS) Enable channel identification.

initialSpeechTimeoutMs
number

(Microsoft) Initial speech timeout in milliseconds.

interactionType
string

(Google) Set the interaction type: discussion, presentation, phone_call, voicemail,
professionally_produced, voice_search, voice_command, dictation.

interim
boolean

If true, interim transcriptions are sent.
Default: false.
Note: this only affects the transcribe verb; in gather, interim transcripts are sent based on the presence of a partialResponseHook.

language
string

Language code to use for speech detection.
Defaults to the application-level setting.

languageModelName
string

(AWS) The name of the custom language model when processing speech.

minConfidence
number

If provided, final transcripts with confidence lower than this value
return a reason of 'stt-low-confidence' in the webhook.

model
string

(Google) Speech recognition model to use.
Default: phone_call.

naicsCode
number

(Google) Set an industry NAICS code that is relevant to the speech.

outputFormat
string

(Microsoft) simple or detailed.
Default: simple.

profanityFilter
boolean

(Google, Deepgram, Nuance, Nvidia) If true, filter profanity from speech transcription.
Default: false.

profanityOption
string

(Microsoft) masked, removed, or raw.
Default: raw.

punctuation
boolean

(Google) Enable automatic punctuation.

requestSnr
boolean

(Microsoft) Request signal-to-noise ratio information.

separateRecognitionPerChannel
boolean

If true, recognize both caller and called party speech using separate recognition sessions.

singleUtterance
boolean

(Google) If true, return only a single utterance/transcript.
Default: true for gather.

transcriptionHook
string (required)

Webhook to receive an HTTP POST when an interim or final transcription is received.

vad.enable
boolean

If true, delay connecting to the cloud recognizer until speech is detected.

vad.mode
number

If vad is enabled, this setting governs the sensitivity of the voice activity detector;
value must be between 0 and 3 inclusive.
Lower numbers mean more sensitivity.

vad.voiceMs
number

If vad is enabled, the number of milliseconds of speech required before connecting to the cloud recognizer.
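
The dotted names above denote a nested vad object on the recognizer. A hedged sketch, with illustrative values:

"recognizer": {
  "vendor": "deepgram",
  "language": "en-US",
  "vad": {
    "enable": true,
    "mode": 2,
    "voiceMs": 250
  }
}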

vocabularyFilterName
string

(AWS) The name of a vocabulary filter to use when processing the speech.

vocabularyName
string

(AWS) The name of a vocabulary to use when processing the speech.

Vendor-specific options

Microsoft

speechSegmentationSilenceTimeoutMs
number

Duration (in milliseconds) of non-speech audio within a phrase that’s currently being spoken before that phrase is considered “done.”
See the Microsoft documentation for details.

Deepgram

alternatives
number

Number of alternative transcripts to return.

apiKey
string

Deepgram API key to authenticate with (overrides setting in Jambonz portal).

customModel
string

ID of a custom model.

diarize
boolean

Whether to assign a speaker to each word in the transcript.

diarizeVersion
string

If set to '2021-07-14.0', the legacy diarization feature will be used.

endpointing
number | string

Indicates the number of milliseconds of silence Deepgram
will use to determine a speaker has finished saying a word or phrase.
Value must be either a number of milliseconds or 'false' to disable the feature entirely.
Default: 10ms.

keywords
array

An array of keywords that the model should pay particular attention to,
boosting or suppressing them to help it understand context.

model
string

Deepgram model used to process submitted audio.
Example models: 'nova-3', 'nova-2', 'nova-2-phonecall'; see Deepgram docs for full list.
Default: 'general'.

multichannel
boolean

Indicates whether to transcribe each audio channel independently.

nodelay
boolean

Indicates whether to enable Deepgram’s nodelay feature.

numerals
boolean

Indicates whether to convert numbers
from written format (e.g., "one") to numerical format (e.g., "1").

profanityFilter
boolean

Indicates whether to remove profanity from the transcript.

punctuate
boolean

Indicates whether to add punctuation
and capitalization to the transcript.

redact
array

Whether to redact information
from transcripts.
Allowed values: 'pci', 'numbers', 'true', 'ssn'.

replace
array

An array of terms or phrases
to search for in the submitted audio and replace.

search
array

An array of terms or phrases to search for in the submitted audio.

shortUtterance
boolean

Causes a transcript to be returned as soon as Deepgram’s is_final property is set.
This should only be used in scenarios where you expect a very short confirmation
or directed command and want minimal latency.

smartFormatting
boolean

Indicates whether to enable Deepgram’s Smart Formatting feature.

tag
string

A tag to associate with the request.
Tags appear in usage reports.

tier
string

Deepgram tier you would like to use.
Allowed values: 'enhanced', 'base'.
Default: 'base'.

utteranceEndMs
number

A number of milliseconds of silence that Deepgram will wait
after the last word was spoken before returning an UtteranceEnd event,
which is used by Jambonz to trigger the transcript webhook if this property is supplied.
This is essentially Deepgram’s version of continuous ASR.

version
string

Deepgram version of the model to use.
Default: 'latest'.
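
Drawing a few of these together, a hedged sketch of a Deepgram recognizer; the deepgramOptions nesting key is an assumption here, and the option values are illustrative:

{
  "recognizer": {
    "vendor": "deepgram",
    "language": "en-US",
    "deepgramOptions": {
      "model": "nova-2-phonecall",
      "punctuate": true,
      "numerals": true,
      "utteranceEndMs": 1000
    }
  }
}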

IBM Watson

acousticCustomizationId
string

ID of a custom acoustic model.

baseModelVersion
string

Base model to be used.

instanceId
string

IBM speech instance ID (overrides setting in Jambonz portal).

languageCustomizationId
string

ID of a custom language model.

model
string

The model to use for speech recognition.

sttApiKey
string

IBM API key to authenticate with (overrides setting in Jambonz portal).

sttRegion
string

IBM region (overrides setting in Jambonz portal).

watsonLearningOptOut
boolean

Set to true to prevent IBM from using your API request data to improve their service.

watsonMetadata
string

A tag value
to apply to the request data provided.
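
A hedged sketch for IBM Watson; the ibmOptions nesting key is an assumption, and the region, instance ID, and model values are illustrative:

{
  "recognizer": {
    "vendor": "ibm",
    "language": "en-US",
    "ibmOptions": {
      "sttRegion": "us-south",
      "instanceId": "my-instance-id",
      "model": "en-US_Telephony"
    }
  }
}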

Nuance

allowZeroBaseLmWeight
boolean

When true, custom resources (DLMs, wordsets, etc.) can use the entire weight range.

clientData
object

An object containing arbitrary key-value pairs to inject into the call log.

clientId
string

Nuance client ID to authenticate with (overrides setting in Jambonz portal).

discardSpeakerAdaptation
boolean

If speaker profiles are used, whether to discard updated speaker data.
By default, data is stored.

filterWakeupWord
boolean

Whether to remove the wakeup word from the final result.

formatting.options
object

Object containing key-value pairs of formatting options and values defined in the data pack.

formatting.scheme
string

Keyword for a formatting type defined in the data pack.

includeTokenization
boolean

Whether to include a tokenized recognition result.

kryptonEndpoint
string

On-prem Krypton endpoint to connect to.
Default: the hosted service.

maskLoadFailures
boolean

When true, recognition is not terminated when external resources fail to load.

maxHypotheses
number

Maximum number of n-best hypotheses to return.

noInputTimeoutMs
number

Maximum silence (in milliseconds) allowed while waiting for user input after recognition timers are started.

punctuation
boolean

Whether to enable auto-punctuation.

recognitionTimeoutMs
number

Maximum duration (in milliseconds) of the recognition turn.

resource
array

An array of zero or more recognition resources
(domain LMs, wordsets, etc.) to improve recognition.

resource[].builtin
string

Name of a built-in resource in the data pack.

resource[].externalReference
object

An external DLM or settings file
for creating or updating a speaker profile.

resource[].externalReference.headers
object

An object containing HTTP cache-control directives (e.g., max-age, etc.).

resource[].externalReference.maxLoadFailures
boolean

When true, allow transcription to proceed even if resource loading fails.

resource[].externalReference.requestTimeoutMs
number

Time (in milliseconds) to wait when downloading resources.

resource[].externalReference.type
string

Resource type: 'undefined_resource_type', 'wordset', 'compiled_wordset', 'domain_lm',
'speaker_profile', 'grammar', 'settings'.

resource[].externalReference.uri
string

Location of the resource as a URN reference.

resource[].inlineGrammar
string

Inline grammar in SRGS XML format.

resource[].inlineWordset
object

Inline wordset JSON resource.
See Wordsets for details.

resource[].reuse
string

Whether the resource will be used multiple times.
Allowed values: 'undefined_reuse', 'low_reuse', 'high_reuse'.
Default: low_reuse.

resource[].weightName
string

Input field setting the weight of the
domain LM or built-in resource relative to the data pack.
Allowed values: 'defaultWeight', 'lowest', 'low', 'medium', 'high', 'highest'.
Default: 'medium'.

resource[].weightValue
number

Weight of the DLM or built-in resource as a numeric value from 0 to 1.
Default: 0.25.

resource[].wakeupWord
array

Array of wakeup words.

resultType
string

The level of recognition results: 'final', 'partial', 'immutable_partial'.
Default: final.

secret
string

Nuance secret to authenticate with (overrides setting in Jambonz portal).

speechDetectionSensitivity
number

A balance between detecting speech and noise (breathing, etc.), ranging from 0 to 1.
0 means ignore all noise, 1 means interpret all noise as speech.
Default: 0.5.

speechDomain
string

Mapping to internal weight sets for language models in the data pack.

suppressCallRecording
boolean

Whether to disable call logging and audio capture.
By default, call logs, audio, and metadata are collected.

suppressInitialCapitalization
boolean

When true, the first word in a sentence is not automatically capitalized.

topic
string

Specialized language model.

utteranceDetectionMode
string

How many sentences (utterances) within the audio stream are processed.
Allowed values: 'single', 'multiple', 'disabled'.
Default: single.

utteranceEndSilenceMs
number

Minimum silence (in milliseconds) that determines the end of a sentence.

userId
string

Identifies a specific user within the application.
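
A hedged sketch for Nuance showing an inline wordset resource; the nuanceOptions nesting key is an assumption, and the wordset content is illustrative (see Nuance's wordset documentation for the exact shape):

{
  "recognizer": {
    "vendor": "nuance",
    "language": "en-US",
    "nuanceOptions": {
      "punctuation": true,
      "resource": [
        {
          "inlineWordset": {
            "PATIENT_NAMES": [
              {"literal": "Ruiz"},
              {"literal": "Nguyen"}
            ]
          },
          "weightName": "medium"
        }
      ]
    }
  }
}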

Nvidia

customConfiguration
object

An object of key-value pairs that can be sent to Nvidia for custom configuration.

maxAlternatives
number

Number of alternative transcripts to return.

profanityFilter
boolean

Indicates whether to remove profanity from the transcript.

punctuation
boolean

Indicates whether to provide punctuation in the transcripts.

rivaUri
string

gRPC endpoint (ip:port) that Nvidia Riva is listening on.

verbatimTranscripts
boolean

Indicates whether to provide verbatim transcripts.

wordTimeOffsets
boolean

Indicates whether to provide word-level detail.
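
A hedged sketch for Nvidia Riva; the nvidiaOptions nesting key is an assumption, and the endpoint address is illustrative:

{
  "recognizer": {
    "vendor": "nvidia",
    "language": "en-US",
    "nvidiaOptions": {
      "rivaUri": "10.0.0.5:50051",
      "punctuation": true,
      "wordTimeOffsets": true
    }
  }
}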

Soniox

api_key
string

Soniox API key.

model
string

Soniox model to use.
Default: 'precision_ivr'.

profanityFilter
boolean

Indicates whether to remove profanity from the transcript.

storage
object

Properties that dictate whether to store audio and/or transcripts.
Can be useful for debugging purposes.

storage.disableSearch
boolean (defaults to false)

If true, do not allow search.

storage.disableStoreAudio
boolean (defaults to false)

If true, do not store audio.

storage.disableStoreTranscript
boolean (defaults to false)

If true, do not store transcripts.

storage.id
string

Storage identifier.

storage.title
string

Storage title.
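
A hedged sketch for Soniox with storage enabled; the sonioxOptions nesting key is an assumption, and the storage id and title are illustrative:

{
  "recognizer": {
    "vendor": "soniox",
    "language": "en-US",
    "sonioxOptions": {
      "model": "precision_ivr",
      "storage": {
        "id": "call-1234",
        "title": "support call",
        "disableStoreAudio": true
      }
    }
  }
}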

Speechmatics

sm_audioEventsConfig
object

Audio events to report.

sm_audioEventsConfig.types
array (required)

Allowed values: 'applause', 'laughter', 'music'.

transcription_config
object

Audio transcription configuration.

transcription_config.additional_vocab
array

Additional vocabulary words.

transcription_config.audio_filtering_config
object

Audio filtering configuration.

transcription_config.audio_filtering_config.volume_threshold
number

Volume threshold to filter.

transcription_config.diarization
string

Diarization mode.

transcription_config.domain
array

Domain(s) to optimize transcription for.

transcription_config.enable_entities
boolean

Whether to enable entity detection.

transcription_config.enable_partials
boolean

Enable partial transcriptions.

transcription_config.language
string

Language to transcribe.

transcription_config.max_delay
number

Maximum delay (in seconds) before a final transcript is returned.

transcription_config.max_delay_mode
string

Allowed values: 'fixed', 'flexible'.

transcription_config.output_locale
string

Locale for the transcript output (e.g., 'en-GB').

transcription_config.operating_point
string

Accuracy level to use. Allowed values: 'standard', 'enhanced'.

transcription_config.punctuation_overrides
object

Punctuation configuration.

transcription_config.punctuation_overrides.permitted_marks
array

Punctuation marks the engine is permitted to produce.

transcription_config.punctuation_overrides.sensitivity
number

Punctuation sensitivity.

sm_audioFilteringConfig
object

Audio filtering configuration.

sm_audioFilteringConfig.volume_threshold
number (required)

Volume threshold to filter.
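
A hedged sketch for Speechmatics using the fields above; whether these sit directly on the recognizer object (as the flat names suggest) is an assumption, and the values are illustrative:

{
  "recognizer": {
    "vendor": "speechmatics",
    "language": "en",
    "transcription_config": {
      "language": "en",
      "enable_partials": true,
      "operating_point": "enhanced"
    },
    "sm_audioEventsConfig": {
      "types": ["laughter", "music"]
    }
  }
}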

Providing speech hints

Many recognizers support the ability to provide a dynamic list of words or phrases that should be “boosted” by the recognizer, i.e. the recognizer should be more likely to detect these terms and return them in the transcript. A boost factor can also be applied. In the most basic implementation it would look like this:

1"hints": ["benign", "malignant", "biopsy"],
2"hintsBoost": 50

Additionally, Google and Nvidia allow a boost factor to be specified at the phrase level, e.g.

1"hints": [
2 {"phrase": "benign", "boost": 50},
3 {"phrase": "malignant", "boost": 10},
4 {"phrase": "biopsy", "boost": 20},
5]