Gather

{
  "verb": "gather",
  "actionHook": "/collect",
  "input": ["digits", "speech"],
  "bargein": true,
  "dtmfBargein": true,
  "finishOnKey": "#",
  "numDigits": 5,
  "timeout": 8,
  "recognizer": {
    "vendor": "google",
    "language": "en-US",
    "hints": ["sales", "support"],
    "hintsBoost": 10
  },
  "say": {
    "text": "To speak to Sales press 1 or say Sales. To speak to customer support press 2 or say Support"
  }
}

When collecting speech input, the application's default recognizer is used unless you override it by specifying a recognizer property.

Parameters

actionHook

string, required

Webhook or websocket URL to send the collected digits or speech to; may be either an absolute or relative URL (see note below).
In the case of webhooks, this will be sent in a POST request.
See below for payload details.

actionHookDelayAction

object

An object describing behaviors to apply if the webhook or websocket application delays in responding to the actionHook.

bargein

boolean

Boolean indicating whether to allow barge-in when collecting speech (i.e., kill audio playback if the caller begins speaking).

dtmfBargein

boolean

Boolean indicating whether to allow barge-in using DTMF keys.

fillerNoise

object, deprecated

An object describing audio to play to the caller if the webhook or websocket application delays in responding to the actionHook.
(Deprecated in favor of actionHookDelayAction.)

fillerNoise.enable

boolean, deprecated, required

Whether to enable or disable filler noise.

fillerNoise.startDelaySecs

number, deprecated, required

Integer value specifying the number of seconds to wait for a response from the remote application before the filler noise starts.

fillerNoise.url

string, deprecated, required

HTTP(S) URL to audio to play as filler noise.

finishOnKey

string

A DTMF key that, if received, indicates the end of DTMF input.

input

array, defaults to ['digits']

Array specifying allowed types of input: ['digits'], ['speech'], or ['digits', 'speech'].

interDigitTimeout

number

Amount of time in seconds to wait between digits after minDigits have been entered.

listenDuringPrompt

boolean, defaults to true

If false, do not listen for user speech until the nested say or play command has completed.

maxDigits

number

Maximum number of DTMF digits expected to gather.

minBargeinWordCount

number, defaults to 1

If bargein is true, only kill the prompt audio once this many words have been spoken or a final transcription has been returned.

minDigits

number, defaults to 1

Minimum number of DTMF digits expected to gather.

numDigits

number

Exact number of DTMF digits expected to gather.

partialResultHook

string

Webhook to send interim transcription results to.
Partial transcriptions are only generated if this property is set.

play

object

Nested play verb that can be used to prompt the user with an audio file.

recognizer

object

Speech recognition options to override default settings if desired.
See recognizer.

say

object

Nested say command that can be used to prompt the user with text-to-speech.
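As a rough illustration of how the parameters above fit together, the sketch below assembles a gather verb as a plain Python dict. The helper name build_gather is hypothetical and is not part of any jambonz SDK. The minDigits/maxDigits sanity check is an assumption added for the example, not documented behavior.

```python
ALLOWED_INPUTS = {"digits", "speech"}


def build_gather(action_hook, input_types=("digits",), prompt_text=None, **options):
    """Return a gather verb as a plain dict, ready for json.dumps().

    Hypothetical helper for illustration only.
    """
    # The input array may contain only 'digits' and/or 'speech'.
    unknown = set(input_types) - ALLOWED_INPUTS
    if not input_types or unknown:
        raise ValueError(f"input must be a non-empty subset of {sorted(ALLOWED_INPUTS)}")

    # Assumed sanity check: a minimum above the maximum can never be satisfied.
    min_digits = options.get("minDigits")
    max_digits = options.get("maxDigits")
    if min_digits is not None and max_digits is not None and min_digits > max_digits:
        raise ValueError("minDigits must not exceed maxDigits")

    verb = {"verb": "gather", "actionHook": action_hook, "input": list(input_types)}
    verb.update(options)  # e.g. finishOnKey, timeout, bargein, recognizer
    if prompt_text is not None:
        verb["say"] = {"text": prompt_text}  # nested say prompt
    return verb


menu = build_gather(
    "/collect",
    input_types=("digits", "speech"),
    prompt_text="To speak to Sales press 1 or say Sales.",
    finishOnKey="#",
    timeout=8,
)
```

Passing the optional properties through as keyword arguments keeps the sketch short; a stricter version could reject property names that do not appear in the parameter list.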

The actionHook property may be either an absolute URL or a relative URL. If it is a relative URL, it will be resolved relative to the base URL of the jambonz application. You will typically use a relative URL for the actionHook property, and when using websockets you are strongly encouraged to use a relative URL.
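The effect of relative resolution can be sketched with Python's standard urljoin. This assumes ordinary RFC 3986 resolution against a made-up base URL (https://example.com). The exact rule jambonz applies is not spelled out here, so treat the result as illustrative only.

```python
from urllib.parse import urljoin

APP_BASE_URL = "https://example.com"  # hypothetical application base URL


def resolve_hook(action_hook, base_url=APP_BASE_URL):
    """Resolve a relative actionHook against the base URL.

    Absolute URLs are returned unchanged by urljoin.
    """
    return urljoin(base_url, action_hook)


print(resolve_hook("/collect"))                           # https://example.com/collect
print(resolve_hook("https://hooks.example.net/collect"))  # https://hooks.example.net/collect
```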

actionHook properties

The payload sent to the actionHook URL will always include a reason property with one of the following values:

speechDetected - user speech was collected
dtmfDetected - user DTMF entries were collected
stt-low-confidence - user speech was collected but the confidence level was below the threshold
timeout - neither speech nor DTMF was collected before the timeout expired
error - an error occurred during the gather operation, possibly with the speech recognizer

Depending on the reason, additional properties may be included in the payload.

speechDetected

A speech object will be included in the payload. The speech object will include the following properties:

language_code - the detected language that was spoken
is_final - a boolean indicating whether this was a final or interim transcription
channel_tag - a value of 1 if the transcription is from the caller, or 2 if from the callee
alternatives - an array of alternative transcriptions, each with a transcript and confidence value
vendor - the raw payload returned from the speech vendor

Example:

{
  "language_code": "en",
  "channel_tag": 1,
  "is_final": true,
  "alternatives": [
    {
      "confidence": 0.9848633,
      "transcript": "yes sorry setting up wi fi calling"
    }
  ],
  "vendor": {
    "name": "deepgram",
    "evt": {
      "type": "Results",
      "channel_index": [
        0,
        1
      ],
      "duration": 4.1899986,
      "start": 47.74,
      "is_final": true,
      "speech_final": true,
      "channel": {
        "alternatives": [
          {
            "transcript": "yes sorry setting up a wi fi calling",
            "confidence": 0.9848633,
            "words": [
              {
                "word": "yes",
                "start": 48.659542,
                "end": 48.939404,
                "confidence": 0.82470703
              },
              {
                "word": "sorry",
                "start": 48.939404,
                "end": 49.259243,
                "confidence": 0.98535156
              },
              {
                "word": "setting",
                "start": 49.259243,
                "end": 49.61906,
                "confidence": 0.9941406
              },
              {
                "word": "up",
                "start": 49.61906,
                "end": 49.85894,
                "confidence": 0.9848633
              },
              {
                "word": "a",
                "start": 49.85894,
                "end": 49.97888,
                "confidence": 0.94921875
              },
              {
                "word": "wi",
                "start": 49.97888,
                "end": 50.178783,
                "confidence": 0.80371094
              },
              {
                "word": "fi",
                "start": 50.178783,
                "end": 50.418663,
                "confidence": 0.99902344
              },
              {
                "word": "calling",
                "start": 50.418663,
                "end": 50.918663,
                "confidence": 0.9326172
              }
            ]
          }
        ]
      },
      "metadata": {
        "request_id": "b1bf1f58-ce74-42be-8b43-37f2f8331e72",
        "model_info": {
          "name": "phonecall-enhanced",
          "version": "2022-05-12.1",
          "arch": "polaris"
        },
        "model_uuid": "9a15c1cc-65e1-429a-9db6-f4ea6fbf822a"
      },
      "from_finalize": false
    }
  }
}
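Since alternatives is an array, an application usually has to choose one entry to act on. The sketch below picks the alternative with the highest confidence. The function name best_transcript is hypothetical, and choosing by confidence is one reasonable policy rather than something the platform prescribes.

```python
def best_transcript(speech):
    """Return (transcript, confidence) for the highest-confidence alternative.

    Returns None when the speech object carries no alternatives.
    """
    alternatives = speech.get("alternatives") or []
    if not alternatives:
        return None
    best = max(alternatives, key=lambda alt: alt.get("confidence", 0.0))
    return best.get("transcript", ""), best.get("confidence", 0.0)


# Trimmed copy of the speech object; the vendor payload is omitted.
speech = {
    "language_code": "en",
    "channel_tag": 1,
    "is_final": True,
    "alternatives": [
        {"confidence": 0.9848633, "transcript": "yes sorry setting up wi fi calling"}
    ],
}

print(best_transcript(speech))
```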

dtmfDetected

A digits property will be included in the payload containing the digits that were collected.

stt-low-confidence

A speech object will be included; see speechDetected for an example. The reason stt-low-confidence tells the application to treat the transcript with caution, as it is likely to be an inaccurate record of what the user actually said.

timeout

No additional properties will be included in the payload.

error

A details property will be included in the payload with a description of the error.
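The reason values and their accompanying properties suggest a simple dispatch in the code that receives the actionHook payload. The following is a minimal sketch. The function name handle_gather_result and the returned (outcome, value) tuples are hypothetical conventions for this example, and a real application would go on to decide what to do with the call.

```python
def handle_gather_result(payload):
    """Map an actionHook payload to an (outcome, value) tuple.

    The outcome labels are invented for this example.
    """
    reason = payload.get("reason")

    if reason == "speechDetected":
        # A speech object is included; use the first alternative.
        return ("speech", payload["speech"]["alternatives"][0]["transcript"])

    if reason == "stt-low-confidence":
        # A speech object is included, but the transcript is unreliable,
        # so flag it for re-prompting instead of acting on it.
        return ("reprompt", payload["speech"]["alternatives"][0]["transcript"])

    if reason == "dtmfDetected":
        return ("digits", payload["digits"])

    if reason == "timeout":
        # No additional properties accompany a timeout.
        return ("timeout", None)

    if reason == "error":
        return ("error", payload.get("details"))

    raise ValueError(f"unexpected reason: {reason!r}")


print(handle_gather_result({"reason": "dtmfDetected", "digits": "1"}))
```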