Gather

Collect DTMF or speech input.

{
  "verb": "gather",
  "actionHook": "/collect",
  "input": ["digits", "speech"],
  "bargein": true,
  "dtmfBargein": true,
  "finishOnKey": "#",
  "numDigits": 5,
  "timeout": 8,
  "recognizer": {
    "vendor": "google",
    "language": "en-US",
    "hints": ["sales", "support"],
    "hintsBoost": 10
  },
  "say": {
    "text": "To speak to Sales press 1 or say Sales. To speak to customer support press 2 or say Support"
  }
}

When collecting speech input, the default recognizer for the application will be used unless you override it by specifying a recognizer property.
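As a minimal sketch of the override behavior described above, the following Python helper builds a gather verb and includes a recognizer property only when one is supplied. The function name build_gather and its arguments are hypothetical and used purely for illustration; only the JSON property names come from this page.

```python
import json


def build_gather(action_hook, prompt, recognizer=None):
    """Return a gather verb as a dict (hypothetical helper, not part of jambonz)."""
    verb = {
        "verb": "gather",
        "actionHook": action_hook,
        "input": ["digits", "speech"],
        "say": {"text": prompt},
    }
    if recognizer is not None:
        # Only present when the caller wants to override the
        # application's default recognizer for this gather.
        verb["recognizer"] = recognizer
    return verb


# No recognizer property: the application's default recognizer applies.
default_verb = build_gather("/collect", "Say Sales or Support")

# Explicit recognizer property: overrides the default for this verb.
custom_verb = build_gather(
    "/collect",
    "Say Sales or Support",
    recognizer={"vendor": "google", "language": "en-US"},
)

print(json.dumps(custom_verb, indent=2))
```

Leaving the recognizer out entirely, rather than sending an empty object, keeps the verb's behavior tied to the application default.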

Parameters

actionHook
string (required)

Webhook or websocket URL to send the collected digits or speech to; may be either an absolute or relative URL (see note below).
In the case of webhooks, this will be sent in a POST request.
See below for payload details.

actionHookDelayAction
object

An object describing behaviors to apply if the webhook or websocket application delays in responding to the actionHook.

bargein
boolean

Boolean indicating whether to allow barge-in when collecting speech (i.e., kill audio playback if the caller begins speaking).

dtmfBargein
boolean

Boolean indicating whether to allow barge-in using DTMF keys.

fillerNoise
object (deprecated)

An object describing audio to play to the caller if the webhook or websocket application delays in responding to the actionHook.
(Deprecated in favor of actionHookDelayAction.)

fillerNoise.enable
boolean (deprecated, required)

Whether to enable or disable filler noise.

fillerNoise.startDelaySecs
number (deprecated, required)

Integer value specifying the number of seconds to wait for a response from the remote application before filler noise starts playing.

fillerNoise.url
string (deprecated, required)

HTTP(S) URL to audio to play as filler noise.

finishOnKey
string

A DTMF key that, if received, indicates the end of DTMF input.

input
array (defaults to ['digits'])

Array specifying allowed types of input: ['digits'], ['speech'], or ['digits', 'speech'].

interDigitTimeout
number

Amount of time in seconds to wait between digits after minDigits have been entered.

listenDuringPrompt
boolean (defaults to true)

If false, do not listen for user speech until the nested say or play command has completed.

maxDigits
number

Maximum number of DTMF digits expected to gather.

minBargeinWordCount
number (defaults to 1)

If bargein is true, only kill audio playback once at least this many words have been spoken or a final transcription is returned.

minDigits
number (defaults to 1)

Minimum number of DTMF digits expected to gather.

numDigits
number

Exact number of DTMF digits expected to gather.

partialResultHook
string

Webhook to send interim transcription results to.
Partial transcriptions are only generated if this property is set.

play
object

Nested play verb that can be used to prompt the user with an audio file.

recognizer
object

Speech recognition options to override default settings if desired.
See recognizer.

say
object

Nested say command that can be used to prompt the user with text-to-speech.

Note: the actionHook property may be either an absolute URL or a relative URL. If it is a relative URL, it will be resolved relative to the base URL of the jambonz application. You will typically use a relative URL for the actionHook property, and when using websockets you are strongly encouraged to use a relative URL.
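To illustrate the difference between relative and absolute actionHook values, here is a small sketch using standard URL resolution from Python's urllib.parse. The base URL and the resolve_hook helper are hypothetical, and jambonz's own resolution logic is assumed (not confirmed here) to follow the same general rules.

```python
from urllib.parse import urljoin

# Hypothetical base URL of the application; substitute your own.
BASE_URL = "https://example.com"


def resolve_hook(base_url, action_hook):
    """Resolve an actionHook against a base URL using standard URL joining."""
    return urljoin(base_url, action_hook)


# A relative actionHook is resolved against the base URL.
relative = resolve_hook(BASE_URL, "/collect")

# An absolute actionHook is left unchanged.
absolute = resolve_hook(BASE_URL, "https://other.example.org/collect")

print(relative)  # https://example.com/collect
print(absolute)  # https://other.example.org/collect
```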

actionHook properties

The payload sent to the actionHook URL will always include a reason property with one of the following values:

  • speechDetected - user speech was collected
  • dtmfDetected - user DTMF entries were collected
  • stt-low-confidence - user speech was collected but the confidence level was below the threshold
  • timeout - neither speech nor DTMF was collected before the timeout expired
  • error - an error occurred during the gather operation, possibly with the speech recognizer

Depending on the reason, additional properties may be included in the payload.

When reason is speechDetected, a speech object will be included in the payload. The speech object includes the following properties:

  • language_code - the detected language that was spoken
  • is_final - a boolean indicating whether this was a final or interim transcription
  • channel_tag - a value of 1 if the transcription is from the caller, or 2 if from the callee
  • alternatives - an array of alternative transcriptions, each with a transcript and confidence value
  • vendor - the raw payload returned from the speech vendor

Example:

{
  "language_code": "en",
  "channel_tag": 1,
  "is_final": true,
  "alternatives": [
    {
      "confidence": 0.9848633,
      "transcript": "yes sorry setting up wi fi calling"
    }
  ],
  "vendor": {
    "name": "deepgram",
    "evt": {
      "type": "Results",
      "channel_index": [
        0,
        1
      ],
      "duration": 4.1899986,
      "start": 47.74,
      "is_final": true,
      "speech_final": true,
      "channel": {
        "alternatives": [
          {
            "transcript": "yes sorry setting up a wi fi calling",
            "confidence": 0.9848633,
            "words": [
              {
                "word": "yes",
                "start": 48.659542,
                "end": 48.939404,
                "confidence": 0.82470703
              },
              {
                "word": "sorry",
                "start": 48.939404,
                "end": 49.259243,
                "confidence": 0.98535156
              },
              {
                "word": "setting",
                "start": 49.259243,
                "end": 49.61906,
                "confidence": 0.9941406
              },
              {
                "word": "up",
                "start": 49.61906,
                "end": 49.85894,
                "confidence": 0.9848633
              },
              {
                "word": "a",
                "start": 49.85894,
                "end": 49.97888,
                "confidence": 0.94921875
              },
              {
                "word": "wi",
                "start": 49.97888,
                "end": 50.178783,
                "confidence": 0.80371094
              },
              {
                "word": "fi",
                "start": 50.178783,
                "end": 50.418663,
                "confidence": 0.99902344
              },
              {
                "word": "calling",
                "start": 50.418663,
                "end": 50.918663,
                "confidence": 0.9326172
              }
            ]
          }
        ]
      },
      "metadata": {
        "request_id": "b1bf1f58-ce74-42be-8b43-37f2f8331e72",
        "model_info": {
          "name": "phonecall-enhanced",
          "version": "2022-05-12.1",
          "arch": "polaris"
        },
        "model_uuid": "9a15c1cc-65e1-429a-9db6-f4ea6fbf822a"
      },
      "from_finalize": false
    }
  }
}
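A speech object like the one above can be reduced to a single transcript by picking the alternative with the highest confidence. The sketch below assumes only the language_code, is_final, and alternatives properties described on this page; the best_alternative helper is hypothetical and ignores the vendor-specific payload.

```python
def best_alternative(speech):
    """Return (transcript, confidence) for the most confident alternative.

    Returns None when the speech object carries no alternatives.
    """
    alternatives = speech.get("alternatives") or []
    if not alternatives:
        return None
    best = max(alternatives, key=lambda alt: alt.get("confidence", 0.0))
    return best.get("transcript", ""), best.get("confidence")


# Trimmed-down version of the example speech object.
sample_speech = {
    "language_code": "en",
    "channel_tag": 1,
    "is_final": True,
    "alternatives": [
        {
            "confidence": 0.9848633,
            "transcript": "yes sorry setting up wi fi calling",
        }
    ],
}

result = best_alternative(sample_speech)
print(result)
```

Reading from the top-level alternatives array rather than the vendor object keeps the code independent of which speech vendor produced the transcription.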

When reason is dtmfDetected, a digits property will be included in the payload containing the digits that were collected.

When reason is stt-low-confidence, a speech object will be included; see speechDetected for an example. This reason tells the application to treat the transcript with caution, as it is likely to be an inaccurate report of what the user actually said.

When reason is timeout, no additional properties will be included in the payload.

When reason is error, a details property will be included in the payload with a description of the error.
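One way an application might branch on the reason property is sketched below. It assumes the per-reason properties are as listed in this section (speech for speechDetected and stt-low-confidence, digits for dtmfDetected, details for error, nothing extra for timeout). The handle_gather_result function and the step labels it returns ("route", "confirm", "reprompt", "fail") are hypothetical and not part of jambonz.

```python
def handle_gather_result(payload):
    """Map an actionHook payload to a hypothetical (next_step, value) pair."""
    reason = payload.get("reason")

    if reason in ("speechDetected", "stt-low-confidence"):
        alternatives = payload.get("speech", {}).get("alternatives", [])
        transcript = alternatives[0].get("transcript", "") if alternatives else ""
        # A low-confidence transcript is likely inaccurate, so ask the
        # caller to confirm instead of acting on it directly.
        step = "route" if reason == "speechDetected" else "confirm"
        return step, transcript

    if reason == "dtmfDetected":
        return "route", payload.get("digits", "")

    if reason == "timeout":
        # No additional properties accompany a timeout.
        return "reprompt", None

    if reason == "error":
        return "fail", payload.get("details")

    return "fail", f"unexpected reason: {reason!r}"


print(handle_gather_result({"reason": "dtmfDetected", "digits": "1"}))
print(handle_gather_result({"reason": "timeout"}))
```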