Answering machine detection

Detects whether a call has been answered by a person or a machine.

The answering machine detection (AMD) feature can be enabled on either outbound or inbound calls to indicate whether a call has been answered by a person or a machine.

AMD is supported on inbound calls because some dialers place an outbound call and then connect it to jambonz by sending an INVITE. Although jambonz sees this as an incoming call, you may still want to perform answering machine detection on it. You can do this by having your application use a config verb with an amd property.

On an outbound call, add the amd property to the dial verb:

{
  "verb": "dial",
  "actionHook": "/outdial",
  "callerId": "+16173331212",
  "target": [
    {
      "type": "phone",
      "number": "+15083084809",
      "trunk": "Twilio"
    }
  ],
  "amd": {
    "actionHook": "/amdEvents"
  }
}

In the example above, when the dialed call is answered, the answering machine detection feature begins listening on the outbound call leg and, after a short period of time, sends a webhook to ‘/amdEvents’ indicating whether a human or a machine has answered the call.
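For the inbound case described earlier, the same amd object would sit on a config verb rather than a dial verb. The following is a minimal sketch in Python, assuming the application answers the incoming-call webhook with a JSON array of verbs. The function name inbound_amd_verbs, the follow-on say verb, and its text are placeholders chosen for illustration, not part of the documented feature.

```python
import json

def inbound_amd_verbs(amd_hook="/amdEvents"):
    """Build the verb list returned for an incoming call that needs AMD."""
    return [
        # config verb carrying the amd property, as described for inbound calls
        {"verb": "config", "amd": {"actionHook": amd_hook}},
        # placeholder follow-on verb (assumption); replace with your own call flow
        {"verb": "say", "text": "Hello, is anyone there?"},
    ]

print(json.dumps(inbound_amd_verbs(), indent=2))
```

AMD events for the inbound leg would then arrive at the hook named in amd.actionHook, just as in the dial example.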

Parameters

actionHook
string, required

Webhook to send AMD events.

recognizer
object

Speech recognizer to use, if you want to override the application default settings.

thresholdWordCount
number, defaults to 9

Number of spoken words in a greeting that triggers an amd_machine_detected result.

timers
object

Object containing various timeouts.

timers.decisionTimeoutMs
number, defaults to 15000

Time in milliseconds to wait before returning amd_decision_timeout.

timers.greetingCompletionTimeoutMs
number, defaults to 2000

Duration of silence, in milliseconds, to wait for during the greeting before returning amd_machine_stopped_speaking.

timers.noSpeechTimeoutMs
number, defaults to 5000

Time in milliseconds to wait for speech before returning amd_no_speech_detected.

timers.toneTimeoutMs
number, defaults to 20000

Time in milliseconds to wait to hear a tone.
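To show how these parameters fit together, here is a small sketch in Python that assembles an amd object and rejects misspelled timer names. The helper build_amd and the override values (6 words, 3000 ms) are hypothetical and chosen only for illustration. The timer names and defaults are the ones listed above.

```python
import json

# Timer names and defaults as documented in the parameter list.
TIMER_DEFAULTS = {
    "decisionTimeoutMs": 15000,
    "greetingCompletionTimeoutMs": 2000,
    "noSpeechTimeoutMs": 5000,
    "toneTimeoutMs": 20000,
}

def build_amd(action_hook, threshold_word_count=None, **timers):
    """Return an amd object containing only the settings that were overridden."""
    unknown = set(timers) - set(TIMER_DEFAULTS)
    if unknown:
        raise ValueError(f"unknown timer(s): {sorted(unknown)}")
    amd = {"actionHook": action_hook}
    if threshold_word_count is not None:
        amd["thresholdWordCount"] = threshold_word_count
    if timers:
        amd["timers"] = dict(timers)
    return amd

# Example values only: a shorter word threshold and a shorter no-speech timer.
amd = build_amd("/amdEvents", threshold_word_count=6, noSpeechTimeoutMs=3000)
print(json.dumps(amd, indent=2))
```

Leaving a setting out of the object relies on the documented default, so only the values that differ need to be sent.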

actionHook properties

The payload in the webhook will look something like this:

{
  "type": "amd_human_detected"
}

or

{
  "type": "amd_machine_detected",
  "reason": "hint",
  "hint": "call has been forwarded",
  "language": "en-us"
}

If no speech is detected at all from the far end, the payload will look like this:

{
  "type": "amd_no_speech_detected"
}

Finally, if the answering machine detection feature cannot determine whether the remote party is a machine or a human, it returns:

{
  "type": "amd_decision_timeout"
}

The type property can have the following values:

| type | meaning | additional properties |
| --- | --- | --- |
| amd_human_detected | a human is speaking | reason, greeting, language |
| amd_machine_detected | a machine is speaking | reason, hint, transcript, language |
| amd_no_speech_detected | no speech was detected | none |
| amd_decision_timeout | no decision was able to be made in the time given | none |
| amd_machine_stopped_speaking | machine has completed the greeting | none |
| amd_tone_detected | a beep was detected | none |
| amd_error | an error has occurred | error |
| amd_stopped | answering machine detection was stopped | none |

Additional properties:

| name | description |
| --- | --- |
| reason | describes why the decision was made (e.g. “short greeting”) |
| greeting | the text of the greeting that was detected |
| hint | the hint that was matched in the greeting |
| transcript | the transcript that was gathered |
| language | the language that was detected |

It is possible to receive more than one event for a single call. For instance, a possible sequence of events on a call to an answering machine is:

  1. amd_machine_detected, then
  2. amd_tone_detected, then
  3. amd_machine_stopped_speaking

The application receiving the webhook can optionally return a new jambonz payload of verbs to change the call flow based on the result of the answering machine detection.

For instance, say you have an outbound dialer application and you want to deliver a message when you connect to a person, but otherwise hang up. In that case, your app can respond to the webhook with a simple hangup verb when it receives an event payload indicating that a machine has answered.
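The hang-up-on-machine scenario can be sketched as a small decision function. This is an illustrative Python sketch, not a complete webhook server. The function name handle_amd_event is hypothetical, and returning None is used here to mean "send no new verbs". How that maps onto an HTTP response depends on your web framework and is left out.

```python
def handle_amd_event(event):
    """Return replacement verbs for an AMD webhook payload, or None to leave the call flow unchanged."""
    if event.get("type") == "amd_machine_detected":
        # A machine answered: end the call instead of delivering the message.
        return [{"verb": "hangup"}]
    # Human detected, or any other event type: keep the current call flow.
    return None

print(handle_amd_event({"type": "amd_machine_detected", "reason": "hint"}))
print(handle_amd_event({"type": "amd_human_detected"}))
```

Because several events can arrive for one call, the function treats each payload independently and only reacts to the one that matters for this scenario.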

Length of greeting

The answering machine detection feature relies on the fact that voicemail greetings are typically much longer than a human’s greeting. When the call is answered, speech recognition is used to determine the length of the greeting. If the greeting is shorter than a configurable threshold (thresholdWordCount), the party is determined to be human; if it is longer, the party is determined to be a machine.
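The word-count heuristic can be illustrated with a few lines of Python. This is a simplified sketch of the idea, not the feature's actual implementation. It assumes the greeting has already been transcribed, that words are separated by whitespace, and that a greeting reaching the threshold counts as a machine. The function name classify_by_length and the reason string "long greeting" are hypothetical.

```python
def classify_by_length(greeting, threshold_word_count=9):
    """Classify a transcribed greeting as human or machine by its word count."""
    word_count = len(greeting.split())
    if word_count >= threshold_word_count:
        # Assumption: reaching the threshold is enough to call it a machine.
        return {"type": "amd_machine_detected", "reason": "long greeting", "transcript": greeting}
    return {"type": "amd_human_detected", "reason": "short greeting", "greeting": greeting}

print(classify_by_length("Hello?")["type"])
print(classify_by_length("Hi you have reached Dave please leave a message after the tone")["type"])
```

Lowering thresholdWordCount makes the heuristic quicker to call a greeting a machine, at the cost of misclassifying talkative humans.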

Key Voicemail phrases

Optionally, the feature can also be given a list of common phrases that one might hear on a voicemail greeting. If any of the phrases is detected, the determination is immediately made that this is a machine. These phrases are supplied via an external file, so they can be easily updated as needed for a specific deployment. As an example, here are sample phrases that might be used for an English-language greeting:

{
  "en-US": [
    "call has been forwarded",
    "at the beep",
    "at the tone",
    "leave a message",
    "leave me a message",
    "not available right now",
    "not available to take your call",
    "can't take your call",
    "I will get back to you",
    "I'll get back to you",
    "we will get back to you",
    "we are unable",
    "we are not available"
  ]
}

The current set of phrases is defined here. Feel free to make a pull request against that repo to suggest additional phrases or languages that should be added.
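As a rough illustration of phrase matching, the Python sketch below checks a transcript against a per-language phrase list and produces a payload shaped like the amd_machine_detected example shown earlier. It assumes simple case-insensitive substring matching, which may differ from what the feature actually does. The function name match_hint is hypothetical, and HINTS holds only a subset of the sample phrases.

```python
# A subset of the sample phrases, keyed by language as in the phrase file.
HINTS = {
    "en-US": [
        "call has been forwarded",
        "at the beep",
        "at the tone",
        "leave a message",
    ],
}

def match_hint(transcript, language="en-US", hints=HINTS):
    """Return a machine-detected payload for the first phrase found, else None."""
    text = transcript.lower()
    for phrase in hints.get(language, []):
        if phrase.lower() in text:
            return {
                "type": "amd_machine_detected",
                "reason": "hint",
                "hint": phrase,
                "language": language,
            }
    return None

print(match_hint("Your call has been forwarded to an automated voice messaging system"))
print(match_hint("Hello?"))
```

A match short-circuits the decision, which is why a single recognized phrase can produce a result before the word-count threshold is reached.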

Beep detection

The feature also attempts to detect audio tones, or beeps, that are commonly used on voicemail systems. If one is detected, an amd_tone_detected event is sent via the actionHook to the application.

Determining when to leave a voicemail

For an application that wishes to leave a message on a voicemail system, it is necessary to know when the voicemail greeting has completed and the voicemail system is ready to record the message. If the feature determines that the remote party is a machine, it continues listening to the greeting until it completes and then sends an amd_machine_stopped_speaking event via the actionHook to the application. This event can be used as a cue that it is time to start leaving the message.
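A minimal Python sketch of that cue follows, assuming the application leaves its message with a say verb and then hangs up. The function name voicemail_verbs and the message text are placeholders. As before, None stands for "return no new verbs" and the HTTP plumbing is omitted.

```python
def voicemail_verbs(event, message="Hi, this is a reminder about your appointment tomorrow."):
    """Return verbs that leave a voicemail once the greeting has finished, else None."""
    if event.get("type") == "amd_machine_stopped_speaking":
        # The greeting is over, so the voicemail system should now be recording.
        return [
            {"verb": "say", "text": message},
            {"verb": "hangup"},
        ]
    # Earlier events such as amd_machine_detected are ignored here; keep waiting.
    return None

print(voicemail_verbs({"type": "amd_machine_detected"}))
print(voicemail_verbs({"type": "amd_machine_stopped_speaking"}))
```

Waiting for amd_machine_stopped_speaking rather than acting on amd_machine_detected avoids talking over the greeting.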
