AssemblyAI Voice Agent

Using jambonz to connect custom telephony to AssemblyAI's Voice Agent API

The jambonz application referenced in this article can be found here.

This is an example jambonz application that connects to the AssemblyAI Voice Agent API and illustrates how to build a voice-AI application using jambonz and AssemblyAI. The application uses an open-meteo REST API to enable the agent to answer callers’ questions about the weather for specified locations.

Authentication

You’ll need an AssemblyAI API key with Voice Agent access. Configure it as a jambonz application environment variable in the portal (not via process.env):

| Variable | Required | Description |
| --- | --- | --- |
| ASSEMBLYAI_API_KEY | yes | AssemblyAI API key (sent as Authorization: Bearer … on the voice-agent websocket). Mark as obscured. |

The example application declares this variable via the SDK’s envVars option on createEndpoint, and reads it at call time from session.data.env_vars.ASSEMBLYAI_API_KEY. See Application Environment Variables in the Node.js SDK guide for the declaration pattern.
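The read side of this pattern can be sketched as follows. This is a minimal illustration, assuming only that the session object exposes the values at session.data.env_vars as described above. The helper name requireEnvVar and the fakeSession object are hypothetical and not part of the jambonz SDK.

```javascript
// Hypothetical helper (not part of the jambonz SDK): read a portal-configured
// application environment variable from the session, failing fast if absent.
function requireEnvVar(session, name) {
  const envVars = (session && session.data && session.data.env_vars) || {};
  const value = envVars[name];
  if (typeof value !== 'string' || value.length === 0) {
    throw new Error(`Missing application environment variable: ${name}`);
  }
  return value;
}

// Stand-in for the session object delivered on a new call (assumed shape).
const fakeSession = { data: { env_vars: { ASSEMBLYAI_API_KEY: 'example-key' } } };
console.log(requireEnvVar(fakeSession, 'ASSEMBLYAI_API_KEY')); // example-key
```

Failing fast at call setup makes a missing portal setting show up as a clear error rather than as a rejected websocket connection later on.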

Configuring the Assistant

AssemblyAI’s protocol requires a session.update message to be sent before the agent will accept audio. The jambonz llm verb sends this automatically using whatever you pass in llmOptions — system prompt, greeting, output voice, input biasing, turn-detection thresholds, and tools.

llmOptions is the AssemblyAI session.update.session payload passed through verbatim:

llmOptions: {
  system_prompt: 'You are a helpful voice agent. Help callers get the weather for a city they ask about.',
  greeting: 'Hello, how can I help you today?',
  output: { voice: 'ivy' },
  input: {
    keyterms: ['weather', 'temperature', 'celsius', 'fahrenheit'],
    turn_detection: {
      vad_threshold: 0.5,
      min_silence: 1000,
      max_silence: 3000,
      interrupt_response: true
    }
  },
  tools: [ /* ... */ ]
}

Audio format is not configurable. AssemblyAI Voice Agent only supports audio/pcm at 24 kHz, which jambonz uses unconditionally. The input.format and output.format keys are overridden by jambonz before the message is sent to AssemblyAI. jambonz resamples to/from the channel’s native rate automatically.
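Because any input.format or output.format values are overridden, an application may prefer to drop them up front so its configuration reflects what is actually sent. The following is a small sketch of that idea. The helper withoutAudioFormat is hypothetical and is not something jambonz requires or provides, and the format value in the example is purely illustrative.

```javascript
// Hypothetical helper: return a copy of llmOptions without input.format and
// output.format, since jambonz overrides those keys before sending anyway.
function withoutAudioFormat(llmOptions) {
  const copy = { ...llmOptions };
  for (const side of ['input', 'output']) {
    if (copy[side] && typeof copy[side] === 'object') {
      // Drop `format`, keep every other key on this side unchanged.
      const { format, ...rest } = copy[side];
      copy[side] = rest;
    }
  }
  return copy;
}

// Illustrative input only; the format value shown is a placeholder.
const cleaned = withoutAudioFormat({
  greeting: 'Hello, how can I help you today?',
  output: { voice: 'ivy', format: { encoding: 'placeholder' } },
});
console.log(JSON.stringify(cleaned.output)); // {"voice":"ivy"}
```

The helper copies rather than mutates, so the original options object can still be logged or reused as written.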

For the full list of session fields (voices, keyterm biasing, turn-detection knobs, etc.), refer to the AssemblyAI events reference.

Tool calls

The example application registers a getWeather tool that the agent can invoke to answer weather questions. Each tool entry must use AssemblyAI’s flat format:

{
  type: 'function',
  name: 'getWeather',
  description: 'Get current weather for a given city',
  parameters: {
    type: 'object',
    properties: {
      location: { type: 'string', description: 'City name' },
      scale: { type: 'string', enum: ['celsius', 'fahrenheit'] }
    },
    required: ['location']
  }
}

When the agent decides to invoke a tool, jambonz fires a tool.call event and routes it to the configured toolHook. The handler replies via session.sendToolOutput(tool_call_id, {type: 'tool.result', tool_call_id, result}). The result field should be a string the model can read — jambonz JSON-stringifies non-string values automatically.
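A toolHook handler for getWeather might look roughly like the sketch below. It is not the example application's actual code. The event shape ({ tool_call_id, arguments: { location, scale } }), the function names, and the wording of the result are assumptions for illustration. The Open-Meteo geocoding and forecast URLs and their query parameters reflect the public API as best understood here and should be checked against the Open-Meteo documentation. Only session.sendToolOutput and the tool.result payload shape come from the description above.

```javascript
// Build the reply payload in the shape described above.
function buildToolResult(toolCallId, result) {
  return { type: 'tool.result', tool_call_id: toolCallId, result };
}

// Turn a temperature reading into a sentence the model can read.
function describeWeather(location, temperature, scale) {
  const unit = scale === 'fahrenheit' ? 'Fahrenheit' : 'Celsius';
  return `It is currently ${temperature} degrees ${unit} in ${location}.`;
}

// Look up the current temperature: geocode the city, then fetch the forecast.
// fetchImpl defaults to the global fetch available in Node.js 18 and later.
async function fetchTemperature(location, scale, fetchImpl = fetch) {
  const geoUrl = 'https://geocoding-api.open-meteo.com/v1/search?count=1&name=' +
    encodeURIComponent(location);
  const geo = await (await fetchImpl(geoUrl)).json();
  if (!geo.results || geo.results.length === 0) {
    throw new Error(`unknown location ${location}`);
  }
  const { latitude, longitude } = geo.results[0];
  const wxUrl = 'https://api.open-meteo.com/v1/forecast' +
    `?latitude=${latitude}&longitude=${longitude}` +
    `&current=temperature_2m&temperature_unit=${scale}`;
  const wx = await (await fetchImpl(wxUrl)).json();
  return wx.current.temperature_2m;
}

// Assumed event shape: { tool_call_id, arguments: { location, scale } }.
async function onToolCall(session, evt) {
  const { tool_call_id, arguments: args = {} } = evt;
  const scale = args.scale === 'fahrenheit' ? 'fahrenheit' : 'celsius';
  let result;
  try {
    const temperature = await fetchTemperature(args.location, scale);
    result = describeWeather(args.location, temperature, scale);
  } catch (err) {
    // Always answer the tool call so the agent is not left waiting.
    result = `Sorry, I could not get the weather: ${err.message}`;
  }
  session.sendToolOutput(tool_call_id, buildToolResult(tool_call_id, result));
}
```

Returning a plain sentence rather than raw JSON keeps the result easy for the model to read back to the caller, and replying even on failure lets the agent apologize instead of stalling.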

See AssemblyAI tool calling for the underlying protocol.

actionHook properties

Like many jambonz verbs, the llm verb sends an actionHook with a final status when the verb completes. The payload includes a completion_reason property indicating why the session ended. Possible values are:

  • normal conversation end
  • connection failure
  • disconnect from remote end
  • server error
  • client error calling function
  • client error calling mcp function
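An actionHook handler can branch on completion_reason, for example to decide whether to log an error. The sketch below is one possible grouping, not behavior defined by jambonz: the helper summarizeCompletion is hypothetical, and treating "disconnect from remote end" as a non-failure is a judgment call made here for illustration.

```javascript
// completion_reason values from the list above that are treated as failures
// in this sketch. The grouping is an assumption, not defined by jambonz.
const FAILURE_REASONS = new Set([
  'connection failure',
  'server error',
  'client error calling function',
  'client error calling mcp function',
]);

// Hypothetical helper: classify an actionHook payload for logging or routing.
function summarizeCompletion(payload) {
  const reason = payload && payload.completion_reason;
  if (typeof reason !== 'string') return { ok: false, reason: 'unknown' };
  return { ok: !FAILURE_REASONS.has(reason), reason };
}

console.log(summarizeCompletion({ completion_reason: 'server error' }).ok); // false
```

An application could use the ok flag to choose between hanging up normally and playing an apology or transferring the caller.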

Resources