llm

Connect a call to an AI language model.
session.llm({
  vendor: 'openai',
  model: 'gpt-4o-realtime-preview-2024-10-01',
  auth: {
    // OpenAI API key, read here from an environment variable
    apiKey: process.env.OPENAI_API_KEY
  },
  actionHook: '/final',
  eventHook: '/event',
  toolHook: '/toolCall',
  events: [
    'conversation.item.*',
    'response.audio_transcript.done',
    'input_audio_buffer.committed'
  ],
  llmOptions: {
    response_create: {
      modalities: ['text', 'audio'],
      instructions: 'Please assist the user with their request.',
      voice: 'alloy',
      output_audio_format: 'pcm16',
      temperature: 0.8,
      max_output_tokens: 4096,
    },
    session_update: {
      tools: [
        {
          name: 'get_weather',
          type: 'function',
          description: 'Get the weather at a given location',
          parameters: {
            type: 'object',
            properties: {
              location: {
                type: 'string',
                description: 'Location to get the weather from',
              },
              scale: {
                type: 'string',
                enum: ['fahrenheit', 'celsius'],
              },
            },
            required: ['location', 'scale'],
          },
        },
      ],
      tool_choice: 'auto',
      input_audio_transcription: {
        model: 'whisper-1',
      },
      turn_detection: {
        type: 'server_vad',
        threshold: 0.8,
        prefix_padding_ms: 300,
        silence_duration_ms: 500,
      }
    }
  }
});
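The get_weather tool above declares location and scale as required and restricts scale to two values. A toolHook handler will usually want to check incoming arguments before calling a weather service. The sketch below is a hand-rolled check of only the required, type, and enum constraints; it is not a full JSON Schema validator, and the name checkWeatherArgs is hypothetical. How jambonz delivers OpenAI tool arguments is not described here, so a plain object is assumed.

```javascript
// Hypothetical helper: checks tool-call arguments against the get_weather
// schema from the example above (required, type and enum constraints only).
const weatherSchema = {
  type: 'object',
  properties: {
    location: { type: 'string', description: 'Location to get the weather from' },
    scale: { type: 'string', enum: ['fahrenheit', 'celsius'] },
  },
  required: ['location', 'scale'],
};

function checkWeatherArgs(args, schema = weatherSchema) {
  const errors = [];
  // Every required property must be present.
  for (const key of schema.required) {
    if (!(key in args)) errors.push(`missing required property "${key}"`);
  }
  // Present properties must have the declared type and, if listed, an allowed value.
  for (const [key, spec] of Object.entries(schema.properties)) {
    if (!(key in args)) continue;
    if (typeof args[key] !== spec.type) {
      errors.push(`"${key}" should be of type ${spec.type}`);
    } else if (spec.enum && !spec.enum.includes(args[key])) {
      errors.push(`"${key}" must be one of ${spec.enum.join(', ')}`);
    }
  }
  return errors;
}

console.log(checkWeatherArgs({ location: 'Boston', scale: 'celsius' })); // []
console.log(checkWeatherArgs({ location: 'Boston', scale: 'kelvin' }));
```

An empty array means the arguments can be passed on; otherwise the messages can be returned to the model so it can retry with corrected arguments.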

Parameters

model
string, required

Name of the LLM model.

vendor
string, required

Name of the LLM vendor.

actionHook
string

Webhook that will be called when the LLM session ends.

auth
object

Object containing authentication credentials; the format depends on the vendor (for example, apiKey for OpenAI and Google, api_key for AssemblyAI).

connectOptions
object

Object containing information such as the URI to connect to.

eventHook
string

Webhook that will be called when a requested LLM event happens (e.g., transcript).

events
array

Array of names of the LLM events that should be sent to the eventHook. Wildcards are allowed, for example 'conversation.item.*'.

llmOptions
object

Object containing instructions for the LLM; the format depends on the vendor and model.

toolHook
string

Webhook that will be called when the LLM wants to call a function.
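The events parameter accepts wildcards, as in 'conversation.item.*' in the OpenAI example. jambonz's exact matching rules are not documented here; the sketch below assumes glob-style matching in which * stands for any run of characters, and it also assumes that 'all' (used in the AssemblyAI example) requests every event. The function isRequested is hypothetical.

```javascript
// Hypothetical sketch: decide whether an LLM event name was requested via
// the `events` array. Assumes `*` matches any sequence of characters and
// that the special value 'all' requests every event.
function escapeRegExp(s) {
  return s.replace(/[.+?^${}()|[\]\\]/g, '\\$&');
}

function patternToRegExp(pattern) {
  // Escape regex metacharacters in each literal piece, then join with `.*`.
  const body = pattern.split('*').map(escapeRegExp).join('.*');
  return new RegExp(`^${body}$`);
}

function isRequested(events, eventName) {
  return events.some((p) => p === 'all' || patternToRegExp(p).test(eventName));
}

const events = [
  'conversation.item.*',
  'response.audio_transcript.done',
  'input_audio_buffer.committed',
];

console.log(isRequested(events, 'conversation.item.created')); // true
console.log(isRequested(events, 'response.audio.delta'));      // false
```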

The following LLMs are currently supported:

  • OpenAI Realtime API
  • Deepgram Voice Agent
  • Ultravox
  • ElevenLabs
  • Google Gemini Live API
  • AssemblyAI Voice Agent

Google Gemini Live

Set vendor: 'google' and supply a Gemini Live model (for example models/gemini-2.0-flash-live-001 or models/gemini-3.1-flash-live-preview). llmOptions.setup is forwarded verbatim to Google’s BidiGenerateContentSetup message after the websocket connects.

session.llm({
  vendor: 'google',
  model: 'models/gemini-2.0-flash-live-001',
  auth: { apiKey: process.env.GEMINI_API_KEY },
  actionHook: '/final',
  eventHook: '/event',
  toolHook: '/toolCall',
  connectOptions: {
    host: 'generativelanguage.googleapis.com',
    version: 'v1beta'
  },
  llmOptions: {
    setup: {
      generationConfig: {
        speechConfig: {
          voiceConfig: { prebuiltVoiceConfig: { voiceName: 'Aoede' } }
        }
      },
      systemInstruction: {
        parts: [{ text: 'You are a helpful assistant named Barbara.' }]
      }
    },
    greeting: 'Greet the caller warmly and ask how you can help.',
    sessionResumption: {}
  }
});

Google-specific llmOptions fields

setup
object, required

The BidiGenerateContentSetup object sent to Gemini right after the websocket connects. The model field is populated automatically from the verb’s model parameter. generationConfig.responseModalities is forced to audio.
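To make the two overrides concrete, the sketch below shows one way the setup message could be assembled from llmOptions.setup and the verb's model parameter. It is an illustration, not jambonz source code. The { setup: ... } wrapper follows the client-message shape of Google's Live API, and the upper-case 'AUDIO' value is an assumption about how "forced to audio" is encoded on the wire. buildGeminiSetup is a hypothetical name.

```javascript
// Hypothetical sketch: build the first websocket message for Gemini Live
// from the verb's `model` and the application's `llmOptions.setup`.
function buildGeminiSetup(model, setup = {}) {
  return {
    setup: {
      ...setup,
      // The model always comes from the verb's `model` parameter.
      model,
      generationConfig: {
        ...(setup.generationConfig || {}),
        // The response modality is forced to audio, whatever the app passed.
        responseModalities: ['AUDIO'],
      },
    },
  };
}

const msg = buildGeminiSetup('models/gemini-2.0-flash-live-001', {
  model: 'ignored',
  generationConfig: { responseModalities: ['TEXT'], temperature: 0.7 },
  systemInstruction: { parts: [{ text: 'You are a helpful assistant named Barbara.' }] },
});
console.log(JSON.stringify(msg, null, 2));
```

Other generationConfig fields, such as temperature here or the speechConfig in the example above, pass through untouched.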

greeting
string | object

Optional proactive greeting. When set, jambonz sends a text message to Gemini immediately after setup so the agent speaks first without waiting for the caller to speak. Accepts either a string or an object with a text field. The value is an instruction to the model, not the literal words — for example "Greet the caller warmly" rather than "Hello, how can I help?".

Implemented using realtimeInput.text so it works on both the 2.0 Live models and gemini-3.1-flash-live-preview. (On 3.1, clientContent is reserved for seeding history and does not trigger a model response, which is why realtimeInput.text is used.)
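Both accepted greeting forms reduce to a single text message. The sketch below assumes the message is JSON of the form { realtimeInput: { text } }, following the realtimeInput.text field named above; buildGreetingMessage is a hypothetical helper, not part of jambonz.

```javascript
// Hypothetical sketch: turn llmOptions.greeting (a string or { text }) into a
// Gemini Live realtimeInput.text message. Returns null when no greeting is
// configured, in which case the agent waits for the caller to speak first.
function buildGreetingMessage(greeting) {
  if (greeting == null) return null;
  const text = typeof greeting === 'string' ? greeting : greeting.text;
  if (typeof text !== 'string' || text.trim() === '') {
    throw new Error('greeting must be a non-empty string or an object with a text field');
  }
  // The text is an instruction to the model, not the words it will say.
  return { realtimeInput: { text } };
}

console.log(JSON.stringify(buildGreetingMessage('Greet the caller warmly and ask how you can help.')));
console.log(JSON.stringify(buildGreetingMessage({ text: 'Greet the caller warmly' })));
```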

sessionResumption
object

Enable session resumption. Pass {} to opt in, or { handle: "..." } to resume a previous session. Resumption handles are delivered back to the application via llm_event sessionResumptionUpdate messages.
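A common pattern is to remember the most recent handle and pass it back on the next session.llm call. The sketch below assumes the sessionResumptionUpdate payload carries Google's newHandle and resumable fields; the exact shape jambonz forwards to the eventHook is not specified here. The ResumptionTracker class is hypothetical.

```javascript
// Hypothetical sketch: track the latest Gemini resumption handle and produce
// the llmOptions.sessionResumption value for the next session.
class ResumptionTracker {
  constructor() {
    this.handle = null;
  }

  // Call with the payload of a sessionResumptionUpdate llm_event.
  // Assumes Google's field names: newHandle (string) and resumable (boolean).
  onUpdate(update) {
    if (update && update.resumable && update.newHandle) {
      this.handle = update.newHandle;
    }
  }

  // {} opts in to resumption; { handle } resumes a previous session.
  sessionResumption() {
    return this.handle ? { handle: this.handle } : {};
  }
}

const tracker = new ResumptionTracker();
console.log(tracker.sessionResumption()); // {}
tracker.onUpdate({ newHandle: 'abc123', resumable: true });
console.log(tracker.sessionResumption()); // { handle: 'abc123' }
```

Ignoring updates marked as not resumable keeps the last handle that can actually be used.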

AssemblyAI Voice Agent

Set vendor: 'assemblyai' and supply your AssemblyAI API key via auth.api_key. llmOptions is passed through verbatim as the AssemblyAI Voice Agent session payload; it contains no jambonz-specific fields. After the websocket connects to wss://agents.assemblyai.com/v1/ws, jambonz wraps llmOptions as {type: 'session.update', session: <llmOptions>} and sends it as the first client message. See the AssemblyAI Voice Agent product page and Voice Agent API docs for an overview.
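The wrapping step described above is small enough to show directly. This sketch builds the first client message from llmOptions; buildAssemblyAISessionUpdate is a hypothetical name, and actually opening the websocket and sending the string is left out.

```javascript
// Hypothetical sketch: wrap llmOptions as the first AssemblyAI client message.
const ASSEMBLYAI_WS_URL = 'wss://agents.assemblyai.com/v1/ws';

function buildAssemblyAISessionUpdate(llmOptions = {}) {
  // llmOptions is passed through unchanged as the `session` payload.
  return JSON.stringify({ type: 'session.update', session: llmOptions });
}

const first = buildAssemblyAISessionUpdate({
  system_prompt: 'You are a helpful voice agent.',
  greeting: 'Hello, how can I help you today?',
});
console.log(`send to ${ASSEMBLYAI_WS_URL}:`, first);
```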

The audio format is not configurable. AssemblyAI Voice Agent only accepts audio/pcm at 24 kHz, which jambonz uses unconditionally — session.input.format / session.output.format set by the application are overridden. jambonz resamples to/from the channel’s native rate automatically.
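The rate conversion is a fixed ratio: a call at 8 kHz, a common telephony rate, needs three output samples per input sample to reach 24 kHz. jambonz's actual resampler is not described here. The sketch below uses naive linear interpolation on 16-bit PCM purely to illustrate the step; resampleLinear is a hypothetical name, and a production resampler would also apply a low-pass filter.

```javascript
// Hypothetical sketch: naive linear-interpolation resampler for 16-bit PCM.
// Illustrates converting a channel's native rate to AssemblyAI's 24 kHz.
// It applies no anti-aliasing filter, so it is not production quality.
function resampleLinear(input, fromRate, toRate) {
  const outLength = Math.round((input.length * toRate) / fromRate);
  const output = new Int16Array(outLength);
  for (let i = 0; i < outLength; i++) {
    // Position of output sample i on the input's time axis.
    const pos = (i * fromRate) / toRate;
    const left = Math.floor(pos);
    const right = Math.min(left + 1, input.length - 1);
    const frac = pos - left;
    output[i] = Math.round(input[left] * (1 - frac) + input[right] * frac);
  }
  return output;
}

// 20 ms of audio is 160 samples at 8 kHz and 480 samples at 24 kHz.
const frame8k = new Int16Array(160).map((_, n) => Math.round(1000 * Math.sin(n / 4)));
const frame24k = resampleLinear(frame8k, 8000, 24000);
console.log(frame8k.length, frame24k.length); // 160 480
```

The same function run with the rates swapped illustrates the reverse direction, from 24 kHz back to the channel rate.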

session.llm({
  vendor: 'assemblyai',
  auth: { api_key: process.env.ASSEMBLYAI_API_KEY },
  actionHook: '/final',
  eventHook: '/event',
  toolHook: '/toolCall',
  events: ['all'],
  llmOptions: {
    system_prompt: 'You are a helpful voice agent.',
    greeting: 'Hello, how can I help you today?',
    output: { voice: 'ivy' },
    input: {
      keyterms: ['weather', 'temperature'],
      turn_detection: {
        vad_threshold: 0.5,
        min_silence: 1000,
        max_silence: 3000,
        interrupt_response: true
      }
    },
    tools: [
      {
        type: 'function',
        name: 'getWeather',
        description: 'Get current weather for a given city',
        parameters: {
          type: 'object',
          properties: {
            location: { type: 'string', description: 'City name' },
            scale: { type: 'string', enum: ['celsius', 'fahrenheit'] }
          },
          required: ['location']
        }
      }
    ]
  }
});

AssemblyAI-specific auth fields

api_key
string, required

Your AssemblyAI API key. Sent as Authorization: Bearer <api_key> on the WebSocket handshake.

AssemblyAI-specific llmOptions fields

AssemblyAI’s protocol requires a session.update message, but every field inside is optional — pass llmOptions: {} to start with all server defaults.

system_prompt
string

System prompt for the agent.

greeting
string

Initial greeting the agent will speak when the session opens.

output
object

Output audio configuration. Supports voice — see the AssemblyAI voices reference for available IDs. The format sub-field is overridden by jambonz.

input
object

Input audio configuration. Supports keyterms (array of biasing terms) and turn_detection (vad_threshold, min_silence, max_silence, interrupt_response). The format sub-field is overridden by jambonz.

tools
array

Array of tool definitions. Each entry must include a name, a description, and parameters (a JSON Schema object), plus type: "function"; jambonz fills in type: "function" automatically if it is omitted.
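The type default described above amounts to a one-line merge. The sketch below applies it and, as an extra hypothetical check that jambonz is not documented to perform, rejects entries missing name, description, or parameters. normalizeTools is a hypothetical name.

```javascript
// Hypothetical sketch: default type to "function" on AssemblyAI tool
// definitions, and reject entries missing the other required fields.
function normalizeTools(tools = []) {
  return tools.map((tool, i) => {
    for (const field of ['name', 'description', 'parameters']) {
      if (tool[field] === undefined) {
        throw new Error(`tools[${i}] is missing "${field}"`);
      }
    }
    // An explicit type is kept; otherwise it defaults to "function".
    return { type: 'function', ...tool };
  });
}

const tools = normalizeTools([
  {
    name: 'getWeather',
    description: 'Get current weather for a given city',
    parameters: {
      type: 'object',
      properties: { location: { type: 'string', description: 'City name' } },
      required: ['location'],
    },
  },
]);
console.log(tools[0].type); // function
```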

Tool calls

The agent invokes a tool by emitting a tool.call server event. jambonz routes it to the application’s toolHook with {name, args, tool_call_id}. The application replies with session.sendToolOutput(tool_call_id, {type: 'tool.result', tool_call_id, result}). AssemblyAI expects result to be a string: you can JSON-stringify objects yourself, or pass a non-string value and jambonz will JSON-stringify it automatically.
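Put together, a toolHook handler receives {name, args, tool_call_id}, does the work, and replies with session.sendToolOutput. In the sketch below, fakeSession is a stand-in that only records what would be sent, and lookupWeather is a hypothetical function returning canned data; in a real application the session would be the jambonz session and the data would come from an actual weather service. The handler stringifies the result itself so the payload type is explicit.

```javascript
// Hypothetical weather lookup returning canned data.
async function lookupWeather(location, scale = 'celsius') {
  return { location, scale, temperature: scale === 'celsius' ? 21 : 70 };
}

// Handles a tool call routed to the toolHook: { name, args, tool_call_id }.
async function onToolCall(session, { name, args, tool_call_id }) {
  let result;
  try {
    if (name !== 'getWeather') throw new Error(`unknown tool ${name}`);
    result = JSON.stringify(await lookupWeather(args.location, args.scale));
  } catch (err) {
    // Report failures back to the agent as a string result too.
    result = JSON.stringify({ error: err.message });
  }
  session.sendToolOutput(tool_call_id, { type: 'tool.result', tool_call_id, result });
}

// Stand-in session that records outgoing tool results instead of sending them.
const sent = [];
const fakeSession = {
  sendToolOutput(id, payload) { sent.push({ id, payload }); },
};

onToolCall(fakeSession, { name: 'getWeather', args: { location: 'Paris' }, tool_call_id: 'call_1' })
  .then(() => console.log(JSON.stringify(sent[0])));
```

Returning errors as a result string, rather than not replying, lets the agent tell the caller that the lookup failed.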

Example Applications

Please check out the following example applications: