The jambonz application referenced in this article can be found here.
This example jambonz application connects to the Google Gemini Live API and shows how to build a Voice-AI application with jambonz and Google Gemini.
The example covers:
Calling webhook and Call status webhook at your server:
To run with MCP tools, open two terminals:
Call your virtual number and ask Barbara about the weather.
llm verb is wired up
The application calls session.llm({...}) with vendor: 'google' and a Gemini Live model. The llmOptions.setup object is forwarded verbatim to Google’s BidiGenerateContentSetup message:
See the full route in lib/routes/weather-agent.js.
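As a rough illustration of the wiring described above, the sketch below builds the options object that would be passed to session.llm(). It is not the code from lib/routes/weather-agent.js. The helper name buildGeminiLlmOptions, the auth shape, the hook paths, the voice name, the prompt, and the get_weather declaration are all assumptions made for illustration. Only vendor: 'google' and the llmOptions.setup passthrough come from the text.

```javascript
// Sketch only: builds the options object for session.llm().
// Everything inside `setup` is meant to mirror Google's BidiGenerateContentSetup
// message; the concrete values here are illustrative assumptions.
function buildGeminiLlmOptions({ apiKey, model }) {
  if (!apiKey || !model) {
    throw new Error('apiKey and model are required');
  }
  return {
    vendor: 'google',
    model,
    auth: { apiKey },          // assumed auth shape
    actionHook: '/final',      // hypothetical hook paths
    eventHook: '/event',
    toolHook: '/toolCall',
    llmOptions: {
      setup: {
        generationConfig: {
          responseModalities: ['AUDIO'],
          speechConfig: {
            voiceConfig: { prebuiltVoiceConfig: { voiceName: 'Aoede' } }
          }
        },
        systemInstruction: {
          parts: [{ text: 'You are Barbara, a friendly weather assistant.' }]
        },
        tools: [{
          functionDeclarations: [{
            name: 'get_weather', // hypothetical tool
            description: 'Get the current weather for a location',
            parameters: {
              type: 'OBJECT',
              properties: { location: { type: 'STRING' } },
              required: ['location']
            }
          }]
        }]
      }
    }
  };
}

// Possible use inside a route handler (session comes from the jambonz SDK):
// session.llm(buildGeminiLlmOptions({ apiKey, model })).hangup().send();
```

Keeping the options in a plain function makes the setup payload easy to inspect or unit-test before a call is ever placed.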
For outbound calls — or any scenario where you want Gemini to speak first — add a greeting to llmOptions. jambonz sends it immediately after setup so the caller hears the agent within the first second:
The value is an instruction to the model, not the literal greeting text. Use "Say exactly: Hello, thank you for calling Acme." if you need a scripted line.
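A minimal sketch of both forms of greeting, assuming greeting is a plain string field on llmOptions as described above. The helper names withGreeting and scriptedGreeting are hypothetical and not part of the jambonz SDK.

```javascript
// Returns a copy of llmOptions with a greeting instruction added.
// The input object is not mutated.
function withGreeting(llmOptions, instruction) {
  return { ...llmOptions, greeting: instruction };
}

// Wraps literal text in the "Say exactly:" form for a scripted first line.
function scriptedGreeting(text) {
  return `Say exactly: ${text}`;
}

// Free-form: the model chooses its own wording.
const freeForm = withGreeting({ setup: {} }, 'Greet the caller and ask how you can help.');

// Scripted: the model is told to speak a fixed sentence.
const scripted = withGreeting(
  { setup: {} },
  scriptedGreeting('Hello, thank you for calling Acme.')
);
```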
This also works on models/gemini-3.1-flash-live-preview. On the 3.1 preview, Google restricted clientContent to seeding history only, so jambonz uses realtimeInput.text under the hood — the greeting field is the portable way to trigger a first turn across all Gemini Live models.
Gemini Live sessions can be resumed across websocket reconnects. Opt in by passing sessionResumption: {} in llmOptions. Each llm_event hook delivers a sessionResumptionUpdate containing a fresh newHandle — store the latest handle, then reconnect with sessionResumption: { handle: '<stored handle>' } to continue the conversation.
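The store-then-reconnect flow above could be sketched as a small tracker. This assumes the event payload exposes sessionResumptionUpdate.newHandle at its top level, which should be checked against the events your application actually receives. createResumptionTracker is a hypothetical helper, and the handle is held in memory only.

```javascript
// Tracks the most recent resumption handle seen on the event hook.
function createResumptionTracker() {
  let latestHandle = null;
  return {
    // Call this from the event hook with each event payload.
    onLlmEvent(evt) {
      const handle = evt?.sessionResumptionUpdate?.newHandle; // assumed shape
      if (handle) latestHandle = handle;
    },
    // Value for llmOptions.sessionResumption:
    // {} opts in on a first connect, { handle } resumes a prior session.
    resumptionOptions() {
      return latestHandle ? { handle: latestHandle } : {};
    }
  };
}
```

For calls that may be reconnected by a different process, the handle would need to live in shared storage keyed by call rather than in a closure.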
The toolHook fires when Gemini wants to call one of the declared functions. Respond with session.sendToolOutput:
Gemini’s native tool format uses functionCalls (inbound) and functionResponses (outbound) — jambonz passes them through without reshaping, so the payloads match the Gemini Live tool use docs exactly.
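One way to turn inbound functionCalls into outbound functionResponses is sketched below. The event shape (tool_call_id and functionCalls on the hook payload), the two-argument form of session.sendToolOutput, and the handlers map are assumptions made for illustration. The id/name/response fields and the output and error keys follow Google's function response format.

```javascript
// Runs each requested function and builds one functionResponse per call.
// `handlers` maps a function name to an async implementation (hypothetical).
async function buildFunctionResponses(functionCalls, handlers) {
  return Promise.all(functionCalls.map(async ({ id, name, args }) => {
    try {
      if (!Object.prototype.hasOwnProperty.call(handlers, name)) {
        throw new Error(`unknown function: ${name}`);
      }
      return { id, name, response: { output: await handlers[name](args) } };
    } catch (err) {
      // Report failures to the model instead of leaving the call unanswered.
      return { id, name, response: { error: err.message } };
    }
  }));
}

// toolHook handler sketch; `session` is the jambonz session object.
async function onToolCall(session, evt, handlers) {
  const { tool_call_id, functionCalls = [] } = evt; // assumed event shape
  const functionResponses = await buildFunctionResponses(functionCalls, handlers);
  session.sendToolOutput(tool_call_id, { functionResponses });
}
```

Returning an error entry rather than throwing lets the model apologize or retry instead of stalling while it waits for a tool result.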
When the caller speaks over Gemini, the module emits output_audio.playback_stopped with completion_reason: "interrupted" on the event hook and discards the queued audio, so the caller is not talked over by stale agent audio. No application code is required: interruption handling is built in.
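Although no code is needed to handle interruptions, an application may still want to observe them, for example to count barge-ins per call. The sketch below assumes the event payload carries the event name in a type field, which is an assumption about the payload shape. isInterruption and createInterruptionCounter are hypothetical helpers.

```javascript
// True when an event-hook payload reports that the caller barged in.
function isInterruption(evt) {
  return evt?.type === 'output_audio.playback_stopped' && // assumed field name
    evt?.completion_reason === 'interrupted';
}

// Counts barge-ins for a single call.
function createInterruptionCounter() {
  let count = 0;
  return {
    onLlmEvent(evt) {
      if (isInterruption(evt)) count += 1;
    },
    get count() {
      return count;
    }
  };
}
```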
Like every jambonz verb, the llm verb fires actionHook when the session ends, including a completion_reason:
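A hedged sketch of an actionHook handler that picks the next verbs based on completion_reason. The reason string treated as a normal ending is an assumption and should be checked against the jambonz llm verb documentation. verbsAfterLlm is a hypothetical helper that returns a plain array of jambonz verbs.

```javascript
// Reasons treated as a clean ending. ASSUMED value; verify against the jambonz docs.
const ASSUMED_NORMAL_REASONS = ['normal conversation end'];

// Decides what the call should do after the llm verb finishes.
function verbsAfterLlm(evt, normalReasons = ASSUMED_NORMAL_REASONS) {
  const reason = evt && typeof evt.completion_reason === 'string'
    ? evt.completion_reason
    : 'unknown';
  if (normalReasons.includes(reason)) {
    return [{ verb: 'hangup' }];
  }
  // Anything else: tell the caller before dropping the call.
  return [
    { verb: 'say', text: 'Sorry, something went wrong. Please call again later.' },
    { verb: 'hangup' }
  ];
}
```

Passing the list of normal reasons as a parameter keeps the decision testable and makes it easy to adjust once the actual values are confirmed.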