Google Gemini Speech to Speech
The jambonz application referenced in this article can be found here.
This is an example jambonz application that connects to the Google Gemini Live API and illustrates how to build a Voice-AI application using jambonz and Google Gemini.
The example covers:
- a weather agent using Gemini function calling
- an optional MCP server integration, so the same agent can expose its tools through the Model Context Protocol
Prerequisites
- a jambonz.cloud account (or a self-hosted jambonz deployment, version 10.1.0 or later)
- a Google Cloud Platform account with the Gemini API enabled
- a carrier and virtual phone number of your choice
Running instructions
Set environment variables
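The app needs at minimum a Gemini API credential and a port for the webhook server. The variable names below are assumptions for illustration; check the repository's README or `.env` sample for the actual names it reads:

```shell
# Hypothetical variable names -- verify against the repo before use
export GOOGLE_API_KEY="your-gemini-api-key"   # credential for the Gemini Live API
export WS_PORT=3000                           # port the jambonz webhook server listens on
```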
jambonz setup
- Create a carrier entity in the jambonz portal.
- Add your speech provider of choice. Gemini Live handles speech-to-speech end to end, but jambonz still needs a speech credential configured on the account.
- Create a new jambonz application under the Applications tab. Point both the Calling webhook and Call status webhook at your server.
- Provision a phone number on your carrier and associate it with the application.
Run the app
To run with MCP tools, open two terminals:
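The exact commands depend on the repo's package scripts; a typical two-terminal layout (script and file names here are assumptions, not taken from the repo) would be:

```shell
# Terminal 1: start the MCP server that exposes the weather tools (path is hypothetical)
node mcp-server.js

# Terminal 2: start the jambonz application
npm start
```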
Call your virtual number and ask Barbara about the weather.
How the llm verb is wired up
The application calls session.llm({...}) with vendor: 'google' and a Gemini Live model. The llmOptions.setup object is forwarded verbatim to Google’s BidiGenerateContentSetup message:
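The wiring described above can be sketched as follows. This is illustrative, not copied from the repo: the model string, hook paths, and system prompt are placeholders, while the top-level field names follow the jambonz llm verb schema:

```javascript
// Sketch of the payload this app hands to session.llm(...).
// Everything under llmOptions.setup is forwarded verbatim to Google's
// BidiGenerateContentSetup message.
function buildLlmVerb(apiKey) {
  return {
    vendor: 'google',
    model: 'gemini-2.0-flash-live-001',   // placeholder: use any Gemini Live model
    auth: { apiKey },
    actionHook: '/llm/action',            // hypothetical route paths
    eventHook: '/llm/event',
    toolHook: '/llm/tool',
    llmOptions: {
      setup: {
        generationConfig: { responseModalities: ['AUDIO'] },
        systemInstruction: {
          parts: [{ text: 'You are Barbara, a friendly weather agent.' }]
        }
      }
    }
  };
}

// In the route handler, roughly: session.llm(buildLlmVerb(process.env.GOOGLE_API_KEY)).send();
```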
See the full route in lib/routes/weather-agent.js.
Proactive greeting (“speak first”)
For outbound calls — or any scenario where you want Gemini to speak first — add a greeting to llmOptions. jambonz sends it immediately after setup so the caller hears the agent within the first second:
The value is an instruction to the model, not the literal greeting text. Use "Say exactly: Hello, thank you for calling Acme." if you need a scripted line.
This also works on models/gemini-3.1-flash-live-preview. On the 3.1 preview, Google restricted clientContent to seeding history only, so jambonz uses realtimeInput.text under the hood — the greeting field is the portable way to trigger a first turn across all Gemini Live models.
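Per the note above, a scripted opening line looks like this (a sketch; the surrounding setup fields are elided):

```javascript
// greeting is an instruction to the model, not literal text to speak;
// prefix with "Say exactly:" to force a scripted first line.
const llmOptions = {
  greeting: 'Say exactly: Hello, thank you for calling Acme.',
  setup: {
    /* ...BidiGenerateContentSetup fields as above... */
  }
};
```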
Session resumption
Gemini Live sessions can be resumed across websocket reconnects. Opt in by passing sessionResumption: {} in llmOptions. Each llm_event hook delivers a sessionResumptionUpdate containing a fresh newHandle — store the latest handle, then reconnect with sessionResumption: { handle: '<stored handle>' } to continue the conversation.
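The store-and-reuse cycle can be sketched as below; the event payload shape (`sessionResumptionUpdate.newHandle`) is as described above, while the handler names are hypothetical:

```javascript
// Persist the latest resumption handle delivered on the llm_event hook.
let lastHandle = null;

function onLlmEvent(evt) {
  // each llm_event carries a sessionResumptionUpdate with a fresh newHandle
  if (evt?.sessionResumptionUpdate?.newHandle) {
    lastHandle = evt.sessionResumptionUpdate.newHandle;
  }
}

function resumptionOptions() {
  // first connection: opt in with an empty object
  // reconnect: pass the most recently stored handle to continue the conversation
  return lastHandle
    ? { sessionResumption: { handle: lastHandle } }
    : { sessionResumption: {} };
}
```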
Function calling
The toolHook fires when Gemini wants to call one of the declared functions. Respond with session.sendToolOutput:
Gemini’s native tool format uses functionCalls (inbound) and functionResponses (outbound) — jambonz passes them through without reshaping, so the payloads match the Gemini Live tool use docs exactly.
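A toolHook handler therefore reads `functionCalls` and answers with `functionResponses`. The sketch below assumes a `get_weather` tool and a stubbed lookup; the handler signature and field layout are illustrative:

```javascript
// Hypothetical toolHook handler: inbound functionCalls, outbound functionResponses,
// matching the Gemini Live tool-use shapes that jambonz passes through unreshaped.
function onToolCall(session, evt) {
  const fc = (evt.functionCalls || [])[0];
  if (!fc) return;
  if (fc.name === 'get_weather') {
    const weather = { temperature_c: 18, conditions: 'partly cloudy' }; // stubbed lookup
    session.sendToolOutput(fc.id, {
      functionResponses: [{ id: fc.id, name: fc.name, response: weather }]
    });
  }
}
```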
Interruption handling
When the caller speaks over Gemini, the module emits output_audio.playback_stopped with completion_reason: "interrupted" on the event hook, and the queued audio is discarded so the caller hears their own voice, not stale agent audio. No application code is required — interruption handling is built in.
A note on actionHook
Like every jambonz verb, the llm verb fires actionHook when the session ends, including a completion_reason:
- Normal conversation end
- Connection failure
- Disconnect from remote end
- Server failure
- Server error
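An actionHook handler might branch on that field; the reason strings below mirror the list above, though the exact casing jambonz emits is an assumption here:

```javascript
// Sketch: route on completion_reason delivered by the llm verb's actionHook.
function onLlmAction(evt) {
  switch (evt.completion_reason) {
    case 'normal conversation end':
      return 'hangup';
    case 'disconnect from remote end':
      return 'cleanup';
    case 'connection failure':
    case 'server failure':
    case 'server error':
      return 'retry-or-fallback';
    default:
      return 'log';
  }
}
```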
Resources
- Google Gemini Live API documentation
- BidiGenerateContent protocol reference
- Example application source
- jambonz documentation