Python SDK

Experimental: Build jambonz voice applications with jambonz-python-sdk

The jambonz-python-sdk package lets you build jambonz voice applications in Python. It supports webhook and WebSocket transports, a REST API client for mid-call control, and chainable verb methods for building call flows.

Source code: github.com/jambonz/python-sdk

Experimental — This SDK is under active development. APIs may change between releases. Please report issues on GitHub.

Installation

$ pip install jambonz-python-sdk

Imports

The SDK provides three submodule imports:

# Webhook apps (aiohttp, FastAPI, Flask, etc.)
from jambonz_sdk.webhook import WebhookResponse

# WebSocket apps
from jambonz_sdk.websocket import create_endpoint

# REST API client (mid-call control, outbound calls)
from jambonz_sdk.client import JambonzClient

Webhook Transport

Use WebhookResponse to build verb arrays in response to HTTP webhooks. Methods are chainable and the response is serialized to JSON.

from aiohttp import web
from jambonz_sdk.webhook import WebhookResponse

async def handle_incoming(request: web.Request) -> web.Response:
    jambonz = WebhookResponse()
    jambonz.say(text="Hello from jambonz!").gather(
        input=["speech", "digits"],
        actionHook="/handle-input",
        say={"text": "Press 1 for sales or 2 for support."},
    ).hangup()
    return web.json_response(jambonz.to_json())

async def handle_input(request: web.Request) -> web.Response:
    body = await request.json()
    speech = body.get("speech", {}).get("alternatives", [{}])[0].get("transcript", "")
    jambonz = WebhookResponse()
    jambonz.say(text=f"You said: {speech}").hangup()
    return web.json_response(jambonz.to_json())

app = web.Application()
app.router.add_post("/incoming", handle_incoming)
app.router.add_post("/handle-input", handle_input)
web.run_app(app, port=3000)

WebSocket Transport

Use create_endpoint to build real-time WebSocket applications. This is the recommended transport for voice AI agents, as it enables bidirectional communication, event streaming, and mid-call updates.

import asyncio
from jambonz_sdk.websocket import create_endpoint

async def main():
    make_service, runner = await create_endpoint(port=3000)
    svc = make_service(path="/")

    async def handle_session(session):
        async def on_gather_result(evt):
            transcript = (
                evt.get("speech", {})
                .get("alternatives", [{}])[0]
                .get("transcript", "")
            )
            session.say(text=f"You said: {transcript}").hangup()
            await session.reply()

        session.on("/gather-result", on_gather_result)

        session.say(text="Hello! Say something.").gather(
            input=["speech"],
            actionHook="/gather-result",
            timeout=10,
        ).hangup()
        await session.send()

    svc.on("session:new", handle_session)
    await asyncio.Future()

asyncio.run(main())

.send() vs .reply()

  • await session.send() — Use once for the initial verb array in response to session:new.
  • await session.reply() — Use for all subsequent responses to actionHook events.
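Put together, a session handler that follows this rule might look like the sketch below (it assumes the chainable Session API shown in the WebSocket example above):

```python
async def handle_session(session):
    async def on_gather_result(evt):
        # Every later actionHook response goes out with reply(), not send().
        session.say(text="Thanks, goodbye.").hangup()
        await session.reply()

    session.on("/gather-result", on_gather_result)

    # The initial verb array for session:new goes out exactly once with send().
    session.say(text="Welcome.").gather(
        input=["speech"], actionHook="/gather-result"
    ).hangup()
    await session.send()
```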

Application Environment Variables

You can declare environment variables that are configurable in the jambonz portal UI:

make_service, runner = await create_endpoint(
    port=3000,
    env_vars={
        "OPENAI_MODEL": {
            "type": "string",
            "description": "LLM model to use",
            "default": "gpt-4.1-mini",
        },
        "SYSTEM_PROMPT": {
            "type": "string",
            "description": "System prompt",
            "uiHint": "textarea",
            "default": "You are a helpful assistant.",
        },
    },
)

# Read values in the session handler
async def handle_session(session):
    model = session.data.get("env_vars", {}).get("OPENAI_MODEL", "gpt-4.1-mini")
    # ...

Audio Streams

When using the listen verb, make_service.audio() lets you handle both call control and audio on the same server:

svc = make_service(path="/")
audio_svc = make_service.audio(path="/audio-stream")

async def handle_session(session):
    session.say(text="Listening...").listen(
        url="/audio-stream",
        sampleRate=8000,
        bidirectionalAudio={"enabled": True, "streaming": True, "sampleRate": 8000},
    )
    await session.send()

svc.on("session:new", handle_session)

async def handle_audio(stream):
    async def on_audio(pcm):
        # Process audio: feed to STT, record, etc.
        pass

    stream.on("audio", on_audio)

    # Send audio back
    stream.send_audio(pcm_buffer)

audio_svc.on("connection", handle_audio)
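The on_audio callback receives raw PCM chunks of whatever size the network delivers, while most streaming STT clients expect fixed-size frames. A small buffering helper can bridge the two; this is a sketch, and the frame-size arithmetic assumes 16-bit mono PCM at the sample rate configured above:

```python
def frame_buffer(frame_ms=20, sample_rate=8000, bytes_per_sample=2):
    # Accumulate raw PCM chunks and yield fixed-size frames suitable for a
    # streaming STT client. Defaults assume 16-bit mono PCM at 8 kHz.
    frame_bytes = sample_rate * bytes_per_sample * frame_ms // 1000
    buf = bytearray()

    def push(pcm):
        # Append the new chunk, then carve off as many full frames as possible.
        buf.extend(pcm)
        frames = []
        while len(buf) >= frame_bytes:
            frames.append(bytes(buf[:frame_bytes]))
            del buf[:frame_bytes]
        return frames

    return push
```

Inside handle_audio you would create one buffer per stream (`push = frame_buffer()`) and call it from on_audio, forwarding each returned frame to your STT client.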

REST API Client

Use JambonzClient for outbound calls and mid-call control:

import asyncio
from jambonz_sdk.client import JambonzClient

async def main():
    async with JambonzClient(
        base_url="https://api.jambonz.us",
        account_sid="your-account-sid",
        api_key="your-api-key",
    ) as client:
        # Create an outbound call
        call_sid = await client.calls.create({
            "from": "+15085551212",
            "to": {"type": "phone", "number": "+15085551213"},
            "call_hook": "/incoming",
        })

        # Mid-call control
        await client.calls.mute(call_sid, "mute")
        await client.calls.redirect(call_sid, "https://example.com/new-flow")

asyncio.run(main())

Verb Methods

Both WebhookResponse and WebSocket Session support the same chainable verb methods:

.say() .play() .gather() .dial() .llm() .agent() .conference() .enqueue() .dequeue() .hangup() .pause() .redirect() .config() .tag() .dtmf() .listen() .transcribe() .message() .dub() .alert() .answer() .leave() .sip_decline() .sip_refer() .sip_request()

All methods accept the same options as the corresponding verb JSON schemas and are chainable.

Spec-Driven Verb Generation

The SDK does not hardcode verb method signatures. Verb methods are auto-generated at import time from JSON Schema files — the same schemas used by the Node.js SDK and the jambonz server. When the schema adds a new property to a verb, the SDK picks it up automatically with no code change needed.
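In practice this means schema properties the SDK never names still pass straight through as keyword arguments. A sketch (the greet helper is hypothetical and works with either WebhookResponse or a Session; synthesizer is a property of the jambonz say verb schema):

```python
def greet(resp, vendor="google", voice="en-US-Neural2-C"):
    # Any property defined in the say verb's JSON Schema is accepted as a
    # keyword argument, including nested objects like synthesizer.
    return resp.say(
        text="Hello!",
        synthesizer={"vendor": vendor, "voice": voice},
    )
```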

TTS Token Streaming

The WebSocket Session provides methods for incremental TTS token streaming, enabling low-latency voice AI experiences:

async def on_llm_tokens(evt):
    tokens = evt.get("tokens")
    done = evt.get("done")

    if tokens:
        await session.send_tts_tokens(tokens)

    if done:
        session.flush_tts_tokens()

session.on("/llm-tokens", on_llm_tokens)

| Method | Description |
| --- | --- |
| send_tts_tokens(text) | Send a chunk of text for TTS. Awaitable; resolves when jambonz acknowledges receipt. |
| flush_tts_tokens() | Signal the end of a TTS token stream. |
| clear_tts_tokens() | Cancel all pending TTS tokens and reset state. |
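Tying the three methods together, a pump coroutine can forward tokens from any async iterator and clean up if it is cancelled mid-utterance. This is a sketch assuming the Session methods above; token_stream is anything that yields text chunks (e.g. a streaming LLM response):

```python
import asyncio

async def pump_tokens(session, token_stream):
    # Forward each chunk to jambonz as soon as it arrives, then flush so
    # TTS knows the utterance is complete.
    try:
        async for chunk in token_stream:
            await session.send_tts_tokens(chunk)
        session.flush_tts_tokens()
    except asyncio.CancelledError:
        # If the pump task is cancelled (e.g. on barge-in), drop any
        # tokens that have not been spoken yet.
        session.clear_tts_tokens()
        raise
```

Running the pump as an asyncio task makes it easy to cancel the whole utterance when the caller starts speaking.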

LLM and Agent Updates

Tool Output

When the LLM requests a tool/function call, respond with the result:

async def on_tool_call(evt):
    tool_call_id = evt["tool_call_id"]
    name = evt["name"]
    args = evt["arguments"]
    result = await handle_tool(name, args)
    session.send_tool_output(tool_call_id, result)

session.on("/tool-call", on_tool_call)
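The handle_tool helper is left to the application. A hypothetical dispatch sketch follows; the tool name, the stub result, and the JSON-string handling for arguments are all assumptions, not part of the SDK:

```python
import json

async def handle_tool(name, args):
    # Some LLM vendors deliver tool arguments as a JSON string rather
    # than a dict, so normalize before dispatching.
    if isinstance(args, str):
        args = json.loads(args)
    if name == "get_weather":
        # Stub result; a real tool would call an external service here.
        return {"city": args.get("city"), "temp_f": 72}
    return {"error": f"unknown tool: {name}"}
```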

Agent Updates

Send mid-conversation updates to an active agent:

# Change the system prompt
session.update_agent({
    "type": "update_instructions",
    "instructions": "You are now a billing agent.",
})

# Inject context
session.update_agent({
    "type": "inject_context",
    "messages": [{"role": "user", "content": "Customer: Sarah, Gold tier."}],
})

# Replace tools
session.update_agent({"type": "update_tools", "tools": [...]})

# Trigger a new response
session.update_agent({
    "type": "generate_reply",
    "interrupt": True,
    "user_input": "Tell the customer about the flash sale.",
})

LLM Updates

session.update_llm({"instructions": "Switch to Spanish."})

| Method | Description |
| --- | --- |
| send_tool_output(tool_call_id, data) | Send a tool/function result back to the LLM or agent verb |
| update_agent(data) | Send an agent:update command |
| update_llm(data) | Send an llm:update command |

Inject Commands

Inject commands execute immediately on an active call without affecting the verb stack:

# Mute/unmute
session.inject_mute("mute")
session.inject_mute("unmute")

# Whisper to one party
session.inject_whisper({"verb": "say", "text": "The customer is a VIP."}, agent_call_sid)

# Control noise isolation
session.inject_noise_isolation("enable", vendor="krisp", level=80)
session.inject_noise_isolation("disable")

# Control recording
session.inject_record("startCallRecording", siprec_server_url="sip:siprec@recorder.example.com")
session.inject_record("pauseCallRecording")

# Send DTMF
session.inject_dtmf("1234")

# Redirect call flow
session.inject_redirect("/new-webhook")

# Tag the call with metadata
session.inject_tag({"priority": "high", "department": "billing"})

Session Properties

| Property | Type | Description |
| --- | --- | --- |
| call_sid | str | Unique call identifier |
| from_number | str | Caller phone number or SIP URI |
| to | str | Called phone number or SIP URI |
| direction | str | 'inbound' or 'outbound' |
| account_sid | str | Account identifier |
| application_sid | str | Application identifier |
| call_id | str | SIP Call-ID |
| data | dict | Full call session data (includes env_vars, SIP headers, etc.) |
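These properties make lightweight call logging straightforward; a small hypothetical helper:

```python
def call_summary(session):
    # Build a one-line summary from the Session properties listed above.
    return (
        f"{session.direction} call {session.call_sid}: "
        f"{session.from_number} -> {session.to}"
    )
```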

Examples

See the examples directory for runnable demos:

| Example | Webhook | WebSocket | Description |
| --- | --- | --- | --- |
| hello-world | yes | yes | Minimal greeting |
| echo | yes | yes | Speech echo with gather |
| ivr-menu | yes | | IVR menu with speech + DTMF |
| voice-agent | yes | yes | LLM pipeline with tool calls |
| dial | yes | | Outbound dial with fallback |
| listen-record | yes | yes | Audio recording |