Python SDK

Experimental: Build jambonz voice applications with jambonz-python-sdk

The jambonz-python-sdk package lets you build jambonz voice applications in Python. It supports webhook and WebSocket transports, a REST API client for mid-call control, and chainable verb methods for building call flows.

Source code: github.com/jambonz/python-sdk

Experimental — This SDK is under active development. APIs may change between releases. Please report issues on GitHub.

Installation

$ pip install jambonz-python-sdk

Imports

The SDK provides three submodule imports:

# Webhook apps (aiohttp, FastAPI, Flask, etc.)
from jambonz_sdk.webhook import WebhookResponse

# WebSocket apps
from jambonz_sdk.websocket import create_endpoint

# REST API client (mid-call control, outbound calls)
from jambonz_sdk.client import JambonzClient

Webhook Transport

Use WebhookResponse to build verb arrays in response to HTTP webhooks. Methods are chainable and the response is serialized to JSON.

from aiohttp import web
from jambonz_sdk.webhook import WebhookResponse

async def handle_incoming(request: web.Request) -> web.Response:
    jambonz = WebhookResponse()
    jambonz.say(text="Hello from jambonz!").gather(
        input=["speech", "digits"],
        actionHook="/handle-input",
        say={"text": "Press 1 for sales or 2 for support."},
    ).hangup()
    return web.json_response(jambonz.to_json())

async def handle_input(request: web.Request) -> web.Response:
    body = await request.json()
    speech = body.get("speech", {}).get("alternatives", [{}])[0].get("transcript", "")
    jambonz = WebhookResponse()
    jambonz.say(text=f"You said: {speech}").hangup()
    return web.json_response(jambonz.to_json())

app = web.Application()
app.router.add_post("/incoming", handle_incoming)
app.router.add_post("/handle-input", handle_input)
web.run_app(app, port=3000)

WebSocket Transport

Use create_endpoint to build real-time WebSocket applications. This is the recommended transport for voice AI agents, as it enables bidirectional communication, event streaming, and mid-call updates.

import asyncio
from jambonz_sdk.websocket import create_endpoint

async def main():
    make_service, runner = await create_endpoint(port=3000)
    svc = make_service(path="/")

    async def handle_session(session):
        async def on_gather_result(evt):
            transcript = (
                evt.get("speech", {})
                .get("alternatives", [{}])[0]
                .get("transcript", "")
            )
            session.say(text=f"You said: {transcript}").hangup()
            await session.reply()

        session.on("/gather-result", on_gather_result)

        session.say(text="Hello! Say something.").gather(
            input=["speech"],
            actionHook="/gather-result",
            timeout=10,
        ).hangup()
        await session.send()

    svc.on("session:new", handle_session)
    await asyncio.Future()

asyncio.run(main())

.send() vs .reply()

  • await session.send() — Use once for the initial verb array in response to session:new.
  • await session.reply() — Use for all subsequent responses to actionHook events.
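Put together, a session handler that follows this rule might look like the sketch below (it assumes the chainable Session API shown in the WebSocket example above):

```python
async def handle_session(session):
    async def on_gather_result(evt):
        # Every later actionHook response goes out with reply(), not send().
        session.say(text="Thanks, goodbye.").hangup()
        await session.reply()

    session.on("/gather-result", on_gather_result)

    # The initial verb array for session:new goes out exactly once with send().
    session.say(text="Welcome.").gather(
        input=["speech"], actionHook="/gather-result"
    ).hangup()
    await session.send()
```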

Application Environment Variables

You can declare environment variables that are configurable in the jambonz portal UI:

make_service, runner = await create_endpoint(
    port=3000,
    env_vars={
        "OPENAI_MODEL": {
            "type": "string",
            "description": "LLM model to use",
            "default": "gpt-4.1-mini",
        },
        "SYSTEM_PROMPT": {
            "type": "string",
            "description": "System prompt",
            "uiHint": "textarea",
            "default": "You are a helpful assistant.",
        },
    },
)

# Read values in the session handler
async def handle_session(session):
    model = session.data.get("env_vars", {}).get("OPENAI_MODEL", "gpt-4.1-mini")
    # ...

Audio Streams

When using the listen verb, make_service.audio() lets you handle both call control and audio on the same server:

svc = make_service(path="/")
audio_svc = make_service.audio(path="/audio-stream")

async def handle_session(session):
    session.say(text="Listening...").listen(
        url="/audio-stream",
        sampleRate=8000,
        bidirectionalAudio={"enabled": True, "streaming": True, "sampleRate": 8000},
    )
    await session.send()

svc.on("session:new", handle_session)

async def handle_audio(stream):
    async def on_audio(pcm):
        # Process audio: feed to STT, record, etc.
        pass

    stream.on("audio", on_audio)

    # Send audio back
    stream.send_audio(pcm_buffer)

audio_svc.on("connection", handle_audio)
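The on_audio callback receives raw PCM chunks of whatever size the network delivers, while most streaming STT clients expect fixed-size frames. A small buffering helper can bridge the two; this is a sketch, and the frame-size arithmetic assumes 16-bit mono PCM at the sample rate configured above:

```python
def frame_buffer(frame_ms=20, sample_rate=8000, bytes_per_sample=2):
    # Accumulate raw PCM chunks and yield fixed-size frames suitable for a
    # streaming STT client. Defaults assume 16-bit mono PCM at 8 kHz.
    frame_bytes = sample_rate * bytes_per_sample * frame_ms // 1000
    buf = bytearray()

    def push(pcm):
        # Append the new chunk, then carve off as many full frames as possible.
        buf.extend(pcm)
        frames = []
        while len(buf) >= frame_bytes:
            frames.append(bytes(buf[:frame_bytes]))
            del buf[:frame_bytes]
        return frames

    return push
```

Inside handle_audio you would create one buffer per stream (`push = frame_buffer()`) and call it from on_audio, forwarding each returned frame to your STT client.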

REST API Client

Use JambonzClient for outbound calls and mid-call control:

import asyncio
from jambonz_sdk.client import JambonzClient

async def main():
    async with JambonzClient(
        base_url="https://api.jambonz.us",
        account_sid="your-account-sid",
        api_key="your-api-key",
    ) as client:
        # Create an outbound call
        call_sid = await client.calls.create({
            "from": "+15085551212",
            "to": {"type": "phone", "number": "+15085551213"},
            "call_hook": "/incoming",
        })

        # Mid-call control
        await client.calls.mute(call_sid, "mute")
        await client.calls.redirect(call_sid, "https://example.com/new-flow")

asyncio.run(main())

Verb Methods

Both WebhookResponse and WebSocket Session support the same chainable verb methods:

.say() .play() .gather() .dial() .llm() .agent() .conference() .enqueue() .dequeue() .hangup() .pause() .redirect() .config() .tag() .dtmf() .listen() .transcribe() .message() .dub() .alert() .answer() .leave() .sip_decline() .sip_refer() .sip_request()

All methods accept the same options as the corresponding verb JSON schemas and are chainable.

Spec-Driven Verb Generation

The SDK does not hardcode verb method signatures. Verb methods are auto-generated at import time from JSON Schema files — the same schemas used by the Node.js SDK and the jambonz server. When the schema adds a new property to a verb, the SDK picks it up automatically with no code change needed.
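In practice this means schema properties the SDK never names still pass straight through as keyword arguments. A sketch (the greet helper is hypothetical and works with either WebhookResponse or a Session; synthesizer is a property of the jambonz say verb schema):

```python
def greet(resp, vendor="google", voice="en-US-Neural2-C"):
    # Any property defined in the say verb's JSON Schema is accepted as a
    # keyword argument, including nested objects like synthesizer.
    return resp.say(
        text="Hello!",
        synthesizer={"vendor": vendor, "voice": voice},
    )
```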

TTS Token Streaming

The WebSocket Session provides methods for incremental TTS token streaming, enabling low-latency voice AI experiences:

async def on_llm_tokens(evt):
    tokens = evt.get("tokens")
    done = evt.get("done")

    if tokens:
        await session.send_tts_tokens(tokens)

    if done:
        session.flush_tts_tokens()

session.on("/llm-tokens", on_llm_tokens)

| Method | Description |
| --- | --- |
| send_tts_tokens(text) | Send a chunk of text for TTS. Awaitable; resolves when jambonz acknowledges receipt. |
| flush_tts_tokens() | Signal the end of a TTS token stream. |
| clear_tts_tokens() | Cancel all pending TTS tokens and reset state. |
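Tying the three methods together, a pump coroutine can forward tokens from any async iterator and clean up if it is cancelled mid-utterance. This is a sketch assuming the Session methods above; token_stream is anything that yields text chunks (e.g. a streaming LLM response):

```python
import asyncio

async def pump_tokens(session, token_stream):
    # Forward each chunk to jambonz as soon as it arrives, then flush so
    # TTS knows the utterance is complete.
    try:
        async for chunk in token_stream:
            await session.send_tts_tokens(chunk)
        session.flush_tts_tokens()
    except asyncio.CancelledError:
        # If the pump task is cancelled (e.g. on barge-in), drop any
        # tokens that have not been spoken yet.
        session.clear_tts_tokens()
        raise
```

Running the pump as an asyncio task makes it easy to cancel the whole utterance when the caller starts speaking.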

LLM and Agent Updates

Tool Output

When the LLM requests a tool/function call, respond with the result:

async def on_tool_call(evt):
    tool_call_id = evt["tool_call_id"]
    name = evt["name"]
    args = evt["arguments"]
    result = await handle_tool(name, args)
    session.send_tool_output(tool_call_id, result)

session.on("/tool-call", on_tool_call)
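The handle_tool helper is left to the application. A hypothetical dispatch sketch follows; the tool name, the stub result, and the JSON-string handling for arguments are all assumptions, not part of the SDK:

```python
import json

async def handle_tool(name, args):
    # Some LLM vendors deliver tool arguments as a JSON string rather
    # than a dict, so normalize before dispatching.
    if isinstance(args, str):
        args = json.loads(args)
    if name == "get_weather":
        # Stub result; a real tool would call an external service here.
        return {"city": args.get("city"), "temp_f": 72}
    return {"error": f"unknown tool: {name}"}
```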

Agent Updates

Send mid-conversation updates to an active agent:

# Change the system prompt
session.update_agent({
    "type": "update_instructions",
    "instructions": "You are now a billing agent.",
})

# Inject context
session.update_agent({
    "type": "inject_context",
    "messages": [{"role": "user", "content": "Customer: Sarah, Gold tier."}],
})

# Replace tools
session.update_agent({"type": "update_tools", "tools": [...]})

# Trigger a new response
session.update_agent({
    "type": "generate_reply",
    "interrupt": True,
    "user_input": "Tell the customer about the flash sale.",
})

LLM Updates

session.update_llm({"instructions": "Switch to Spanish."})

| Method | Description |
| --- | --- |
| send_tool_output(tool_call_id, data) | Send a tool/function result back to the LLM or agent verb |
| update_agent(data) | Send an agent:update command |
| update_llm(data) | Send an llm:update command |

Inject Commands

Inject commands execute immediately on an active call without affecting the verb stack:

# Mute/unmute
session.inject_mute("mute")
session.inject_mute("unmute")

# Whisper to one party
session.inject_whisper({"verb": "say", "text": "The customer is a VIP."}, agent_call_sid)

# Control noise isolation
session.inject_noise_isolation("enable", vendor="krisp", level=80)
session.inject_noise_isolation("disable")

# Control recording
session.inject_record("startCallRecording", siprec_server_url="sip:siprec@recorder.example.com")
session.inject_record("pauseCallRecording")

# Send DTMF
session.inject_dtmf("1234")

# Redirect call flow
session.inject_redirect("/new-webhook")

# Tag the call with metadata
session.inject_tag({"priority": "high", "department": "billing"})

Session Properties

| Property | Type | Description |
| --- | --- | --- |
| call_sid | str | Unique call identifier |
| from_number | str | Caller phone number or SIP URI |
| to | str | Called phone number or SIP URI |
| direction | str | 'inbound' or 'outbound' |
| account_sid | str | Account identifier |
| application_sid | str | Application identifier |
| call_id | str | SIP Call-ID |
| data | dict | Full call session data (includes env_vars, SIP headers, etc.) |
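These properties make lightweight call logging straightforward; a small hypothetical helper:

```python
def call_summary(session):
    # Build a one-line summary from the Session properties listed above.
    return (
        f"{session.direction} call {session.call_sid}: "
        f"{session.from_number} -> {session.to}"
    )
```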

Examples

See the examples directory for runnable demos:

| Example | Webhook | WebSocket | Description |
| --- | --- | --- | --- |
| hello-world | yes | yes | Minimal greeting |
| echo | yes | yes | Speech echo with gather |
| ivr-menu | yes | | IVR menu with speech + DTMF |
| voice-agent | yes | yes | LLM pipeline with tool calls |
| dial | yes | | Outbound dial with fallback |
| listen-record | yes | yes | Audio recording |