Node.js SDK

Build jambonz voice applications with @jambonz/sdk

The @jambonz/sdk package is the recommended way to build jambonz voice applications in Node.js and TypeScript. It supports both webhook and WebSocket transports and includes a REST API client for mid-call control, plus chainable verb methods for building call flows.

  • Source code: github.com/jambonz/node-sdk
  • API reference: jambonz.github.io/node-sdk

This SDK replaces the older @jambonz/node-client (webhook) and @jambonz/node-client-ws (WebSocket) packages, which are now deprecated. The new SDK provides a unified package with a consistent API across both transports.

Which transport should I use? The WebSocket transport is recommended for most applications. It provides a persistent bidirectional connection that enables TTS streaming, mid-call updates, inject commands, real-time event handling, and voice AI features like the agent and llm verbs. The webhook transport is simpler but limited — use it for straightforward call routing scenarios (e.g., dial, basic IVR menus) where you don’t need real-time interaction.

Installation

```sh
npm install @jambonz/sdk
```

Imports

The SDK provides three subpath exports:

```js
// Webhook apps (Express/HTTP)
import { WebhookResponse } from '@jambonz/sdk/webhook';

// WebSocket apps
import { createEndpoint } from '@jambonz/sdk/websocket';

// REST API client (mid-call control, outbound calls)
import { JambonzClient } from '@jambonz/sdk/client';
```

Webhook Transport

Use WebhookResponse to build verb arrays in response to HTTP webhooks. Methods are chainable and the response is serialized to JSON.

```js
import express from 'express';
import { WebhookResponse } from '@jambonz/sdk/webhook';

const app = express();
app.use(express.json());

app.post('/incoming', (req, res) => {
  const jambonz = new WebhookResponse();
  jambonz
    .say({ text: 'Hello from jambonz!' })
    .gather({
      input: ['speech', 'digits'],
      actionHook: '/handle-input',
      say: { text: 'Press 1 for sales or 2 for support.' },
    })
    .hangup();

  res.json(jambonz);
});

app.post('/handle-input', (req, res) => {
  const jambonz = new WebhookResponse();
  const speech = req.body.speech?.alternatives?.[0]?.transcript;
  jambonz.say({ text: `You said: ${speech}` }).hangup();
  res.json(jambonz);
});

app.listen(3000);
```
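Under the hood, WebhookResponse is a builder for a plain JSON array of verb objects. As a sanity check, here is a hand-rolled sketch of the payload the /incoming handler above returns (no SDK required; verb shapes follow the jambonz verb schemas):

```javascript
// The chained builder serializes to an array of verb objects,
// each identified by a "verb" key.
const payload = [
  { verb: 'say', text: 'Hello from jambonz!' },
  {
    verb: 'gather',
    input: ['speech', 'digits'],
    actionHook: '/handle-input',
    say: { text: 'Press 1 for sales or 2 for support.' },
  },
  { verb: 'hangup' },
];

// Roughly what res.json(jambonz) sends over the wire.
console.log(JSON.stringify(payload, null, 2));
```

Knowing this shape is useful when debugging: you can log the serialized response and compare it against the verb JSON schemas directly.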

WebSocket Transport

Use createEndpoint to build real-time WebSocket applications. This is the recommended transport for voice AI agents, as it enables bidirectional communication, event streaming, and mid-call updates.

```js
import http from 'http';
import { createEndpoint } from '@jambonz/sdk/websocket';

const server = http.createServer();
const makeService = createEndpoint({ server, port: 3000 });

const svc = makeService({ path: '/' });

svc.on('session:new', (session) => {
  // Bind actionHook handlers first
  session.on('/gather-result', (evt) => {
    const transcript = evt.speech?.alternatives?.[0]?.transcript || '';
    session.say({ text: `You said: ${transcript}` }).hangup().reply();
  });

  // Send initial verbs
  session
    .say({ text: 'Hello! Say something.' })
    .gather({ input: ['speech'], actionHook: '/gather-result', timeout: 10 })
    .hangup()
    .send();
});
```

.send() vs .reply()

  • .send() — Use once for the initial verb array in response to session:new.
  • .reply() — Use for all subsequent responses to actionHook events.

This distinction is important: .send() starts the call flow, while .reply() continues it in response to events.

Application Environment Variables

You can declare environment variables that are configurable in the jambonz portal UI:

```js
const makeService = createEndpoint({
  server,
  port: 3000,
  envVars: {
    OPENAI_MODEL: {
      type: 'string',
      description: 'LLM model to use',
      default: 'gpt-4.1-mini',
    },
    SYSTEM_PROMPT: {
      type: 'string',
      description: 'System prompt',
      uiHint: 'textarea',
      default: 'You are a helpful assistant.',
    },
  },
});

// Read values in session handler
svc.on('session:new', (session) => {
  const model = session.data.env_vars?.OPENAI_MODEL || 'gpt-4.1-mini';
  // ...
});
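Because env_vars may be absent when the application runs outside the portal, it can be convenient to centralize the fallback logic. A small helper sketch (hypothetical, not part of the SDK), shown with a mock session shaped like the real one:

```javascript
// Hypothetical helper: read a portal-configured env var, falling back
// to the process environment and then a hard-coded default.
function getEnvVar(session, name, fallback) {
  return session?.data?.env_vars?.[name] ?? process.env[name] ?? fallback;
}

// Mock session for illustration; the real object comes from session:new.
const mockSession = { data: { env_vars: { OPENAI_MODEL: 'gpt-4.1-mini' } } };

const model = getEnvVar(mockSession, 'OPENAI_MODEL', 'fallback-model'); // portal value wins
const prompt = getEnvVar(mockSession, 'SYSTEM_PROMPT', 'You are a helpful assistant.'); // falls back
```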

Audio Streams

When using the listen verb, makeService.audio() lets you handle both call control and audio on the same server:

```js
const svc = makeService({ path: '/' });
const audioSvc = makeService.audio({ path: '/audio-stream' });

svc.on('session:new', (session) => {
  session
    .say({ text: 'Listening...' })
    .listen({
      url: '/audio-stream',
      sampleRate: 8000,
      bidirectionalAudio: { enabled: true, streaming: true, sampleRate: 8000 },
    })
    .send();
});

audioSvc.on('connection', (stream) => {
  stream.on('audio', (pcm) => {
    // Process audio: feed to STT, record, etc.
  });

  // Send audio back (pcmBuffer: raw PCM at the negotiated sample rate)
  stream.sendAudio(pcmBuffer);

  stream.on('close', () => console.log('Audio stream closed'));
});
```

The AudioStream object provides sendAudio(), playAudio(), killAudio(), disconnect(), sendMark(), and clearMarks() methods.
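The pcmBuffer above is left to the application. Assuming the stream carries 16-bit little-endian linear PCM at the negotiated sample rate (as is typical for the listen verb), here is one way to synthesize a short test tone to play back:

```javascript
// Generate one second of a 440 Hz sine tone as 16-bit LE PCM at 8 kHz.
// (Assumes sampleRate: 8000 and linear16 encoding, matching the listen
// configuration above; verify the encoding against your jambonz setup.)
const sampleRate = 8000;
const freq = 440;
const samples = new Int16Array(sampleRate); // 1 second of mono audio
for (let i = 0; i < samples.length; i++) {
  // 0.3 amplitude to leave headroom and avoid clipping
  samples[i] = Math.round(0.3 * 32767 * Math.sin((2 * Math.PI * freq * i) / sampleRate));
}
const pcmBuffer = Buffer.from(samples.buffer);

// stream.sendAudio(pcmBuffer); // 16000 bytes = 1 s of 8 kHz 16-bit mono
```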

REST API Client

Use JambonzClient for outbound calls and mid-call control:

```js
import { JambonzClient } from '@jambonz/sdk/client';

const client = new JambonzClient({
  baseUrl: 'https://api.jambonz.us',
  accountSid: 'your-account-sid',
  apiKey: 'your-api-key',
});

// Create an outbound call
await client.calls.create({
  from: '+15085551212',
  to: { type: 'phone', number: '+15085551213' },
  call_hook: '/incoming',
});

// Mid-call control
await client.calls.mute(callSid, 'mute');
await client.calls.redirect(callSid, 'https://example.com/new-flow');
```

Verb Methods

Both WebhookResponse and WebSocket Session support the same chainable verb methods:

`.say()` `.play()` `.gather()` `.dial()` `.llm()` `.agent()` `.conference()` `.enqueue()` `.dequeue()` `.hangup()` `.pause()` `.redirect()` `.config()` `.tag()` `.dtmf()` `.listen()` `.transcribe()` `.message()` `.dub()` `.alert()` `.answer()` `.leave()` `.sipDecline()` `.sipRefer()` `.sipRequest()`

All methods accept the same options as the corresponding verb JSON schemas and are chainable.

TTS Token Streaming

The WebSocket Session provides methods for incremental TTS token streaming, enabling the lowest-latency voice AI experiences. This is used when you’re streaming tokens from an LLM and want them spoken as they arrive.

```js
session.on('/llm-tokens', async (evt) => {
  const { tokens, done } = evt;

  if (tokens) {
    // Send tokens as they arrive from the LLM; backpressure is handled automatically
    await session.sendTtsTokens(tokens);
  }

  if (done) {
    // Signal end of token stream
    session.flushTtsTokens();
  }
});
```
| Method | Returns | Description |
|---|---|---|
| `sendTtsTokens(text)` | `Promise<void>` | Send a chunk of text for TTS. Resolves when jambonz acknowledges receipt. Automatically applies backpressure if the buffer is full. |
| `flushTtsTokens()` | `void` | Signal the end of a TTS token stream. Triggers final audio generation. |
| `clearTtsTokens()` | `void` | Cancel all pending TTS tokens, clear the queue, and reset backpressure state. |

The isTtsPaused property indicates whether TTS streaming is paused due to backpressure.
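Calling sendTtsTokens() once per raw LLM token works, but many applications buffer tokens and flush at clause boundaries so the TTS vendor receives fuller phrases. A minimal chunker sketch (not part of the SDK; the boundary regex is an illustrative choice):

```javascript
// Hypothetical chunker: accumulate LLM tokens and emit a chunk whenever
// the buffer ends in sentence-ish punctuation.
function makeChunker(onChunk) {
  let buf = '';
  return {
    push(token) {
      buf += token;
      if (/[.!?,;:]\s*$/.test(buf)) {
        onChunk(buf);
        buf = '';
      }
    },
    flush() {
      if (buf) onChunk(buf);
      buf = '';
    },
  };
}

// In a real app, wire onChunk to session.sendTtsTokens and call
// chunker.flush() followed by session.flushTtsTokens() when done.
const chunks = [];
const chunker = makeChunker((c) => chunks.push(c));
for (const t of ['Hello', ' there', '.', ' How', ' can I help', '?']) chunker.push(t);
chunker.flush();
// chunks → ['Hello there.', ' How can I help?']
```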

TTS Streaming Events

| Event | Description |
|---|---|
| `tts:stream_open` | TTS vendor connection established |
| `tts:stream_paused` | Backpressure: buffer full, tokens will queue |
| `tts:stream_resumed` | Backpressure released, streaming resumes |
| `tts:stream_closed` | TTS stream ended |
| `tts:user_interruption` | User barged in during TTS playback |

LLM and Agent Updates

The Session provides methods for interacting with active LLM and agent conversations.

Tool Output

When the LLM requests a tool/function call (via the toolHook), respond with the result:

```js
session.on('/tool-call', async (evt) => {
  const { tool_call_id, name, arguments: args } = evt;
  const result = await handleTool(name, args);
  session.sendToolOutput(tool_call_id, result);
});
```
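The handleTool function above is left to the application. One common shape is a name-to-handler map; the sketch below uses hypothetical tool names and handlers for illustration:

```javascript
// Hypothetical tool dispatcher: map tool names to async handlers.
const tools = {
  get_weather: async ({ city }) => ({ city, forecast: 'sunny', tempF: 72 }),
  lookup_order: async ({ orderId }) => ({ orderId, status: 'shipped' }),
};

async function handleTool(name, args) {
  const handler = tools[name];
  if (!handler) return { error: `unknown tool: ${name}` };
  // Some LLM vendors deliver arguments as a JSON string; normalize first.
  const parsed = typeof args === 'string' ? JSON.parse(args) : args;
  return handler(parsed);
}
```

Returning an error object for unknown tools (rather than throwing) lets the LLM see the failure and recover in conversation.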

Agent Updates

Send mid-conversation updates to an active agent:

```js
// Change the system prompt
session.updateAgent({
  type: 'update_instructions',
  instructions: 'You are now a billing agent.',
});

// Inject context
session.updateAgent({
  type: 'inject_context',
  messages: [{ role: 'user', content: 'Customer: Sarah, Gold tier.' }],
});

// Replace tools
session.updateAgent({ type: 'update_tools', tools: [...] });

// Trigger a new response (with optional interrupt)
session.updateAgent({
  type: 'generate_reply',
  interrupt: true,
  user_input: 'Tell the customer about the flash sale.',
});
```

LLM Updates

Send updates to an active llm verb:

```js
session.updateLlm({ instructions: 'Switch to Spanish.' });
```

| Method | Description |
|---|---|
| `sendToolOutput(toolCallId, data)` | Send a tool/function result back to the LLM or agent verb |
| `updateAgent(data)` | Send an `agent:update` command (`update_instructions`, `inject_context`, `update_tools`, `generate_reply`) |
| `updateLlm(data)` | Send an `llm:update` command |

Inject Commands

Inject commands execute immediately on an active call without affecting the verb stack. They are useful for mid-call control actions like muting, recording, or whispering to one party on a bridged call.

```js
// Mute/unmute
session.injectMute('mute');
session.injectMute('unmute');

// Whisper to one party (e.g., coaching a call center agent)
session.injectWhisper({ verb: 'say', text: 'The customer is a VIP.' }, agentCallSid);

// Control noise isolation mid-call
session.injectNoiseIsolation('enable', { vendor: 'krisp', level: 80 });
session.injectNoiseIsolation('disable');

// Control recording
session.injectRecord('startCallRecording', { siprecServerURL: 'sip:siprec@recorder.example.com' });
session.injectRecord('pauseCallRecording');

// Pause/resume audio streaming
session.injectListenStatus('pause');
session.injectListenStatus('resume');

// Send DTMF
session.injectDtmf('1234');

// Redirect call flow
session.injectRedirect('/new-webhook');

// Tag the call with metadata
session.injectTag({ priority: 'high', department: 'billing' });
```
| Method | Description |
|---|---|
| `injectMute(status)` | Mute or unmute the call (`'mute'` or `'unmute'`) |
| `injectWhisper(verb, callSid?)` | Play a whisper verb (`say`/`play`) to one party on a bridged call |
| `injectNoiseIsolation(status, opts?, callSid?)` | Enable or disable noise isolation. Options: `vendor`, `level`, `model` |
| `injectRecord(action, opts?, callSid?)` | Control call recording: `startCallRecording`, `stopCallRecording`, `pauseCallRecording`, `resumeCallRecording` |
| `injectListenStatus(status, callSid?)` | Pause or resume audio streaming (`'pause'` or `'resume'`) |
| `injectDtmf(digit, duration?, callSid?)` | Send DTMF digits into the call |
| `injectRedirect(hook, callSid?)` | Redirect call execution to a new webhook |
| `injectTag(data, callSid?)` | Attach metadata to the call |
| `injectCommand(command, data?, callSid?)` | Send a generic inject command |

The optional callSid parameter on inject methods targets a specific call leg on a bridged call. Omit it to target the current call.

Session Properties

| Property | Type | Description |
|---|---|---|
| `callSid` | `string` | Unique call identifier |
| `from` | `string` | Caller phone number or SIP URI |
| `to` | `string` | Called phone number or SIP URI |
| `direction` | `'inbound' \| 'outbound'` | Call direction |
| `accountSid` | `string` | Account identifier |
| `applicationSid` | `string` | Application identifier |
| `callId` | `string` | SIP Call-ID |
| `data` | `CallSession` | Full call session data (includes `env_vars`, SIP headers, etc.) |
| `locals` | `Record<string, unknown>` | Application-specific storage that persists for the session |
| `isTtsPaused` | `boolean` | Whether TTS streaming is paused due to backpressure |
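The locals property is where per-call state should live when it must survive across handlers, since each actionHook event fires independently. A sketch of tracking gather retries (mock session object shown so the example is self-contained; the real object arrives via session:new):

```javascript
// Mock session with the locals bag the real Session provides.
const session = { locals: {} };

// Decide what to do after each gather result, remembering attempts
// in session.locals so state persists across handler invocations.
function onGatherResult(session, evt) {
  session.locals.attempts = (session.locals.attempts || 0) + 1;
  const transcript = evt.speech?.alternatives?.[0]?.transcript;
  if (!transcript && session.locals.attempts < 3) {
    return 'retry'; // e.g. re-run the gather verb
  }
  return transcript ? 'handle' : 'give-up';
}

onGatherResult(session, {}); // → 'retry' (attempt 1)
onGatherResult(session, {}); // → 'retry' (attempt 2)
onGatherResult(session, {}); // → 'give-up' (attempt 3)
```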

Session Events

| Event | Description |
|---|---|
| `'/hookName'` | ActionHook callback; requires `.reply()` |
| `verb:status` | Verb status change (when `notifyEvents` is enabled) |
| `call:status` | Call state change |
| `jambonz:error` | Error from jambonz |
| `close` | WebSocket connection closed |
| `error` | WebSocket connection error |

AI-Assisted Development

The @jambonz/mcp-schema-server package is an MCP server that gives AI coding assistants deep knowledge of jambonz APIs, verb schemas, and SDK patterns. Set it up so your AI can generate correct jambonz code automatically.

Remote server (simplest):

```sh
claude mcp add jambonz -t http https://mcp-server.jambonz.app/mcp
```

Local via npx:

```sh
claude mcp add jambonz -- npx -y @jambonz/mcp-schema-server
```

For Cursor, VS Code, and other editors, see the setup instructions in the repository.

A complementary Agent Skill provides procedural knowledge about jambonz patterns and best practices:

```sh
npx skills add jambonz/skills
```

Examples

See the examples directory for runnable demos:

| Example | Transport | Description |
|---|---|---|
| hello-world | Webhook + WS | Minimal greeting |
| echo | Webhook + WS | Speech echo using gather |
| ivr-menu | Webhook | Interactive menu with speech and DTMF |
| voice-agent | Webhook + WS | LLM-powered conversational AI with tools |
| openai-realtime | WebSocket | OpenAI Realtime API voice agent |
| llm-streaming | WebSocket | LLM with TTS streaming and barge-in |

For agent verb examples, see the agent examples.