Node.js SDK

Build jambonz voice applications with @jambonz/sdk

The @jambonz/sdk package is the recommended way to build jambonz voice applications in Node.js and TypeScript. It supports both webhook and WebSocket transports and includes a REST API client for mid-call control, plus chainable verb methods for building call flows.

  • Source code: github.com/jambonz/node-sdk
  • API reference: jambonz.github.io/node-sdk

This SDK replaces the older @jambonz/node-client (webhook) and @jambonz/node-client-ws (WebSocket) packages, which are now deprecated. The new SDK provides a unified package with a consistent API across both transports.

Which transport should I use? The WebSocket transport is recommended for most applications. It provides a persistent bidirectional connection that enables TTS streaming, mid-call updates, inject commands, real-time event handling, and voice AI features like the agent and llm verbs. The webhook transport is simpler but limited — use it for straightforward call routing scenarios (e.g., dial, basic IVR menus) where you don’t need real-time interaction.

Installation

```sh
npm install @jambonz/sdk
```

Imports

The SDK provides three subpath exports:

```js
// Webhook apps (Express/HTTP)
import { WebhookResponse } from '@jambonz/sdk/webhook';

// WebSocket apps
import { createEndpoint } from '@jambonz/sdk/websocket';

// REST API client (mid-call control, outbound calls)
import { JambonzClient } from '@jambonz/sdk/client';
```

Webhook Transport

Use WebhookResponse to build verb arrays in response to HTTP webhooks. Methods are chainable and the response is serialized to JSON.

```js
import express from 'express';
import { WebhookResponse } from '@jambonz/sdk/webhook';

const app = express();
app.use(express.json());

app.post('/incoming', (req, res) => {
  const jambonz = new WebhookResponse();
  jambonz
    .say({ text: 'Hello from jambonz!' })
    .gather({
      input: ['speech', 'digits'],
      actionHook: '/handle-input',
      say: { text: 'Press 1 for sales or 2 for support.' },
    })
    .hangup();

  res.json(jambonz);
});

app.post('/handle-input', (req, res) => {
  const jambonz = new WebhookResponse();
  const speech = req.body.speech?.alternatives?.[0]?.transcript;
  jambonz.say({ text: `You said: ${speech}` }).hangup();
  res.json(jambonz);
});

app.listen(3000);
```
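Under the hood, WebhookResponse is a builder for a plain JSON array of verb objects. As a sanity check, here is a hand-rolled sketch of the payload the /incoming handler above returns (no SDK required; verb shapes follow the jambonz verb schemas):

```javascript
// The chained builder serializes to an array of verb objects,
// each identified by a "verb" key.
const payload = [
  { verb: 'say', text: 'Hello from jambonz!' },
  {
    verb: 'gather',
    input: ['speech', 'digits'],
    actionHook: '/handle-input',
    say: { text: 'Press 1 for sales or 2 for support.' },
  },
  { verb: 'hangup' },
];

// Roughly what res.json(jambonz) sends over the wire.
console.log(JSON.stringify(payload, null, 2));
```

Knowing this shape is useful when debugging: you can log the serialized response and compare it against the verb JSON schemas directly.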

WebSocket Transport

Use createEndpoint to build real-time WebSocket applications. This is the recommended transport for voice AI agents, as it enables bidirectional communication, event streaming, and mid-call updates.

```js
import http from 'http';
import { createEndpoint } from '@jambonz/sdk/websocket';

const server = http.createServer();
const makeService = createEndpoint({ server, port: 3000 });

const svc = makeService({ path: '/' });

svc.on('session:new', (session) => {
  // Bind actionHook handlers first
  session.on('/gather-result', (evt) => {
    const transcript = evt.speech?.alternatives?.[0]?.transcript || '';
    session.say({ text: `You said: ${transcript}` }).hangup().reply();
  });

  // Send initial verbs
  session
    .say({ text: 'Hello! Say something.' })
    .gather({ input: ['speech'], actionHook: '/gather-result', timeout: 10 })
    .hangup()
    .send();
});
```

.send() vs .reply()

  • .send() — Use once for the initial verb array in response to session:new.
  • .reply() — Use for all subsequent responses to actionHook events.

This distinction is important: .send() starts the call flow, while .reply() continues it in response to events.

Application Environment Variables

You can declare environment variables that are configurable in the jambonz portal UI:

```js
const makeService = createEndpoint({
  server,
  port: 3000,
  envVars: {
    OPENAI_MODEL: {
      type: 'string',
      description: 'LLM model to use',
      default: 'gpt-4.1-mini',
    },
    SYSTEM_PROMPT: {
      type: 'string',
      description: 'System prompt',
      uiHint: 'textarea',
      default: 'You are a helpful assistant.',
    },
  },
});

// Read values in session handler
svc.on('session:new', (session) => {
  const model = session.data.env_vars?.OPENAI_MODEL || 'gpt-4.1-mini';
  // ...
});
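Because env_vars may be absent when the application runs outside the portal, it can be convenient to centralize the fallback logic. A small helper sketch (hypothetical, not part of the SDK), shown with a mock session shaped like the real one:

```javascript
// Hypothetical helper: read a portal-configured env var, falling back
// to the process environment and then a hard-coded default.
function getEnvVar(session, name, fallback) {
  return session?.data?.env_vars?.[name] ?? process.env[name] ?? fallback;
}

// Mock session for illustration; the real object comes from session:new.
const mockSession = { data: { env_vars: { OPENAI_MODEL: 'gpt-4.1-mini' } } };

const model = getEnvVar(mockSession, 'OPENAI_MODEL', 'fallback-model'); // portal value wins
const prompt = getEnvVar(mockSession, 'SYSTEM_PROMPT', 'You are a helpful assistant.'); // falls back
```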

Audio Streams

When using the listen verb, makeService.audio() lets you handle both call control and audio on the same server:

```js
const svc = makeService({ path: '/' });
const audioSvc = makeService.audio({ path: '/audio-stream' });

svc.on('session:new', (session) => {
  session
    .say({ text: 'Listening...' })
    .listen({
      url: '/audio-stream',
      sampleRate: 8000,
      bidirectionalAudio: { enabled: true, streaming: true, sampleRate: 8000 },
    })
    .send();
});

audioSvc.on('connection', (stream) => {
  stream.on('audio', (pcm) => {
    // Process audio: feed to STT, record, etc.
  });

  // Send audio back (pcmBuffer: raw PCM at the negotiated sample rate)
  stream.sendAudio(pcmBuffer);

  stream.on('close', () => console.log('Audio stream closed'));
});
```

The AudioStream object provides sendAudio(), playAudio(), killAudio(), disconnect(), sendMark(), and clearMarks() methods.
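The pcmBuffer above is left to the application. Assuming the stream carries 16-bit little-endian linear PCM at the negotiated sample rate (as is typical for the listen verb), here is one way to synthesize a short test tone to play back:

```javascript
// Generate one second of a 440 Hz sine tone as 16-bit LE PCM at 8 kHz.
// (Assumes sampleRate: 8000 and linear16 encoding, matching the listen
// configuration above; verify the encoding against your jambonz setup.)
const sampleRate = 8000;
const freq = 440;
const samples = new Int16Array(sampleRate); // 1 second of mono audio
for (let i = 0; i < samples.length; i++) {
  // 0.3 amplitude to leave headroom and avoid clipping
  samples[i] = Math.round(0.3 * 32767 * Math.sin((2 * Math.PI * freq * i) / sampleRate));
}
const pcmBuffer = Buffer.from(samples.buffer);

// stream.sendAudio(pcmBuffer); // 16000 bytes = 1 s of 8 kHz 16-bit mono
```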

REST API Client

Use JambonzClient for outbound calls and mid-call control:

```js
import { JambonzClient } from '@jambonz/sdk/client';

const client = new JambonzClient({
  baseUrl: 'https://api.jambonz.us',
  accountSid: 'your-account-sid',
  apiKey: 'your-api-key',
});

// Create an outbound call
await client.calls.create({
  from: '+15085551212',
  to: { type: 'phone', number: '+15085551213' },
  call_hook: '/incoming',
});

// Mid-call control
await client.calls.mute(callSid, 'mute');
await client.calls.redirect(callSid, 'https://example.com/new-flow');
```

Verb Methods

Both WebhookResponse and WebSocket Session support the same chainable verb methods:

`.say()` `.play()` `.gather()` `.dial()` `.llm()` `.agent()` `.conference()` `.enqueue()` `.dequeue()` `.hangup()` `.pause()` `.redirect()` `.config()` `.tag()` `.dtmf()` `.listen()` `.transcribe()` `.message()` `.dub()` `.alert()` `.answer()` `.leave()` `.sipDecline()` `.sipRefer()` `.sipRequest()`

All methods accept the same options as the corresponding verb JSON schemas and are chainable.

TTS Token Streaming

The WebSocket Session provides methods for incremental TTS token streaming, enabling the lowest-latency voice AI experiences. This is used when you’re streaming tokens from an LLM and want them spoken as they arrive.

```js
session.on('/llm-tokens', async (evt) => {
  const { tokens, done } = evt;

  if (tokens) {
    // Send tokens as they arrive from the LLM; backpressure is handled automatically
    await session.sendTtsTokens(tokens);
  }

  if (done) {
    // Signal end of token stream
    session.flushTtsTokens();
  }
});
```
| Method | Returns | Description |
|---|---|---|
| `sendTtsTokens(text)` | `Promise<void>` | Send a chunk of text for TTS. Resolves when jambonz acknowledges receipt. Automatically applies backpressure if the buffer is full. |
| `flushTtsTokens()` | `void` | Signal the end of a TTS token stream. Triggers final audio generation. |
| `clearTtsTokens()` | `void` | Cancel all pending TTS tokens, clear the queue, and reset backpressure state. |

The isTtsPaused property indicates whether TTS streaming is paused due to backpressure.
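Calling sendTtsTokens() once per raw LLM token works, but many applications buffer tokens and flush at clause boundaries so the TTS vendor receives fuller phrases. A minimal chunker sketch (not part of the SDK; the boundary regex is an illustrative choice):

```javascript
// Hypothetical chunker: accumulate LLM tokens and emit a chunk whenever
// the buffer ends in sentence-ish punctuation.
function makeChunker(onChunk) {
  let buf = '';
  return {
    push(token) {
      buf += token;
      if (/[.!?,;:]\s*$/.test(buf)) {
        onChunk(buf);
        buf = '';
      }
    },
    flush() {
      if (buf) onChunk(buf);
      buf = '';
    },
  };
}

// In a real app, wire onChunk to session.sendTtsTokens and call
// chunker.flush() followed by session.flushTtsTokens() when done.
const chunks = [];
const chunker = makeChunker((c) => chunks.push(c));
for (const t of ['Hello', ' there', '.', ' How', ' can I help', '?']) chunker.push(t);
chunker.flush();
// chunks → ['Hello there.', ' How can I help?']
```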

TTS Streaming Events

| Event | Description |
|---|---|
| `tts:stream_open` | TTS vendor connection established |
| `tts:stream_paused` | Backpressure: buffer full, tokens will queue |
| `tts:stream_resumed` | Backpressure released, streaming resumes |
| `tts:stream_closed` | TTS stream ended |
| `tts:user_interruption` | User barged in during TTS playback |

LLM and Agent Updates

The Session provides methods for interacting with active LLM and agent conversations.

Tool Output

When the LLM requests a tool/function call (via the toolHook), respond with the result:

```js
session.on('/tool-call', async (evt) => {
  const { tool_call_id, name, arguments: args } = evt;
  const result = await handleTool(name, args);
  session.sendToolOutput(tool_call_id, result);
});
```
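The handleTool function above is left to the application. One common shape is a name-to-handler map; the sketch below uses hypothetical tool names and handlers for illustration:

```javascript
// Hypothetical tool dispatcher: map tool names to async handlers.
const tools = {
  get_weather: async ({ city }) => ({ city, forecast: 'sunny', tempF: 72 }),
  lookup_order: async ({ orderId }) => ({ orderId, status: 'shipped' }),
};

async function handleTool(name, args) {
  const handler = tools[name];
  if (!handler) return { error: `unknown tool: ${name}` };
  // Some LLM vendors deliver arguments as a JSON string; normalize first.
  const parsed = typeof args === 'string' ? JSON.parse(args) : args;
  return handler(parsed);
}
```

Returning an error object for unknown tools (rather than throwing) lets the LLM see the failure and recover in conversation.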

Agent Updates

Send mid-conversation updates to an active agent:

```js
// Change the system prompt
session.updateAgent({
  type: 'update_instructions',
  instructions: 'You are now a billing agent.',
});

// Inject context
session.updateAgent({
  type: 'inject_context',
  messages: [{ role: 'user', content: 'Customer: Sarah, Gold tier.' }],
});

// Replace tools
session.updateAgent({ type: 'update_tools', tools: [...] });

// Trigger a new response (with optional interrupt)
session.updateAgent({
  type: 'generate_reply',
  interrupt: true,
  user_input: 'Tell the customer about the flash sale.',
});
```

LLM Updates

Send updates to an active llm verb:

```js
session.updateLlm({ instructions: 'Switch to Spanish.' });
```

| Method | Description |
|---|---|
| `sendToolOutput(toolCallId, data)` | Send a tool/function result back to the LLM or agent verb |
| `updateAgent(data)` | Send an `agent:update` command (`update_instructions`, `inject_context`, `update_tools`, `generate_reply`) |
| `updateLlm(data)` | Send an `llm:update` command |

Inject Commands

Inject commands execute immediately on an active call without affecting the verb stack. They are useful for mid-call control actions like muting, recording, or whispering to one party on a bridged call.

```js
// Mute/unmute
session.injectMute('mute');
session.injectMute('unmute');

// Whisper to one party (e.g., coaching a call center agent)
session.injectWhisper({ verb: 'say', text: 'The customer is a VIP.' }, agentCallSid);

// Control noise isolation mid-call
session.injectNoiseIsolation('enable', { vendor: 'krisp', level: 80 });
session.injectNoiseIsolation('disable');

// Control recording
session.injectRecord('startCallRecording', { siprecServerURL: 'sip:siprec@recorder.example.com' });
session.injectRecord('pauseCallRecording');

// Pause/resume audio streaming
session.injectListenStatus('pause');
session.injectListenStatus('resume');

// Send DTMF
session.injectDtmf('1234');

// Redirect call flow
session.injectRedirect('/new-webhook');

// Tag the call with metadata
session.injectTag({ priority: 'high', department: 'billing' });
```
| Method | Description |
|---|---|
| `injectMute(status)` | Mute or unmute the call (`'mute'` or `'unmute'`) |
| `injectWhisper(verb, callSid?)` | Play a whisper verb (`say`/`play`) to one party on a bridged call |
| `injectNoiseIsolation(status, opts?, callSid?)` | Enable or disable noise isolation. Options: `vendor`, `level`, `model` |
| `injectRecord(action, opts?, callSid?)` | Control call recording: `startCallRecording`, `stopCallRecording`, `pauseCallRecording`, `resumeCallRecording` |
| `injectListenStatus(status, callSid?)` | Pause or resume audio streaming (`'pause'` or `'resume'`) |
| `injectDtmf(digit, duration?, callSid?)` | Send DTMF digits into the call |
| `injectRedirect(hook, callSid?)` | Redirect call execution to a new webhook |
| `injectTag(data, callSid?)` | Attach metadata to the call |
| `injectCommand(command, data?, callSid?)` | Send a generic inject command |

The optional callSid parameter on inject methods targets a specific call leg on a bridged call. Omit it to target the current call.

Session Properties

| Property | Type | Description |
|---|---|---|
| `callSid` | `string` | Unique call identifier |
| `from` | `string` | Caller phone number or SIP URI |
| `to` | `string` | Called phone number or SIP URI |
| `direction` | `'inbound' \| 'outbound'` | Call direction |
| `accountSid` | `string` | Account identifier |
| `applicationSid` | `string` | Application identifier |
| `callId` | `string` | SIP Call-ID |
| `data` | `CallSession` | Full call session data (includes `env_vars`, SIP headers, etc.) |
| `locals` | `Record<string, unknown>` | Application-specific storage that persists for the session |
| `isTtsPaused` | `boolean` | Whether TTS streaming is paused due to backpressure |
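The locals property is where per-call state should live when it must survive across handlers, since each actionHook event fires independently. A sketch of tracking gather retries (mock session object shown so the example is self-contained; the real object arrives via session:new):

```javascript
// Mock session with the locals bag the real Session provides.
const session = { locals: {} };

// Decide what to do after each gather result, remembering attempts
// in session.locals so state persists across handler invocations.
function onGatherResult(session, evt) {
  session.locals.attempts = (session.locals.attempts || 0) + 1;
  const transcript = evt.speech?.alternatives?.[0]?.transcript;
  if (!transcript && session.locals.attempts < 3) {
    return 'retry'; // e.g. re-run the gather verb
  }
  return transcript ? 'handle' : 'give-up';
}

onGatherResult(session, {}); // → 'retry' (attempt 1)
onGatherResult(session, {}); // → 'retry' (attempt 2)
onGatherResult(session, {}); // → 'give-up' (attempt 3)
```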

Session Events

| Event | Description |
|---|---|
| `'/hookName'` | ActionHook callback; requires `.reply()` |
| `verb:status` | Verb status change (when `notifyEvents` is enabled) |
| `call:status` | Call state change |
| `jambonz:error` | Error from jambonz |
| `close` | WebSocket connection closed |
| `error` | WebSocket connection error |

AI-Assisted Development

The @jambonz/mcp-schema-server package is an MCP server that gives AI coding assistants deep knowledge of jambonz APIs, verb schemas, and SDK patterns. Set it up so your AI can generate correct jambonz code automatically.

Remote server (simplest):

```sh
claude mcp add jambonz -t http https://mcp-server.jambonz.app/mcp
```

Local via npx:

```sh
claude mcp add jambonz -- npx -y @jambonz/mcp-schema-server
```

For Cursor, VS Code, and other editors, see the setup instructions in the repository.

A complementary Agent Skill provides procedural knowledge about jambonz patterns and best practices:

```sh
npx skills add jambonz/skills
```

Examples

See the examples directory for runnable demos:

| Example | Transport | Description |
|---|---|---|
| hello-world | Webhook + WS | Minimal greeting |
| echo | Webhook + WS | Speech echo using gather |
| ivr-menu | Webhook | Interactive menu with speech and DTMF |
| voice-agent | Webhook + WS | LLM-powered conversational AI with tools |
| openai-realtime | WebSocket | OpenAI Realtime API voice agent |
| llm-streaming | WebSocket | LLM with TTS streaming and barge-in |

For agent verb examples, see the agent examples.