Vertex AI — Partner Models

Configure jambonz to use Llama, Mistral, and other partner models hosted on Vertex AI’s OpenAI-compatible endpoint.

Vertex AI also hosts third-party “partner” models — Meta’s Llama, Mistral, AI21 Jamba, and others — through an OpenAI-compatible chat completions endpoint. Use this vendor when you want non-Google models on GCP infrastructure with the same IAM and data-residency story as Vertex Gemini.

For Google’s own Gemini models, use Vertex AI — Gemini instead.

Get credentials

Identical to Vertex Gemini — you need a service account JSON. See the Vertex AI — Gemini page for full steps.

The service account needs the Vertex AI User role (roles/aiplatform.user).

Configure in jambonz

In the portal: Account → LLM Services → + Add LLM Service → Vertex AI — OpenAI-compatible.

Service Account Key (JSON)
file · Required

Upload the same JSON you’d use for Vertex Gemini.

Project ID
string · Required

Your GCP project id.

Region
select · Required

Pick a region where the partner model you intend to use is hosted. Llama models are most broadly available in us-east5. Other partner models may have different region availability — check the Vertex partner-model docs before picking.
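The region you pick also determines the base URL of the OpenAI-compatible endpoint that jambonz calls on your behalf. As a rough sketch of how that URL is composed (the `v1beta1` path shown here is the form Google documents for the OpenAI-compatible surface at the time of writing — verify against current Vertex docs; jambonz constructs this internally):

```javascript
// Build the Vertex AI OpenAI-compatible chat-completions URL for a
// given project and region. The model id goes in the request body,
// not the URL. (Path is the documented v1beta1 form; confirm against
// current Vertex docs.)
function vertexOpenAiUrl(projectId, region) {
  return `https://${region}-aiplatform.googleapis.com/v1beta1` +
    `/projects/${projectId}/locations/${region}` +
    `/endpoints/openapi/chat/completions`;
}

console.log(vertexOpenAiUrl('my-gcp-project', 'us-east5'));
```

Note that the region appears twice: once in the hostname and once in the resource path, which is why a model hosted only in `us-east5` 404s from any other region.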

Use in an agent verb

```javascript
session.agent({
  llm: {
    vendor: 'vertex-openai',
    model: 'meta/llama-3.3-70b-instruct-maas',
    llmOptions: {
      systemPrompt: 'You are a helpful voice assistant.',
      maxTokens: 4096,
    },
  },
  stt: { vendor: 'deepgram', language: 'en-US' },
  tts: { vendor: 'cartesia', voice: 'sonic-english' },
  turnDetection: 'krisp',
  bargeIn: { enable: true },
  actionHook: '/agent-complete',
}).send();
```

Available Models

See Google’s Vertex AI partner models catalog for the full list and per-region availability. Common picks:

| Model | Notes |
| --- | --- |
| meta/llama-3.3-70b-instruct-maas | Recommended general-purpose Llama model |
| meta/llama-3.1-405b-instruct-maas | Largest open-weight model on Vertex |
| mistral-large | Mistral’s flagship |
| mistral-small | Smaller, cheaper |

Quirks & errors

Llama MaaS returns empty responses if max_tokens isn’t set. A bare call without maxTokens can hit a Vertex quirk where the model returns finish_reason: stop with zero output tokens — your assistant says nothing. jambonz applies a defaultMaxTokens: 4096 workaround, so calls work even without an explicit setting. If you see empty responses, confirm you’re on a recent jambonz version.
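If you build the verb payload yourself and want the same safety net regardless of jambonz version, a minimal sketch of the workaround (the `4096` default mirrors what recent jambonz versions apply; the helper name is illustrative):

```javascript
// Defensive default for Llama MaaS: make sure maxTokens is always set
// before handing llmOptions to the agent verb, since an unset value can
// trigger the empty-response quirk. An explicit caller value wins.
const DEFAULT_MAX_TOKENS = 4096;

function withMaxTokensDefault(llmOptions = {}) {
  return { maxTokens: DEFAULT_MAX_TOKENS, ...llmOptions };
}

console.log(withMaxTokensDefault({ systemPrompt: 'hi' }).maxTokens); // 4096
console.log(withMaxTokensDefault({ maxTokens: 1024 }).maxTokens);    // 1024
```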

Intermittent 400 INVALID_ARGUMENT — in some regions, roughly 40% of first attempts against meta/llama-3.3-70b-instruct-maas fail with a 400 carrying no body details; an identical retry succeeds. This is a Vertex-side flake, not a jambonz issue: it reproduces with a raw fetch outside jambonz. When it happens, the agent verb terminates with an LLM_FAILURE alert and runs the actionHook so your application can retry the turn. Track Vertex AI release notes for fixes.
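A retry in your actionHook handler might look like the sketch below. The payload field `completion_reason` and its values are illustrative, not a documented contract — check the actionHook payload for your jambonz version before relying on them:

```javascript
// Decide whether to re-issue the same agent verb after the intermittent
// Vertex 400 (surfaced by jambonz as an LLM_FAILURE alert). The field
// name `completion_reason` and its string values are hypothetical;
// inspect your jambonz version's actual actionHook payload.
function shouldRetryTurn(hookPayload, attemptsSoFar, maxRetries = 1) {
  return hookPayload.completion_reason === 'llm failure' &&
    attemptsSoFar < maxRetries;
}

// In the hook route: re-send the same agent verb when this returns true,
// otherwise fall back to a spoken apology or transfer.
```

Capping retries at one attempt is usually enough here, since the flake is first-attempt-heavy and an identical retry succeeds.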

Region availability for Llama is limited compared to Gemini. If you get a 404 on a Llama model, switch to us-east5. Other regions sometimes catch up but us-east5 is the most reliable starting point.

Model listing isn’t supported on this endpoint — the LLM Services form shows only the curated knownModels list that ships with jambonz. You can still type any partner model id into the agent verb’s model field; it will work as long as the model is hosted in your chosen region.