Vertex AI — Partner Models

Configure jambonz to use Llama, Mistral, and other partner models hosted on Vertex AI’s OpenAI-compatible endpoint.

Vertex AI also hosts third-party “partner” models — Meta’s Llama, Mistral, AI21 Jamba, and others — through an OpenAI-compatible chat completions endpoint. Use this vendor when you want non-Google models on GCP infrastructure with the same IAM and data-residency story as Vertex Gemini.

For Google’s own Gemini models, use Vertex AI — Gemini instead.

Get credentials

Identical to Vertex Gemini — you need a service account JSON. See the Vertex AI — Gemini page for full steps.

The service account needs the Vertex AI User role (roles/aiplatform.user).

Configure in jambonz

In the portal: Account → LLM Services → + Add LLM Service → Vertex AI — OpenAI-compatible.

Service Account Key (JSON)
file · Required

Upload the same JSON you’d use for Vertex Gemini.

Project ID
string · Required

Your GCP project id.

Region
select · Required

Pick a region where the partner model you intend to use is hosted. Llama models are most broadly available in us-east5. Other partner models may have different region availability — check the Vertex partner-model docs before picking.
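The region you pick also determines the base URL of the OpenAI-compatible endpoint that jambonz calls on your behalf. As a rough sketch of how that URL is composed (the `v1beta1` path shown here is the form Google documents for the OpenAI-compatible surface at the time of writing — verify against current Vertex docs; jambonz constructs this internally):

```javascript
// Build the Vertex AI OpenAI-compatible chat-completions URL for a
// given project and region. The model id goes in the request body,
// not the URL. (Path is the documented v1beta1 form; confirm against
// current Vertex docs.)
function vertexOpenAiUrl(projectId, region) {
  return `https://${region}-aiplatform.googleapis.com/v1beta1` +
    `/projects/${projectId}/locations/${region}` +
    `/endpoints/openapi/chat/completions`;
}

console.log(vertexOpenAiUrl('my-gcp-project', 'us-east5'));
```

Note that the region appears twice: once in the hostname and once in the resource path, which is why a model hosted only in `us-east5` 404s from any other region.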

Use in an agent verb

```javascript
session.agent({
  llm: {
    vendor: 'vertex-openai',
    model: 'meta/llama-3.3-70b-instruct-maas',
    llmOptions: {
      systemPrompt: 'You are a helpful voice assistant.',
      maxTokens: 4096,
    },
  },
  stt: { vendor: 'deepgram', language: 'en-US' },
  tts: { vendor: 'cartesia', voice: 'sonic-english' },
  turnDetection: 'krisp',
  bargeIn: { enable: true },
  actionHook: '/agent-complete',
}).send();
```

Available Models

See Google’s Vertex AI partner models catalog for the full list and per-region availability. Common picks:

| Model | Notes |
| --- | --- |
| meta/llama-3.3-70b-instruct-maas | Recommended general-purpose Llama model |
| meta/llama-3.1-405b-instruct-maas | Largest open-weight model on Vertex |
| mistral-large | Mistral’s flagship |
| mistral-small | Smaller, cheaper |

Quirks & errors

Llama MaaS returns empty responses if max_tokens isn’t set. A bare call without maxTokens can hit a Vertex quirk where the model returns finish_reason: stop with zero output tokens — your assistant says nothing. jambonz applies a defaultMaxTokens: 4096 workaround, so calls work even without an explicit setting. If you see empty responses, confirm you’re on a recent jambonz version.
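If you build the verb payload yourself and want the same safety net regardless of jambonz version, a minimal sketch of the workaround (the `4096` default mirrors what recent jambonz versions apply; the helper name is illustrative):

```javascript
// Defensive default for Llama MaaS: make sure maxTokens is always set
// before handing llmOptions to the agent verb, since an unset value can
// trigger the empty-response quirk. An explicit caller value wins.
const DEFAULT_MAX_TOKENS = 4096;

function withMaxTokensDefault(llmOptions = {}) {
  return { maxTokens: DEFAULT_MAX_TOKENS, ...llmOptions };
}

console.log(withMaxTokensDefault({ systemPrompt: 'hi' }).maxTokens); // 4096
console.log(withMaxTokensDefault({ maxTokens: 1024 }).maxTokens);    // 1024
```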

Intermittent 400 INVALID_ARGUMENT — in some regions, roughly 40% of first attempts against meta/llama-3.3-70b-instruct-maas fail with a 400 carrying no body details; an identical retry succeeds. This is a Vertex-side flake, not a jambonz issue: it reproduces with a raw fetch outside jambonz. When it happens, the agent verb terminates with an LLM_FAILURE alert and runs the actionHook so your application can retry the turn. Track Vertex AI release notes for fixes.
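A retry in your actionHook handler might look like the sketch below. The payload field `completion_reason` and its values are illustrative, not a documented contract — check the actionHook payload for your jambonz version before relying on them:

```javascript
// Decide whether to re-issue the same agent verb after the intermittent
// Vertex 400 (surfaced by jambonz as an LLM_FAILURE alert). The field
// name `completion_reason` and its string values are hypothetical;
// inspect your jambonz version's actual actionHook payload.
function shouldRetryTurn(hookPayload, attemptsSoFar, maxRetries = 1) {
  return hookPayload.completion_reason === 'llm failure' &&
    attemptsSoFar < maxRetries;
}

// In the hook route: re-send the same agent verb when this returns true,
// otherwise fall back to a spoken apology or transfer.
```

Capping retries at one attempt is usually enough here, since the flake is first-attempt-heavy and an identical retry succeeds.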

Region availability for Llama is limited compared to Gemini. If you get a 404 on a Llama model, switch to us-east5. Other regions sometimes catch up but us-east5 is the most reliable starting point.

Model listing isn’t supported on this endpoint — the LLM Services form shows only the curated knownModels list that ships with jambonz. You can still type any partner model id into the agent verb’s model field; it will work as long as the model is hosted in your chosen region.