OLLM Docs

Response Guide

Interactive examples for success and error responses. Click the hotspots to learn the key fields. All responses are mocked.

Requests & Responses

A request is what your client sends to an API (method, endpoint, headers, and payload). A response is what comes back: either a success payload or a structured error you can handle deterministically.

In OLLM, you’re calling a Confidential AI Gateway over HTTP. From an integration perspective, this is still the standard request/response contract, but the payload represents a model invocation. A minimal chat-style request typically includes a model selector (for example near/gpt-oss-120b) and your prompt/messages. The user input you want the model to complete lives in messages[].content.
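As a minimal sketch (the field names mirror the chat-style shape described above; the exact prompt text is illustrative), the request body can be built as a plain dictionary:

```python
# Minimal chat-style request body. Only `model` and `messages`
# are required for the shape discussed on this page; headers,
# auth, and the endpoint URL are handled elsewhere in your client.
request_body = {
    "model": "near/gpt-oss-120b",  # model selector
    "messages": [
        # The user input the model should complete lives in
        # messages[].content.
        {"role": "user", "content": "Summarize this paragraph in one sentence."}
    ],
}
```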

In production, the core engineering task is not to “call the model” but to build a reliable pipeline around it: attach authentication, set timeouts, handle rate limits, parse JSON safely, and surface actionable errors to users without leaking sensitive details.
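The “parse JSON safely” step is worth making explicit. A minimal sketch, stdlib only (the function name and error messages are illustrative, not part of any SDK):

```python
import json

def parse_response_body(raw: bytes) -> dict:
    """Parse an HTTP response body defensively.

    Returns a dict on success; raises ValueError with a safe,
    non-sensitive message otherwise.
    """
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError:
        # Don't echo the raw body back to users: it may contain
        # sensitive or garbled upstream data.
        raise ValueError("upstream returned a non-JSON response")
    if not isinstance(payload, dict):
        raise ValueError("unexpected response shape")
    return payload
```

Centralizing this keeps the “surface actionable errors without leaking details” rule in one place instead of scattered across call sites.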

This page is intentionally focused on the “response side” of that contract. We keep the request shape minimal and stable, then vary the outcome so you can learn how to interpret envelopes (success vs errors), find the model output, and guard your UI before rendering anything.

Concretely: on success, the model output is commonly found at choices[0].message.content. Token accounting is typically available under usage (e.g. usage.total_tokens) and is critical for cost, limits, and observability.
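Concretely, with a mocked success payload shaped like the one this page documents, the two reads look like this (the literal values are illustrative):

```python
# Mocked success envelope with the fields discussed above.
success = {
    "choices": [
        {"message": {"role": "assistant", "content": "Hello!"}}
    ],
    "usage": {"prompt_tokens": 5, "completion_tokens": 2, "total_tokens": 7},
}

# Model output: choices[0].message.content
content = success["choices"][0]["message"]["content"]

# Token accounting for cost, limits, and observability.
total_tokens = success["usage"]["total_tokens"]
```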

Key checks
  • HTTP status: 2xx success, 4xx client issues (often auth), 5xx server/proxy errors.
  • A dedicated error envelope on failure (don’t try to extract model output from an error payload).
  • Where the model output actually lives (often choices[0].message.content in chat responses).
  • Whether to retry: only retry on transient failures (timeouts/5xx) and avoid blind retries on auth/4xx.
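The retry decision in the checklist can be sketched as a small predicate. This is a simplification under the rules above (retry timeouts and 5xx, never blindly retry 4xx); real clients often add backoff and special-case 429 via the Retry-After header:

```python
def should_retry(status: int, timed_out: bool) -> bool:
    """Retry only transient failures."""
    if timed_out:
        return True
    if 500 <= status < 600:
        # Server/proxy errors are usually transient.
        return True
    # 4xx (including auth errors) needs a fix, not a retry:
    # blind retries waste quota and can lock accounts.
    return False
```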

For language-specific client samples (headers/auth, SDK setup, retries), see /ollm.

Request

Minimal JSON body shape (language-agnostic). Implementation details (auth/headers, timeouts, retries) depend on your stack.
Click hotspots to learn fields like model and prompt.

Responses

Switch scenarios to learn how to handle success vs errors. Hotspots below point to the fields you should read.
Response interpretation

This is a successful completion response. Your primary job is to extract the assistant output safely and track usage for cost/limits.

What you should do

In most clients, you render the first choice. Keep your rendering logic strict: only render model output when the response is a 2xx success and there is no error envelope.

  • Read the assistant output from `choices[0].message.content`.
  • Record `usage` (tokens) to support rate-limit and cost tracking.
  • Handle multiple choices intentionally (default to the first unless you need n-best).
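The strict-rendering rule above can be sketched as a single guard. Assumptions: the error envelope lives under a top-level "error" key (a common convention, not guaranteed across providers), and status is the numeric HTTP code:

```python
def should_render(status: int, payload: dict) -> bool:
    """Render model output only on a 2xx success with no error envelope."""
    if not (200 <= status < 300):
        return False
    if "error" in payload:
        # Never try to extract model output from an error payload.
        return False
    # Require at least one choice before touching choices[0].
    return bool(payload.get("choices"))
```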
Common pitfalls

Most production bugs here come from selecting the wrong field or assuming shapes that can vary across providers/SDKs.

  • Using `choices[0].message` instead of `choices[0].message.content`.
  • Assuming `choices` is always non-empty without a guard.
  • Rendering content even when an error envelope exists.
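A defensive extractor that avoids all three pitfalls might look like this (a sketch; the helper name is illustrative, and shapes can still vary across providers/SDKs):

```python
from typing import Optional

def extract_content(payload: dict) -> Optional[str]:
    """Return the first choice's text, or None if it isn't there."""
    # Guard: `choices` may be missing or empty.
    choices = payload.get("choices") or []
    if not choices:
        return None
    # Pitfall: `message` is an object; the text lives in
    # message["content"], not in `message` itself.
    message = choices[0].get("message") or {}
    return message.get("content")
```

Returning None instead of raising lets the caller decide whether a missing choice is an error or just “nothing to render”.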
Response
200 OK
Default render target: choices[0].message.content