kral Documentation

Streaming

Set "stream": true and the response arrives as Server-Sent Events, token by token, instead of one blob at the end. The format is the OpenAI streaming format regardless of which provider serves the model.

Request

curl https://api.kral.ai/v1/chat/completions \
  -H "Authorization: Bearer $KRAL_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-sonnet-4-6",
    "stream": true,
    "messages": [{"role": "user", "content": "Write a haiku about rivers"}]
  }'

The response is a stream of data: lines, each carrying a JSON chunk with a delta, terminated by data: [DONE].
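If you read the raw stream without an SDK, the parsing loop is small. The sketch below is a minimal illustration, not kral's reference client: `iter_sse_chunks` is a hypothetical helper name, and the sample lines are invented to resemble OpenAI-format chunks. It takes already-decoded text lines, which is roughly what an HTTP client's line iterator would give you.

```python
import json
from typing import Iterable, Iterator


def iter_sse_chunks(lines: Iterable[str]) -> Iterator[dict]:
    """Yield the JSON payload of each `data:` line, stopping at `data: [DONE]`."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators, comments, and other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return  # end-of-stream sentinel
        yield json.loads(payload)


# Hypothetical sample lines (assumption: OpenAI-format chunk shape).
sample = [
    'data: {"choices": [{"index": 0, "delta": {"content": "Rivers "}}]}',
    "",
    'data: {"choices": [{"index": 0, "delta": {"content": "run"}}]}',
    "",
    "data: [DONE]",
]

text = "".join(
    chunk["choices"][0]["delta"].get("content") or ""
    for chunk in iter_sse_chunks(sample)
    if chunk.get("choices")  # some chunks may carry no choices
)
print(text)
```

The `if chunk.get("choices")` guard mirrors the check in the SDK example below, so a chunk with an empty `choices` list does not raise an `IndexError`.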

With the SDKs

from openai import OpenAI

client = OpenAI(base_url="https://api.kral.ai/v1", api_key="sk-kral-...")

stream = client.chat.completions.create(
    model="gpt-5",
    messages=[{"role": "user", "content": "Write a haiku about rivers"}],
    stream=True,
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
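        print(chunk.choices[0].delta.content, end="", flush=True)

import OpenAI from "openai";

// Same base URL as the Python example; the key is read from the environment.
const client = new OpenAI({
  baseURL: "https://api.kral.ai/v1",
  apiKey: process.env.KRAL_API_KEY,
});

const stream = await client.chat.completions.create({
  model: "claude-sonnet-4-6",
  messages: [{ role: "user", content: "Write a haiku about rivers" }],
  stream: true,
});
for await (const chunk of stream) {
  // Chunks without content (or without choices) print nothing.
  process.stdout.write(chunk.choices[0]?.delta?.content ?? "");
}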

Usage in the stream

To get token counts on a streamed response, request them explicitly:

"stream_options": { "include_usage": true }

The final chunk then carries a usage object with input and output tokens. Billing is identical with or without streaming; the option only controls whether you see the numbers in-stream.
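As a rough sketch of how a consumer might separate the text from the usage numbers, the hypothetical helper below walks parsed chunks (plain dicts) and keeps the last `usage` object it sees. The field names `prompt_tokens`, `completion_tokens`, and `total_tokens`, and the empty `choices` list on the final chunk, are assumptions taken from the OpenAI streaming format; check a real response before relying on them.

```python
from typing import Iterable, Optional, Tuple


def split_text_and_usage(chunks: Iterable[dict]) -> Tuple[str, Optional[dict]]:
    """Return the concatenated delta text and the usage object, if one arrived."""
    parts = []
    usage = None
    for chunk in chunks:
        if chunk.get("usage"):
            usage = chunk["usage"]  # expected on the final chunk only
        for choice in chunk.get("choices") or []:
            content = (choice.get("delta") or {}).get("content")
            if content:
                parts.append(content)
    return "".join(parts), usage


# Hypothetical chunks; the usage field names are assumed, not confirmed.
sample_chunks = [
    {"choices": [{"index": 0, "delta": {"content": "Water "}}]},
    {"choices": [{"index": 0, "delta": {"content": "finds a way"}}]},
    {"choices": [], "usage": {"prompt_tokens": 12, "completion_tokens": 4, "total_tokens": 16}},
]

text, usage = split_text_and_usage(sample_chunks)
print(text)
print(usage)
```

If the option is not set, `usage` simply stays `None`, so the same loop works for both cases.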

Native protocols

Streaming also works on the native endpoints: the Anthropic /messages endpoint streams Anthropic-style events, and Gemini streams via :streamGenerateContent. See Endpoints.

Practical notes

  • Keep the connection open until [DONE]; proxies with short idle timeouts are the most common cause of cut-off streams.
  • Tool calls stream too: delta.tool_calls arrives in fragments you accumulate by index.
  • If a stream fails partway through, you are billed only for what was actually generated.
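The note on tool calls can be sketched as follows. This is an illustration under assumptions, not a definitive implementation: `accumulate_tool_calls` is a hypothetical helper, and the fragment shape (`index`, `id`, `function.name`, `function.arguments`) is assumed to follow the OpenAI streaming format. Each fragment is merged into the call that shares its `index`, and the argument strings are concatenated until they form complete JSON.

```python
import json
from typing import Dict, Iterable, List


def accumulate_tool_calls(deltas: Iterable[dict]) -> List[dict]:
    """Merge streamed tool-call fragments into whole calls, ordered by index."""
    calls: Dict[int, dict] = {}
    for delta in deltas:
        for frag in delta.get("tool_calls") or []:
            call = calls.setdefault(
                frag["index"], {"id": None, "name": None, "arguments": ""}
            )
            if frag.get("id"):
                call["id"] = frag["id"]
            fn = frag.get("function") or {}
            if fn.get("name"):
                call["name"] = fn["name"]  # assumed to arrive whole, once
            call["arguments"] += fn.get("arguments") or ""
    return [calls[i] for i in sorted(calls)]


# Hypothetical deltas; the tool name and call id are made up for illustration.
sample_deltas = [
    {"tool_calls": [{"index": 0, "id": "call_1",
                     "function": {"name": "get_weather", "arguments": ""}}]},
    {"tool_calls": [{"index": 0, "function": {"arguments": '{"city": '}}]},
    {"tool_calls": [{"index": 0, "function": {"arguments": '"Oslo"}'}}]},
]

tool_calls = accumulate_tool_calls(sample_deltas)
args = json.loads(tool_calls[0]["arguments"])  # parse only after the stream ends
print(tool_calls[0]["name"], args)
```

Parsing the arguments only after the stream has finished avoids calling `json.loads` on a partial string, which would raise an error.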