Streaming
Set "stream": true and the response arrives as Server-Sent Events, token by token, instead of one blob at the end. The format is the OpenAI streaming format regardless of which provider serves the model.
Request
curl https://api.kral.ai/v1/chat/completions \
  -H "Authorization: Bearer $KRAL_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-sonnet-4-6",
    "stream": true,
    "messages": [{"role": "user", "content": "Write a haiku about rivers"}]
  }'
The response is a stream of data: lines, each carrying a JSON chunk with a delta, terminated by data: [DONE].
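If you read the raw stream without an SDK, the parsing loop described above can be sketched as follows. This is a minimal illustration, not a full SSE parser: it assumes each event fits on a single data: line, and the sample lines are hand-written stand-ins for a real response. The helper names iter_chunks and collect_text are hypothetical.

```python
import json
from typing import Iterable, Iterator


def iter_chunks(lines: Iterable[str]) -> Iterator[dict]:
    """Yield decoded JSON chunks from SSE lines until the [DONE] sentinel."""
    for raw in lines:
        line = raw.strip()
        # Blank separator lines and non-data fields carry no chunk.
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return  # end of stream; ignore anything after it
        yield json.loads(payload)


def collect_text(lines: Iterable[str]) -> str:
    """Join the delta.content fragments of every chunk into one string."""
    parts = []
    for chunk in iter_chunks(lines):
        choices = chunk.get("choices") or []
        if choices:
            parts.append(choices[0].get("delta", {}).get("content") or "")
    return "".join(parts)


# Hand-written sample lines (assumed shape, not captured from the API).
sample = [
    'data: {"choices": [{"delta": {"role": "assistant"}}]}',
    "",
    'data: {"choices": [{"delta": {"content": "Rivers "}}]}',
    "",
    'data: {"choices": [{"delta": {"content": "run"}}]}',
    "",
    "data: [DONE]",
]

print(collect_text(sample))
```

In practice the lines would come from an HTTP client that exposes the response body line by line; the parsing logic stays the same.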
With the SDKs
from openai import OpenAI

client = OpenAI(base_url="https://api.kral.ai/v1", api_key="sk-kral-...")

stream = client.chat.completions.create(
    model="gpt-5",
    messages=[{"role": "user", "content": "Write a haiku about rivers"}],
    stream=True,
)

for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
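
import OpenAI from "openai";

// Same setup as the Python example: point the OpenAI SDK at the Kral base URL.
const client = new OpenAI({
  baseURL: "https://api.kral.ai/v1",
  apiKey: process.env.KRAL_API_KEY,
});

const stream = await client.chat.completions.create({
  model: "claude-sonnet-4-6",
  messages: [{ role: "user", content: "Write a haiku about rivers" }],
  stream: true,
});

for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content ?? "");
}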
Usage in the stream
To get token counts on a streamed response, request them explicitly:
"stream_options": { "include_usage": true }
The final chunk then carries a usage object with input and output tokens. Billing is identical with or without streaming; the option only controls whether you see the numbers in-stream.
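A minimal sketch of reading that final chunk, working on chunks already decoded to dicts. It assumes the OpenAI streaming convention, in which the usage-bearing chunk has an empty choices list and names the counts prompt_tokens and completion_tokens; the helper name text_and_usage and the sample numbers are made up for illustration.

```python
from typing import Iterable, Optional, Tuple


def text_and_usage(chunks: Iterable[dict]) -> Tuple[str, Optional[dict]]:
    """Return the streamed text and the usage object, if one was sent."""
    parts = []
    usage = None
    for chunk in chunks:
        # The usage chunk may have no choices, so guard before indexing.
        choices = chunk.get("choices") or []
        if choices:
            parts.append(choices[0].get("delta", {}).get("content") or "")
        if chunk.get("usage"):
            usage = chunk["usage"]
    return "".join(parts), usage


# Hand-written chunks (assumed shape and invented token counts).
chunks = [
    {"choices": [{"delta": {"content": "Rivers "}}], "usage": None},
    {"choices": [{"delta": {"content": "run"}}], "usage": None},
    {"choices": [], "usage": {"prompt_tokens": 12, "completion_tokens": 2, "total_tokens": 14}},
]

text, usage = text_and_usage(chunks)
print(text)
print(usage)
```

If the option was not requested, usage stays None and the text is returned unchanged, so the same loop works for both cases.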
Native protocols
Streaming also works on the native endpoints: the Anthropic /messages endpoint streams Anthropic-style events, and Gemini streams via :streamGenerateContent. See Endpoints.
Practical notes
- Keep the connection open until [DONE]; proxies with short idle timeouts are the most common cause of cut-off streams.
- Tool calls stream too: delta.tool_calls arrives in fragments you accumulate by index.
- If a stream errors mid-way, you are only billed for what was actually generated.
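Accumulating tool-call fragments by index can be sketched like this. The fragment shape (index, id, function.name, function.arguments) is assumed from the OpenAI streaming format; the helper accumulate_tool_calls, the tool name get_weather, and the id call_1 are hypothetical.

```python
import json
from typing import Iterable, List


def accumulate_tool_calls(chunks: Iterable[dict]) -> List[dict]:
    """Merge delta.tool_calls fragments into complete calls, keyed by index."""
    calls = {}  # index -> partially assembled call
    for chunk in chunks:
        choices = chunk.get("choices") or []
        if not choices:
            continue
        for frag in choices[0].get("delta", {}).get("tool_calls") or []:
            call = calls.setdefault(
                frag["index"], {"id": None, "name": "", "arguments": ""}
            )
            if frag.get("id"):
                call["id"] = frag["id"]
            fn = frag.get("function") or {}
            # Name and arguments arrive as string pieces; concatenate them.
            call["name"] += fn.get("name") or ""
            call["arguments"] += fn.get("arguments") or ""
    return [calls[i] for i in sorted(calls)]


# Hand-written fragments for one tool call split across three chunks.
chunks = [
    {"choices": [{"delta": {"tool_calls": [
        {"index": 0, "id": "call_1", "function": {"name": "get_weather", "arguments": ""}}
    ]}}]},
    {"choices": [{"delta": {"tool_calls": [
        {"index": 0, "function": {"arguments": '{"city": '}}
    ]}}]},
    {"choices": [{"delta": {"tool_calls": [
        {"index": 0, "function": {"arguments": '"Oslo"}'}}
    ]}}]},
]

calls = accumulate_tool_calls(chunks)
# Parse the arguments only once the stream has finished; partial JSON is invalid.
args = json.loads(calls[0]["arguments"])
print(calls[0]["name"], args)
```

Keying on index rather than id matters because, in this assumed format, only the first fragment of a call carries the id while every fragment carries the index.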