Agents API
Every agent you build in the app is also callable programmatically. For example, you can configure a support agent once, with its instructions, model, tools, and your help-center documents as knowledge, and then call it from your website, backend, or workflow tool like any OpenAI-compatible model.
The point: all of the agent's setup lives server-side. Your code sends user messages and receives answers, while the instructions, knowledge search, and tools run on our side. Updating the agent in the app changes its behavior everywhere it is used, without touching your code.
Base URL and authentication
https://app.kral.ai/api/agents/v1
The Agents API uses its own keys, created in the app under settings, rather than the dashboard API keys from the API quickstart. Agent keys start with sk-, are shown only once at creation, and can carry an expiry date. Send the key as a standard Bearer token in the Authorization header.
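If you are not using an SDK, authentication is just a header on each request. The sketch below uses only Python's standard library to build (not send) an authenticated request. The function name agent_request and the local sk- prefix check are illustrative choices of this example, not part of the API.

```python
import urllib.request
from typing import Optional

# Base URL as given on this page.
AGENTS_BASE_URL = "https://app.kral.ai/api/agents/v1"


def agent_request(path: str, api_key: str, body: Optional[bytes] = None) -> urllib.request.Request:
    """Build (but do not send) an Agents API request with Bearer authentication."""
    # Local sanity check only: agent keys start with "sk-". This catches empty or
    # obviously wrong keys before the server answers with an authentication error.
    if not api_key.startswith("sk-"):
        raise ValueError("expected an agent API key starting with 'sk-'")
    headers = {"Authorization": f"Bearer {api_key}"}
    if body is not None:
        # JSON bodies (e.g. chat completions) need an explicit content type.
        headers["Content-Type"] = "application/json"
    # urllib uses GET when data is None and POST otherwise.
    return urllib.request.Request(AGENTS_BASE_URL + path, data=body, headers=headers)
```

To actually send it, pass the request to urllib.request.urlopen, and read the key from an environment variable such as AGENT_API_KEY (the name used in the curl example) instead of hard-coding it.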
Your agents are the models
GET /models lists your agents in the OpenAI models format; each entry's id is the value you pass as model. GET /models/{id} returns the details of a single agent.
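Because GET /models uses the OpenAI list format, picking out the agent ids is a single pass over the data array. A minimal sketch, assuming a response body shaped like OpenAI's models list; the helper name agent_ids and the sample payload are hypothetical, and real entries may carry additional fields.

```python
import json
from typing import List


def agent_ids(models_response: str) -> List[str]:
    """Return the agent ids from a GET /models response body."""
    payload = json.loads(models_response)
    # OpenAI list format: {"object": "list", "data": [{"id": ..., "object": "model", ...}, ...]}
    return [entry["id"] for entry in payload.get("data", [])]


# Hypothetical response body, for illustration only.
sample = (
    '{"object": "list", "data": ['
    '{"id": "agent_abc123", "object": "model"}, '
    '{"id": "agent_def456", "object": "model"}]}'
)
print(agent_ids(sample))  # ['agent_abc123', 'agent_def456']
```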
Chat completions
The familiar OpenAI chat completions format, with an agent id as the model:
curl https://app.kral.ai/api/agents/v1/chat/completions \
  -H "Authorization: Bearer $AGENT_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "agent_abc123",
    "messages": [{"role": "user", "content": "My invoice is missing a VAT ID, what do I do?"}],
    "stream": true
  }'
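The same request without curl or an SDK, as a Python standard-library sketch. It builds the POST body from the curl example and, for non-streamed calls ("stream": false), reads the reply from choices[0].message.content. That assumes the agent returns the standard OpenAI chat completion shape; the helper names are hypothetical.

```python
import json
import urllib.request

BASE_URL = "https://app.kral.ai/api/agents/v1"


def chat_request(api_key: str, agent_id: str, user_message: str,
                 stream: bool = False) -> urllib.request.Request:
    """Build (not send) the POST /chat/completions request from the curl example."""
    body = {
        "model": agent_id,  # the agent id goes where a model name would
        "messages": [{"role": "user", "content": user_message}],
        "stream": stream,
    }
    return urllib.request.Request(
        BASE_URL + "/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"},
        method="POST",
    )


def reply_text(response_body: str) -> str:
    """Extract the assistant reply from a non-streamed chat completion response."""
    # Assumed OpenAI shape: {"choices": [{"message": {"role": "assistant", "content": ...}}]}
    return json.loads(response_body)["choices"][0]["message"]["content"]
```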
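This works with the OpenAI SDKs unchanged: set the client's base URL (base_url in the Python SDK, baseURL in the Node SDK) to the URL above and pass the agent id as model. Streamed responses arrive as standard server-sent event (SSE) chunks; agents whose model reasons expose that reasoning as delta.reasoning in the stream.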
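When stream is true, the response body is a sequence of SSE lines of the form data: {json chunk}. The sketch below accumulates the answer text and the reasoning separately. It assumes the OpenAI conventions of a final data: [DONE] line and string-valued deltas; delta.reasoning is the field named on this page, but its exact shape is an assumption here, as is the helper name collect_stream.

```python
import json
from typing import Iterable, Tuple


def collect_stream(lines: Iterable[str]) -> Tuple[str, str]:
    """Accumulate (content, reasoning) from chat-completion SSE lines."""
    content, reasoning = [], []
    for line in lines:
        line = line.strip()
        # SSE payload lines start with "data:"; blank separator lines are skipped.
        if not line.startswith("data:"):
            continue
        data = line[len("data:"):].strip()
        if data == "[DONE]":  # assumed OpenAI-style end-of-stream marker
            break
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            delta = choice.get("delta") or {}
            # Assumption: delta.reasoning is a string fragment, like delta.content.
            if delta.get("reasoning"):
                reasoning.append(delta["reasoning"])
            if delta.get("content"):
                content.append(delta["content"])
    return "".join(content), "".join(reasoning)
```

With urllib you can feed it the decoded lines of the open response, for example collect_stream(raw.decode("utf-8") for raw in response). This version only returns once the stream ends; a UI that renders live would handle each chunk inside the loop instead.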
Conversations can be threaded: pass the conversation_id (and optionally the parent_message_id) returned in a previous response to continue with the full context, so a support session remains a single conversation across many requests.
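A threaded follow-up is an ordinary chat completions body with the extra ids. This sketch assumes, without this page confirming it, that the server keeps the earlier turns (so only the new user message is sent) and that conversation_id and parent_message_id are top-level body fields. Which response fields carry those ids is not specified here, so the hypothetical helper follow_up_body takes them as arguments.

```python
from typing import Any, Dict, Optional


def follow_up_body(agent_id: str, user_message: str, conversation_id: str,
                   parent_message_id: Optional[str] = None) -> Dict[str, Any]:
    """Chat completions body that continues an existing conversation (sketch)."""
    body: Dict[str, Any] = {
        "model": agent_id,
        # Assumption: the server holds the thread's history, so only the new turn is sent.
        "messages": [{"role": "user", "content": user_message}],
        "conversation_id": conversation_id,
    }
    # parent_message_id is optional; it is included only when the caller supplies one.
    if parent_message_id is not None:
        body["parent_message_id"] = parent_message_id
    return body
```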
Responses API
For new integrations, there is also an endpoint that follows the Open Responses specification:
POST /responses
It takes input (a string or an array of items) plus the optional parameters instructions, tools, tool_choice, temperature, and max_output_tokens, and supports continuation via previous_response_id. Its semantic streaming events make it the more natural fit for agentic UIs that want to render tool steps as they happen.
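A minimal sketch of POST /responses bodies: a string input for the first turn, then previous_response_id to continue. It assumes the agent id is passed as model, as with chat completions; this page does not list model among the parameters, so verify that before relying on it. The helper name responses_body and the sample ids are hypothetical.

```python
from typing import Any, Dict, List, Optional, Union


def responses_body(agent_id: str,
                   user_input: Union[str, List[Dict[str, Any]]],
                   *,
                   instructions: Optional[str] = None,
                   temperature: Optional[float] = None,
                   max_output_tokens: Optional[int] = None,
                   previous_response_id: Optional[str] = None) -> Dict[str, Any]:
    """Build a POST /responses body, sending only the optional fields that are set."""
    body: Dict[str, Any] = {"model": agent_id, "input": user_input}  # model: assumed
    optional = {
        "instructions": instructions,
        "temperature": temperature,
        "max_output_tokens": max_output_tokens,
        "previous_response_id": previous_response_id,
    }
    # Drop only None values, so a deliberate temperature of 0 is still sent.
    body.update({k: v for k, v in optional.items() if v is not None})
    return body
```

Usage: responses_body("agent_abc123", "Where is my invoice?") for the first turn, then responses_body("agent_abc123", "And the VAT ID?", previous_response_id=...) with the id of the previous response.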
Billing and limits
Agent runs are billed against your account credit in the same way as chat usage: at the underlying model's token prices, plus the cost of any tools the agent uses. Costs appear in the usage log like every other request.
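To estimate a run's cost, multiply the token counts by the model's prices and add any tool charges. This sketch assumes prices are quoted per million tokens (a common convention, not confirmed on this page), and the prices in the example are made up. Input tokens presumably include the agent's instructions and any retrieved knowledge, not only the user's message, so the usage log remains the source of truth for actual counts.

```python
def run_cost(input_tokens: int, output_tokens: int,
             input_price_per_m: float, output_price_per_m: float,
             tool_costs: float = 0.0) -> float:
    """Estimated cost of one agent run, in the same unit as the prices."""
    # Assumed pricing unit: price per 1,000,000 tokens.
    token_cost = (input_tokens * input_price_per_m
                  + output_tokens * output_price_per_m) / 1_000_000
    return token_cost + tool_costs


# Made-up prices: 2.50 per million input tokens, 10.00 per million output tokens,
# plus 0.01 of tool usage.
print(round(run_cost(12_000, 800, 2.50, 10.00, tool_costs=0.01), 6))  # 0.048
```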
When to use which API
- Main API (api.kral.ai): raw model access; you bring the prompts. See Endpoints.
- Agents API (this page): the prompt engineering lives in the agent, and your code just relays user messages. The right choice for support bots, internal assistants, and anything a non-developer should be able to retune in the app.