API

Endpoints

Base URL for everything:

https://api.kral.ai/v1

Authentication is a Bearer token (Authorization: Bearer sk-kral-...) on every request. All endpoints follow the OpenAI wire format unless noted otherwise.

Core

Endpoint	Method	Purpose
`/models`	GET	List the models your plan can use, with capabilities and parameters
`/chat/completions`	POST	Chat with any model, streaming or not
`/embeddings`	POST	Embedding vectors, billed on input tokens
`/images/generations`	POST	Image generation, billed per image
`/moderations`	POST	Content moderation, free of charge

Audio

Endpoint	Method	Purpose
`/audio/speech`	POST	Text-to-speech, billed per character
`/audio/transcriptions`	POST	Speech-to-text (multipart upload, up to 25 MB), billed per audio second
`/audio/translations`	POST	Speech-to-text with translation to English

Native protocols

You are not locked to the OpenAI format. Two provider-native protocols are exposed directly, with the same key and the same billing:

Anthropic Messages: POST /messages accepts the native Anthropic request shape. The model field decides routing.
Gemini: POST https://api.kral.ai/v1beta/models/{model}:generateContent, plus :streamGenerateContent and :embedContent, accept Google's native shape.

Existing code written against the Anthropic or Google SDKs only needs the base URL and key swapped.

Assistants family (OpenAI models)

For OpenAI's stateful APIs, the gateway passes requests through with your account's gates applied: /assistants, /threads, /files, /vector_stores, /batches, and /responses. These reach OpenAI models only, since other providers have no equivalent API.

The models endpoint

GET /v1/models returns what your plan can access. Beyond the OpenAI-standard fields, each entry carries capabilities (for example image_generation) and, for media models, a media_params_schema describing the parameters that model accepts, so a client can render the right controls per model.

Request size

Chat and embedding requests accept large payloads (up to 50 MB) so document-heavy RAG contexts fit. Audio uploads cap at 25 MB.

Calling your agents

The endpoints above give you raw model access. To call an agent you configured in the app, complete with its instructions, knowledge, and tools, use the separate Agents API.