Endpoints
Base URL for everything:
https://api.kral.ai/v1
Authentication is a Bearer token (Authorization: Bearer sk-kral-...) on every request. All endpoints follow the OpenAI wire format unless noted otherwise.
Core
| Endpoint | Method | Purpose |
|---|---|---|
/models |
GET | List the models your plan can use, with capabilities and parameters |
/chat/completions |
POST | Chat with any model, streaming or not |
/embeddings |
POST | Embedding vectors, billed on input tokens |
/images/generations |
POST | Image generation, billed per image |
/moderations |
POST | Content moderation, free of charge |
Audio
| Endpoint | Method | Purpose |
|---|---|---|
/audio/speech |
POST | Text-to-speech, billed per character |
/audio/transcriptions |
POST | Speech-to-text (multipart upload, up to 25 MB), billed per audio second |
/audio/translations |
POST | Speech-to-text with translation to English |
Native protocols
You are not locked to the OpenAI format. Two provider-native protocols are exposed directly, with the same key and the same billing:
- Anthropic Messages:
POST /messagesaccepts the native Anthropic request shape. Themodelfield decides routing. - Gemini:
POST https://api.kral.ai/v1beta/models/{model}:generateContent, plus:streamGenerateContentand:embedContent, accept Google's native shape.
Existing code written against the Anthropic or Google SDKs only needs the base URL and key swapped.
Assistants family (OpenAI models)
For OpenAI's stateful APIs, the gateway passes requests through with your account's gates applied: /assistants, /threads, /files, /vector_stores, /batches, and /responses. These reach OpenAI models only, since other providers have no equivalent API.
The models endpoint
GET /v1/models returns what your plan can access. Beyond the OpenAI-standard fields, each entry carries capabilities (for example image_generation) and, for media models, a media_params_schema describing the parameters that model accepts, so a client can render the right controls per model.
Request size
Chat and embedding requests accept large payloads (up to 50 MB) so document-heavy RAG contexts fit. Audio uploads cap at 25 MB.
Calling your agents
The endpoints above give you raw model access. To call an agent you configured in the app, complete with its instructions, knowledge, and tools, use the separate Agents API.