kral Documentation
Sign in
API

Endpoints

Base URL for everything:

https://api.kral.ai/v1

Authentication is a Bearer token (Authorization: Bearer sk-kral-...) on every request. All endpoints follow the OpenAI wire format unless noted otherwise.

Core

Endpoint Method Purpose
/models GET List the models your plan can use, with capabilities and parameters
/chat/completions POST Chat with any model, streaming or not
/embeddings POST Embedding vectors, billed on input tokens
/images/generations POST Image generation, billed per image
/moderations POST Content moderation, free of charge

Audio

Endpoint Method Purpose
/audio/speech POST Text-to-speech, billed per character
/audio/transcriptions POST Speech-to-text (multipart upload, up to 25 MB), billed per audio second
/audio/translations POST Speech-to-text with translation to English

Native protocols

You are not locked to the OpenAI format. Two provider-native protocols are exposed directly, with the same key and the same billing:

  • Anthropic Messages: POST /messages accepts the native Anthropic request shape. The model field decides routing.
  • Gemini: POST https://api.kral.ai/v1beta/models/{model}:generateContent, plus :streamGenerateContent and :embedContent, accept Google's native shape.

Existing code written against the Anthropic or Google SDKs only needs the base URL and key swapped.

Assistants family (OpenAI models)

For OpenAI's stateful APIs, the gateway passes requests through with your account's gates applied: /assistants, /threads, /files, /vector_stores, /batches, and /responses. These reach OpenAI models only, since other providers have no equivalent API.

The models endpoint

GET /v1/models returns what your plan can access. Beyond the OpenAI-standard fields, each entry carries capabilities (for example image_generation) and, for media models, a media_params_schema describing the parameters that model accepts, so a client can render the right controls per model.

Request size

Chat and embedding requests accept large payloads (up to 50 MB) so document-heavy RAG contexts fit. Audio uploads cap at 25 MB.

Calling your agents

The endpoints above give you raw model access. To call an agent you configured in the app, complete with its instructions, knowledge, and tools, use the separate Agents API.