API Reference
The full interactive API reference is available at:
api.overshoot.ai/docs
Authentication
All API requests require an API key passed as a Bearer token:
```
Authorization: Bearer your-api-key
```

Get your API key at platform.overshoot.ai/api-keys.
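As a minimal sketch of authenticating a request with Python's standard library (the `auth_headers` helper and `OVERSHOOT_API_KEY` environment variable are illustrative, not part of the API):

```python
import json
import os
import urllib.request

API_BASE = "https://api.overshoot.ai/v0.2"

def auth_headers(api_key: str) -> dict:
    """Build the Authorization header expected by every endpoint."""
    return {"Authorization": f"Bearer {api_key}"}

def list_models(api_key: str) -> list:
    """GET /models with the bearer token attached."""
    req = urllib.request.Request(f"{API_BASE}/models", headers=auth_headers(api_key))
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

if __name__ == "__main__":
    print(list_models(os.environ["OVERSHOOT_API_KEY"]))
```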
Base URL
```
https://api.overshoot.ai/v0.2
```

Endpoints
Create Stream
POST /streams

Creates a new stream and returns connection details.
Request body:
```json
{
  "mode": "clip",
  "processing": {
    "target_fps": 6,
    "clip_length_seconds": 0.5,
    "delay_seconds": 0.5
  },
  "inference": {
    "prompt": "Read any visible text",
    "model": "Qwen/Qwen3.5-9B",
    "output_schema_json": null,
    "max_output_tokens": null
  }
}
```

Source types (optional):
- Omit the field, or `{ type: "native" }` — Native LiveKit transport (default, recommended)
- `{ type: "webrtc", sdp: "..." }` — Legacy WebRTC offer SDP
- `{ type: "livekit", url: "wss://...", token: "..." }` — User-managed LiveKit room
See Transport & Connectivity for details on source types.
Processing (clip mode):
- `target_fps` (1-30) — frames per second to sample
- `clip_length_seconds` (0.1-60) — duration of each clip
- `delay_seconds` (>0) — time between inferences
Processing (frame mode):
- `interval_seconds` (0.1-60) — time between frame captures
Inference:
- `prompt` — what you want the AI to do
- `backend` — (optional) inference backend. Defaults to `"overshoot"` if omitted.
- `model` — model identifier
- `output_schema_json` — optional JSON schema for structured output
- `max_output_tokens` — optional max tokens per inference request. If omitted, auto-set to the optimal value for your interval. See Output Token Limits.
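Putting the request body together, a minimal clip-mode create call might look like the following sketch using Python's standard library (`build_stream_body` and `create_stream` are illustrative helpers, not SDK functions):

```python
import json
import urllib.request

API_BASE = "https://api.overshoot.ai/v0.2"

def build_stream_body(prompt: str) -> dict:
    """Minimal clip-mode request body; omitted optional fields use server defaults."""
    return {
        "mode": "clip",
        "processing": {"target_fps": 6, "clip_length_seconds": 0.5, "delay_seconds": 0.5},
        "inference": {"prompt": prompt, "model": "Qwen/Qwen3.5-9B"},
    }

def create_stream(api_key: str, prompt: str) -> dict:
    """POST /streams; returns the parsed 201 response (stream_id, livekit, lease)."""
    req = urllib.request.Request(
        f"{API_BASE}/streams",
        data=json.dumps(build_stream_body(prompt)).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```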
Response (201) — native transport:
```json
{
  "stream_id": "abc123",
  "livekit": {
    "url": "wss://livekit.overshoot.ai",
    "token": "<client JWT>"
  },
  "lease": {
    "ttl_seconds": 45
  },
  "webrtc": null,
  "turn_servers": null
}
```

Keepalive
POST /streams/{stream_id}/keepalive

Renews the stream lease. Streams expire after 45 seconds without a keepalive. The SDK handles this automatically.
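If you are not using the SDK, a background renewal loop can be sketched like this (the `RENEW_EVERY` value and helper names are illustrative; the only contract is renewing within the 45-second lease):

```python
import json
import threading
import urllib.request

API_BASE = "https://api.overshoot.ai/v0.2"
RENEW_EVERY = 15  # seconds; comfortably inside the 45-second lease

def keepalive_url(stream_id: str) -> str:
    return f"{API_BASE}/streams/{stream_id}/keepalive"

def keepalive_loop(stream_id: str, api_key: str, stop: threading.Event) -> None:
    """Renew the lease every RENEW_EVERY seconds until `stop` is set."""
    while not stop.wait(RENEW_EVERY):
        req = urllib.request.Request(
            keepalive_url(stream_id),
            data=b"",
            headers={"Authorization": f"Bearer {api_key}"},
            method="POST",
        )
        with urllib.request.urlopen(req) as resp:
            body = json.load(resp)
            # body["livekit_token"] carries a fresh JWT if you need to reconnect
```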
Response (200):
```json
{
  "status": "ok",
  "stream_id": "abc123",
  "ttl_seconds": 45,
  "credits_remaining_cents": 950,
  "cost_cents": 0.5,
  "seconds_charged": 15.0,
  "livekit_token": "<fresh JWT>"
}
```

Close Stream
DELETE /streams/{stream_id}

Closes the stream, triggers final billing, and releases resources.
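A close call is a bare authenticated DELETE; a sketch with the standard library (`stream_url` is an illustrative helper):

```python
import urllib.request

API_BASE = "https://api.overshoot.ai/v0.2"

def stream_url(stream_id: str) -> str:
    return f"{API_BASE}/streams/{stream_id}"

def close_stream(api_key: str, stream_id: str) -> int:
    """DELETE /streams/{stream_id}; returns the HTTP status code (200 on success)."""
    req = urllib.request.Request(
        stream_url(stream_id),
        headers={"Authorization": f"Bearer {api_key}"},
        method="DELETE",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status
```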
Response (200):
```json
{
  "status": "ok"
}
```

Update Prompt
PATCH /streams/{stream_id}/config/prompt

Updates the inference prompt on a running stream.
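A live prompt swap can be sketched as a single PATCH with the standard library (`prompt_url` and `update_prompt` are illustrative names):

```python
import json
import urllib.request

API_BASE = "https://api.overshoot.ai/v0.2"

def prompt_url(stream_id: str) -> str:
    return f"{API_BASE}/streams/{stream_id}/config/prompt"

def update_prompt(api_key: str, stream_id: str, prompt: str) -> dict:
    """PATCH the inference prompt on a running stream; returns the updated config."""
    req = urllib.request.Request(
        prompt_url(stream_id),
        data=json.dumps({"prompt": prompt}).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="PATCH",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```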
Request body:
```json
{
  "prompt": "Count the number of people"
}
```

Response (200):
```json
{
  "id": "cfg_123",
  "stream_id": "abc123",
  "prompt": "Count the number of people",
  "backend": "overshoot",
  "model": "Qwen/Qwen3.5-9B",
  "output_schema_json": null,
  "created_at": "2026-03-01T00:00:00Z",
  "updated_at": "2026-03-01T00:01:00Z"
}
```

List Models
GET /models

Returns available models and their current status.
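Since saturated and unavailable models reject new streams, it can be useful to filter on the `ready` flag before creating one. A sketch (helper names are illustrative):

```python
import json
import urllib.request

API_BASE = "https://api.overshoot.ai/v0.2"

def ready_models(models: list) -> list:
    """Names of models currently accepting new streams (ready is true)."""
    return [m["model"] for m in models if m["ready"]]

def fetch_models(api_key: str) -> list:
    """GET /models and return the parsed response array."""
    req = urllib.request.Request(
        f"{API_BASE}/models",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Note that `degraded` models still report `ready: true`, so this filter admits them; check `status` if you want to avoid higher latency as well.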
Response (200):
```json
[
  {
    "model": "Qwen/Qwen3.5-9B",
    "ready": true,
    "status": "ready"
  }
]
```

| Status | ready | Meaning |
|---|---|---|
| ready | true | Healthy, performing well |
| degraded | true | Near capacity, expect higher latency |
| saturated | false | At capacity, will reject new streams |
| unavailable | false | Endpoint not reachable |
WebSocket — Stream Results
WS /ws/streams/{stream_id}

Receives inference results in real time. After connecting, send your API key as the first message:

```json
{"api_key": "your-api-key"}
```

Each subsequent message is a StreamInferenceResult:
```json
{
  "id": "res_123",
  "stream_id": "abc123",
  "mode": "clip",
  "model_backend": "overshoot",
  "model_name": "Qwen/Qwen3.5-9B",
  "prompt": "Read any visible text",
  "result": "Hello World",
  "inference_latency_ms": 142.5,
  "total_latency_ms": 285.3,
  "ok": true,
  "error": null,
  "finish_reason": "stop"
}
```

Close codes:
- `1008` — Authentication failed
- `1001` — Stream ended (server-initiated)
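The connect-authenticate-consume flow above can be sketched with the third-party `websockets` package (the full `wss://api.overshoot.ai/ws/...` URL is an assumption that the WebSocket shares the API host; adjust to your deployment):

```python
import asyncio
import json

def auth_message(api_key: str) -> str:
    """The first frame after connecting: your API key as JSON."""
    return json.dumps({"api_key": api_key})

async def consume_results(stream_id: str, api_key: str) -> None:
    import websockets  # third-party: pip install websockets

    url = f"wss://api.overshoot.ai/ws/streams/{stream_id}"  # assumed host
    async with websockets.connect(url) as ws:
        await ws.send(auth_message(api_key))
        async for raw in ws:  # each frame is a StreamInferenceResult
            result = json.loads(raw)
            if result["ok"]:
                print(result["result"], f'({result["total_latency_ms"]} ms)')
            else:
                print("inference error:", result["error"])

# asyncio.run(consume_results("abc123", "your-api-key"))
```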