API Reference

The full interactive API reference is available at:

api.overshoot.ai/docs

Authentication

All API requests require an API key passed as a Bearer token:

Authorization: Bearer your-api-key

Get your API key at platform.overshoot.ai/api-keys.
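In practice the header is attached to every request. A minimal, network-free sketch (the key value is a placeholder; nothing is sent here):

```python
# Minimal sketch: attaching the Bearer token to a request. "your-api-key"
# is a placeholder; use a real key from platform.overshoot.ai/api-keys.
import urllib.request

API_KEY = "your-api-key"

def authed_request(url: str, method: str = "GET") -> urllib.request.Request:
    """Return a Request carrying the Authorization header; nothing is sent."""
    return urllib.request.Request(
        url,
        method=method,
        headers={"Authorization": f"Bearer {API_KEY}"},
    )

req = authed_request("https://api.overshoot.ai/v0.2/models")
```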

Base URL

https://api.overshoot.ai/v0.2

Endpoints

Create Stream

POST /streams

Creates a new stream and returns connection details.

Request body:

{
  "mode": "clip",
  "processing": {
    "target_fps": 6,
    "clip_length_seconds": 0.5,
    "delay_seconds": 0.5
  },
  "inference": {
    "prompt": "Read any visible text",
    "model": "Qwen/Qwen3.5-9B",
    "output_schema_json": null,
    "max_output_tokens": null
  }
}

Source types (optional):

  • Omit field or { type: "native" } — Native LiveKit transport (default, recommended)
  • { type: "webrtc", sdp: "..." } — Legacy WebRTC offer SDP
  • { type: "livekit", url: "wss://...", token: "..." } — User-managed LiveKit room

See Transport & Connectivity for details on source types.

Processing (clip mode):

  • target_fps (1-30) — frames per second to sample
  • clip_length_seconds (0.1-60) — duration of each clip
  • delay_seconds (>0) — time between inferences

Processing (frame mode):

  • interval_seconds (0.1-60) — time between frame captures

Inference:

  • prompt — what you want the AI to do
  • backend — optional inference backend; defaults to "overshoot" if omitted
  • model — model identifier
  • output_schema_json — optional JSON schema for structured output
  • max_output_tokens — optional max tokens per inference request. If omitted, auto-set to the optimal value for your interval. See Output Token Limits.
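The parameter ranges above can be enforced client-side before sending. A hypothetical helper (not part of any official SDK) that builds a clip-mode request body and validates the documented ranges:

```python
# Hypothetical helper: builds and validates a clip-mode body for POST /streams
# using the ranges documented above. Raises ValueError on out-of-range input.
import json

def clip_stream_body(prompt: str, model: str, target_fps: int = 6,
                     clip_length_seconds: float = 0.5,
                     delay_seconds: float = 0.5) -> dict:
    if not 1 <= target_fps <= 30:
        raise ValueError("target_fps must be within 1-30")
    if not 0.1 <= clip_length_seconds <= 60:
        raise ValueError("clip_length_seconds must be within 0.1-60")
    if delay_seconds <= 0:
        raise ValueError("delay_seconds must be > 0")
    return {
        "mode": "clip",
        "processing": {
            "target_fps": target_fps,
            "clip_length_seconds": clip_length_seconds,
            "delay_seconds": delay_seconds,
        },
        "inference": {
            "prompt": prompt,
            "model": model,
            "output_schema_json": None,
            "max_output_tokens": None,
        },
    }

body = clip_stream_body("Read any visible text", "Qwen/Qwen3.5-9B")
payload = json.dumps(body)  # ready to send as the POST body
```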

Response (201) — native transport:

{
  "stream_id": "abc123",
  "livekit": {
    "url": "wss://livekit.overshoot.ai",
    "token": "<client JWT>"
  },
  "lease": {
    "ttl_seconds": 45
  },
  "webrtc": null,
  "turn_servers": null
}

Keepalive

POST /streams/{stream_id}/keepalive

Renews the stream lease. Streams expire after 45 seconds without a keepalive. The SDK handles this automatically.

Response (200):

{
  "status": "ok",
  "stream_id": "abc123",
  "ttl_seconds": 45,
  "credits_remaining_cents": 950,
  "cost_cents": 0.5,
  "seconds_charged": 15.0,
  "livekit_token": "<fresh JWT>"
}
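The renewal loop the SDK automates can be sketched as follows. The renew callable is injected so the cadence logic stays network-free; a real caller would POST to /streams/{stream_id}/keepalive. The 15-second cadence is an assumption (comfortably inside the 45-second lease TTL), not a documented requirement:

```python
# Sketch of the keepalive loop the SDK performs for you. `renew` is injected
# so this stays network-free; a real implementation would POST to
# /streams/{stream_id}/keepalive. The 15 s interval is an assumption.
import time
from typing import Callable

def run_keepalive(renew: Callable[[], dict], stop_after: int,
                  interval: float = 15.0, sleep=time.sleep) -> list:
    """Renew the lease every `interval` seconds, up to `stop_after` times.

    Stops early if a renewal does not report status "ok".
    """
    responses = []
    for _ in range(stop_after):
        resp = renew()
        if resp.get("status") != "ok":
            break
        responses.append(resp)
        sleep(interval)
    return responses
```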

Close Stream

DELETE /streams/{stream_id}

Closes the stream, triggers final billing, and releases resources.

Response (200):

{
  "status": "ok"
}

Update Prompt

PATCH /streams/{stream_id}/config/prompt

Updates the inference prompt on a running stream.

Request body:

{
  "prompt": "Count the number of people"
}

Response (200):

{
  "id": "cfg_123",
  "stream_id": "abc123",
  "prompt": "Count the number of people",
  "backend": "overshoot",
  "model": "Qwen/Qwen3.5-9B",
  "output_schema_json": null,
  "created_at": "2026-03-01T00:00:00Z",
  "updated_at": "2026-03-01T00:01:00Z"
}
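Assembling the PATCH call can be sketched without sending anything; the Request object below just shows the method, URL, headers, and body layout for this endpoint:

```python
# Sketch: building the PATCH request to update a running stream's prompt.
# Nothing is sent; the Request object shows method, URL, and body layout.
import json
import urllib.request

def update_prompt_request(stream_id: str, prompt: str,
                          api_key: str) -> urllib.request.Request:
    return urllib.request.Request(
        f"https://api.overshoot.ai/v0.2/streams/{stream_id}/config/prompt",
        method="PATCH",
        data=json.dumps({"prompt": prompt}).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )

req = update_prompt_request("abc123", "Count the number of people",
                            "your-api-key")
```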

List Models

GET /models

Returns available models and their current status.

Response (200):

[
  {
    "model": "Qwen/Qwen3.5-9B",
    "ready": true,
    "status": "ready"
  }
]

Status       ready   Meaning
ready        true    Healthy, performing well
degraded     true    Near capacity, expect higher latency
saturated    false   At capacity, will reject new streams
unavailable  false   Endpoint not reachable
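One way to use this table is to pick a model that will still accept a new stream. A hypothetical helper (the preference order is an assumption, not API behavior):

```python
# Hypothetical helper: choose a usable model from GET /models output,
# preferring "ready" over "degraded" and skipping anything whose `ready`
# flag is false (saturated or unavailable endpoints reject new streams).
def pick_model(models: list) -> str:
    for wanted in ("ready", "degraded"):
        for m in models:
            if m["ready"] and m["status"] == wanted:
                return m["model"]
    return None

models = [
    {"model": "Qwen/Qwen3.5-9B", "ready": True, "status": "degraded"},
    {"model": "other/model", "ready": False, "status": "saturated"},
]
```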

WebSocket — Stream Results

WS /ws/streams/{stream_id}

Receives inference results in real-time. After connecting, send your API key as the first message:

{"api_key": "your-api-key"}

Each subsequent message is a StreamInferenceResult:

{
  "id": "res_123",
  "stream_id": "abc123",
  "mode": "clip",
  "model_backend": "overshoot",
  "model_name": "Qwen/Qwen3.5-9B",
  "prompt": "Read any visible text",
  "result": "Hello World",
  "inference_latency_ms": 142.5,
  "total_latency_ms": 285.3,
  "ok": true,
  "error": null,
  "finish_reason": "stop"
}
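Handling one such message can be sketched with the transport omitted; this just shows decoding a StreamInferenceResult and separating success from failure on the `ok` flag:

```python
# Sketch: decoding one WebSocket message. The transport is omitted; this
# shows parsing a StreamInferenceResult and checking the `ok` flag.
import json

def handle_message(raw: str):
    """Return the result text, or None if the inference failed."""
    msg = json.loads(raw)
    if not msg.get("ok"):
        # msg["error"] carries the failure detail when ok is false
        return None
    return msg["result"]

raw = '{"id": "res_123", "ok": true, "result": "Hello World", "error": null}'
```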

Close codes:

  • 1008 — Authentication failed
  • 1001 — Stream ended (server-initiated)