Skip to main content

Documentation Index

Fetch the complete documentation index at: https://docs.overshoot.ai/llms.txt

Use this file to discover all available pages before exploring further.

A few habits make Overshoot apps feel snappy and behave well in production.

Latency

Send only what you need

A single frame (image_url) costs a fraction of a video segment. Most “what’s happening?” questions only need the latest frame.

Keep prompts terse

System messages and few-shot examples are tokens too. Visual tokens dominate, but text adds up at high frame counts.

Pick a smaller model

gemma-4-E2B-it and Qwen3.5-9B are good defaults. Reach for 27B+ only when quality demands it.

Drop resolution

480p Qwen costs ~5× fewer tokens per frame than 1080p. Publish at the resolution you actually need.

Streams

  • Renew before you have to. Streams expire after 5 minutes idle. Call /keepalive every 2 minutes.
  • Save the keepalive token. Each /keepalive returns a fresh LiveKit token — keep it around for reconnects.
  • Delete when done. A DELETE /streams/{id} releases resources immediately instead of waiting for the lease.

Reliability

  • Try to list models before you start. /models is the source of truth. Don’t hardcode an id and assume it’s serving.
  • Handle 503 on completions. It means the replica fell over. Retry with backoff, or fall back to another ready model.
  • Treat frame_index as monotonic. If you reference an old index that’s been evicted, the resolver clamps to the oldest available — your request still succeeds, but on a different frame than you asked for.