A few habits make Overshoot apps feel snappy and behave well in production.Documentation Index
Fetch the complete documentation index at: https://docs.overshoot.ai/llms.txt
Use this file to discover all available pages before exploring further.
Latency
Send only what you need
A single frame (
image_url) costs a fraction of a video segment. Most “what’s happening?” questions only need the latest frame.Keep prompts terse
System messages and few-shot examples are tokens too. Visual tokens dominate, but text adds up at high frame counts.
Pick a smaller model
gemma-4-E2B-it and Qwen3.5-9B are good defaults. Reach for 27B+ only when quality demands it.Drop resolution
480p Qwen costs ~5× fewer tokens per frame than 1080p. Publish at the resolution you actually need.
Streams
- Renew before you have to. Streams expire after 5 minutes idle. Call
/keepaliveevery 2 minutes. - Save the keepalive token. Each
/keepalivereturns a fresh LiveKit token — keep it around for reconnects. - Delete when done. A
DELETE /streams/{id}releases resources immediately instead of waiting for the lease.
Reliability
- Try to list models before you start.
/modelsis the source of truth. Don’t hardcode anidand assume it’s serving. - Handle
503on completions. It means the replica fell over. Retry with backoff, or fall back to anotherreadymodel. - Treat
frame_indexas monotonic. If you reference an old index that’s been evicted, the resolver clamps to the oldest available — your request still succeeds, but on a different frame than you asked for.