Best practices - Overshoot

A few habits make Overshoot apps feel snappy and behave well in production.

Latency

A single frame (image_url) costs a fraction of a video segment. Most “what’s happening?” questions only need the latest frame.

System messages and few-shot examples are tokens too. Visual tokens dominate, but text adds up at high frame counts.

Start with gemma-4-26B-A4B-it or Qwen3.6-35B-A3B-FP8. Move up only if quality is bad.

480p Qwen costs ~5× fewer tokens per frame than 1080p. Publish at the resolution you actually need.

Renew before you have to. Streams expire after 5 minutes idle. Call /keepalive every 2 minutes.
Save the keepalive token. Each /keepalive returns a fresh LiveKit token — keep it around for reconnects.
Delete when done. A DELETE /streams/{id} releases resources immediately instead of waiting for the lease.

Try to list models before you start. /models is the source of truth. Don’t hardcode an id and assume it’s serving.
Handle 503 on completions. It means the replica fell over. Retry with backoff, or fall back to another ready model.
Treat frame_index as monotonic. If you reference an old index that’s been evicted, the resolver clamps to the oldest available — your request still succeeds, but on a different frame than you asked for.

⌘I