Overshoot serves a curated set of vision-language models tuned for real-time inference. Pick a model, pass its id as the model field on /chat/completions, and Overshoot routes the request to a healthy endpoint.

List available models

Availability changes as endpoints come online and go offline. Always query /models before starting a stream. No auth required.
```bash
curl https://api.overshoot.ai/v1/models
```
The response is OpenAI-compatible. It has the same shape as the response from OpenAI's List models endpoint (GET /v1/models), plus one extra status field per entry.
```json
{
  "object": "list",
  "data": [
    {
      "id": "Qwen/Qwen3.6-27B-FP8",
      "object": "model",
      "created": 1714492800,
      "owned_by": "overshoot",
      "status": "ready"
    },
    {
      "id": "google/gemma-4-31B-it",
      "object": "model",
      "created": 1714492800,
      "owned_by": "overshoot",
      "status": "ready"
    }
  ]
}
```
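Because availability changes, a script can check the status field before it starts a stream. The sketch below assumes jq is installed and treats "ready" as the only usable status, since that is the only value shown on this page. ready_models and require_ready are hypothetical helper names, not part of any Overshoot SDK.

```bash
# Sketch: pre-flight availability check against /models (requires jq).
# /models needs no auth, so no Authorization header is sent.
MODELS_URL="https://api.overshoot.ai/v1/models"

# Print the ids of models whose status is "ready", one per line.
# Reads a /models response body on stdin.
ready_models() {
  jq -r '.data[] | select(.status == "ready") | .id'
}

# Return non-zero (with a message on stderr) unless MODEL is listed as ready.
# A failed request produces no ids, so it is also treated as "not ready".
require_ready() {
  local model="$1"
  if ! curl -fsS "$MODELS_URL" | ready_models | grep -Fxq -- "$model"; then
    echo "model not ready: $model" >&2
    return 1
  fi
}

# Usage:
#   require_ready "Qwen/Qwen3.6-27B-FP8" && echo "ok to start the stream"
```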

Active models

Snapshot as of 2026-05-01. The /models endpoint is the source of truth — treat these tables as a quick reference, not a guarantee.
Models on Overshoot fall into two groups: Overshoot-hosted (open-weights models we run on our own GPU fleet, tuned for real-time inference) and proprietary passthrough (Gemini / Claude / OpenAI, which we proxy to the upstream provider). Default to the hosted models — that’s where Overshoot’s latency advantage lives.

Overshoot-hosted

These are the fast path. We run them on our own GPUs, sized for sub-second time-to-first-token on single-frame inputs and high-throughput video.
| Model | Provider | Context | Tokens / frame | Max frames |
|---|---|---|---|---|
| Qwen/Qwen3.6-27B-FP8 | Qwen | 32K | ~200 @ 480p | Capped by context |
| Qwen/Qwen3.6-35B-A3B-FP8 | Qwen | 16K | ~200 @ 480p | Capped by context (16K) |
| google/gemma-4-31B-it | Google | 256K | 70 / 140 / 280 / 560 / 1120 | ~60 (1 fps × 60 s) |
| google/gemma-4-26B-A4B-it | Google | 256K | 70 / 140 / 280 / 560 / 1120 | ~60 (1 fps × 60 s) |
| Hcompany/Holo3-35B-A3B | H Company | 16K | ~200 @ 480p | Capped by context (16K) |

Proprietary passthrough

These are upstream APIs we expose through the same OpenAI-compatible surface for convenience. They are not part of Overshoot’s real-time path.
Requests to proprietary models are passed through to Google, Anthropic, or OpenAI, so time-to-first-token depends on the upstream provider: typically seconds, not the sub-second latency of Overshoot-hosted models. Reach for these only when you specifically need a frontier proprietary model; otherwise stay on the hosted list.
| Model | Upstream | Modalities | Notes |
|---|---|---|---|
| gemini-3-flash-preview | Google Gemini | image, video | Fast Gemini tier |
| gemini-3.1-pro-preview | Google Gemini | image, video | Frontier reasoning, lowest RPM quota |
| claude-haiku-4-5-20251001 | Anthropic | image only | Fastest Claude tier (no video) |
| claude-sonnet-4-6 | Anthropic | image only | |
| claude-opus-4-6 | Anthropic | image only | Highest capability, highest latency |
| gpt-5.4-nano | OpenAI | image only | Cheapest GPT-5 tier |
| gpt-5.4-mini | OpenAI | image only | |
| gpt-5.4 | OpenAI | image only | |
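Defaulting to hosted models while keeping passthrough as a fallback can be expressed as an ordered preference list checked against /models. A minimal sketch, assuming jq is available and that any status other than "ready" means unavailable; pick_model and the example ordering are illustrative, not an Overshoot recommendation.

```bash
# Print the first model id, in preference order, that /models reports as
# ready. Reads the /models response on stdin; requires jq.
pick_model() {
  local models_json id
  models_json=$(cat)
  for id in "$@"; do
    # jq -e exits non-zero when the expression evaluates to false.
    if printf '%s' "$models_json" |
         jq -e --arg id "$id" 'any(.data[]; .id == $id and .status == "ready")' >/dev/null; then
      echo "$id"
      return 0
    fi
  done
  echo "no preferred model is ready" >&2
  return 1
}

# Usage: hosted models first, a passthrough model last.
#   curl -fsS https://api.overshoot.ai/v1/models |
#     pick_model "Qwen/Qwen3.6-27B-FP8" "google/gemma-4-31B-it" "gemini-3-flash-preview"
```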

How to read the columns

Context is the context length we serve the model with. Text input, visual tokens, and generated output all count against it.
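Tokens / frame (Qwen models): Qwen3.6 uses the same image processor as the Qwen3 line: patch 16, temporal_patch_size=2, spatial_merge_size=2. After merging, each visual token covers a 32 × 32 pixel area, and when frames are processed as video each token spans 2 consecutive frames, so one token accounts for 32 × 32 × 2 = 2048 pixels per frame:
tokens_per_frame ≈ (H × W) / 2048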
| Resolution | Tokens / frame |
|---|---|
| 480p (854×480) | ~200 |
| 720p (1280×720) | ~450 |
| 1080p (1920×1080) | ~1010 |
Numbers in the table assume 480p — the resolution our benchmark suite uses. Higher resolutions consume context faster.
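The approximation can be checked with shell arithmetic. A minimal sketch: tokens_per_frame is a hypothetical helper, and because the real processor may first resize frames to fit its patch grid, actual counts can differ slightly from this estimate.

```bash
# Approximate visual tokens per frame for Qwen3.6 (patch 16,
# spatial_merge_size=2, temporal_patch_size=2): one token per
# (16*2) x (16*2) pixels, shared by 2 frames, so H*W / 2048.
# Integer division truncates: 1080p gives 1012 (the table rounds to ~1010).
tokens_per_frame() {
  local height="$1" width="$2"
  echo $(( height * width / 2048 ))
}

# Reproduce the resolution table (HEIGHTxWIDTH pairs).
for res in 480x854 720x1280 1080x1920; do
  h=${res%x*}
  w=${res#*x}
  printf '%sp: ~%s tokens/frame\n' "$h" "$(tokens_per_frame "$h" "$w")"
done
```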
Tokens / frame (Gemma 4): you pick the visual-token budget per request, one of 70, 140, 280, 560, or 1120:
  • 70–280 — classification, captioning, video understanding.
  • 560–1120 — OCR, document parsing, small text.
Default is 256 tokens.
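Because the Gemma 4 budget is a fixed per-frame cost, the visual cost of a clip is frames × budget. The sketch below assumes every frame is charged the full budget and that "256K" means 262,144 tokens; gemma_clip_tokens is a hypothetical helper.

```bash
# Visual tokens for a Gemma 4 clip, assuming each frame costs the full
# per-request budget (70, 140, 280, 560, or 1120).
gemma_clip_tokens() {
  local frames="$1" budget="$2"
  echo $(( frames * budget ))
}

CONTEXT=262144  # assumption: "256K" = 262,144 tokens

# Cost of a 60-frame clip (1 fps x 60 s) at each budget and its share of context.
for budget in 70 140 280 560 1120; do
  tokens=$(gemma_clip_tokens 60 "$budget")
  printf '%4d/frame: %5d tokens (%d%% of context)\n' \
    "$budget" "$tokens" $(( tokens * 100 / CONTEXT ))
done
```

Even at 1120 tokens per frame, a 60-frame clip uses about a quarter of the context, so for Gemma 4 the ~60-frame envelope, not context, is usually the binding limit.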
Max frames:
  • Qwen / Holo3: no hard model-side cap; frame count is bounded by context. The practical limit is (context − text_input − text_output) / tokens_per_frame.
  • Gemma 4: Google documents 60 s at 1 fps as the supported envelope (~60 frames).
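The context-bound limit and the Gemma 4 cap fit in one helper. A minimal sketch: max_frames is hypothetical, the 1,000 input and 500 output text tokens in the usage lines are made-up placeholders, and 32K / 16K / 256K are taken as 32,768 / 16,384 / 262,144 tokens.

```bash
# max_frames CONTEXT TEXT_IN TEXT_OUT TOKENS_PER_FRAME [CAP]
# Frames that fit in whatever context remains after text input and output.
# The optional CAP applies a model-side limit such as Gemma 4's ~60 frames.
max_frames() {
  local context="$1" text_in="$2" text_out="$3" per_frame="$4" cap="${5:-}"
  local remaining=$(( context - text_in - text_out ))
  local frames=0
  if [ "$remaining" -gt 0 ]; then
    frames=$(( remaining / per_frame ))
  fi
  if [ -n "$cap" ] && [ "$frames" -gt "$cap" ]; then
    frames=$cap
  fi
  echo "$frames"
}

max_frames 32768 1000 500 200      # Qwen3.6-27B at 480p: prints 156
max_frames 16384 1000 500 200      # 16K Qwen / Holo3 at 480p: prints 74
max_frames 262144 1000 500 280 60  # Gemma 4 at 280 tokens/frame: prints 60
```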
Interleaving: a single message can place text segments between visual inputs instead of putting all visual content in one block followed by the text. Every active model supports this.
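An interleaved request alternates text and image_url parts inside one content array. The sketch below compares the latest frame from two streams, using only the ovs://streams/<id>?frame_index=-1 form shown on this page; the stream ids, prompts, and the interleaved_payload / send_chat helpers are hypothetical. The heredoc does not JSON-escape its inputs, so it assumes ids without quotes or backslashes.

```bash
# Build a /chat/completions body with text segments between two frames.
interleaved_payload() {
  local model="$1" stream_a="$2" stream_b="$3"
  cat <<EOF
{
  "model": "$model",
  "messages": [{
    "role": "user",
    "content": [
      { "type": "text", "text": "Frame from camera A:" },
      { "type": "image_url", "image_url": { "url": "ovs://streams/$stream_a?frame_index=-1" } },
      { "type": "text", "text": "Frame from camera B:" },
      { "type": "image_url", "image_url": { "url": "ovs://streams/$stream_b?frame_index=-1" } },
      { "type": "text", "text": "Is the same person visible in both frames?" }
    ]
  }]
}
EOF
}

# POST a JSON body read from stdin to /chat/completions.
send_chat() {
  curl -sS -X POST https://api.overshoot.ai/v1/chat/completions \
    -H "Authorization: Bearer $OVERSHOOT_API_KEY" \
    -H "Content-Type: application/json" \
    --data-binary @-
}

# Usage:
#   interleaved_payload "Qwen/Qwen3.6-27B-FP8" "$STREAM_A" "$STREAM_B" | send_chat
```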

Use a model

Pass the id from /models straight into /chat/completions:
```bash
# $STREAM_ID is spliced in outside the single quotes so the shell expands it.
curl -X POST https://api.overshoot.ai/v1/chat/completions \
  -H "Authorization: Bearer $OVERSHOOT_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen/Qwen3.6-27B-FP8",
    "messages": [{
      "role": "user",
      "content": [
        { "type": "text", "text": "What is the person doing?" },
        { "type": "image_url", "image_url": {
            "url": "ovs://streams/'"$STREAM_ID"'?frame_index=-1"
        }}
      ]
    }]
  }'
```
See Chat Completion for the full request/response shape.