Models

Overshoot runs models optimized for low-latency real-time inference.

Model availability can change. Use the /models endpoint to check current status before starting a stream:

curl https://api.overshoot.ai/v0.2/models

See API Reference for status field details.
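Availability can also be checked programmatically before picking a model. A minimal sketch in JavaScript — the response shape here (`id` and `status` fields, with `'available'` as a status value) is an assumption for illustration; see the API Reference for the authoritative fields:

```javascript
// Pick the first preferred model whose status is 'available'.
// NOTE: the payload shape below is assumed for illustration only;
// check the API Reference for the actual /models response format.
function firstAvailable(models, preferred) {
  for (const id of preferred) {
    const entry = models.find((m) => m.id === id && m.status === 'available');
    if (entry) return entry.id;
  }
  return null;
}

// Example payload shaped like a hypothetical /models response:
const models = [
  { id: 'Qwen/Qwen3.5-27B', status: 'offline' },
  { id: 'Qwen/Qwen3.5-9B', status: 'available' },
];

const choice = firstAvailable(models, ['Qwen/Qwen3.5-27B', 'Qwen/Qwen3.5-9B']);
console.log(choice); // falls back to Qwen/Qwen3.5-9B since 27B is offline
```

This lets a client prefer the best model but degrade gracefully to a smaller one when the first choice is unavailable.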

Picking a Model

| If you need...                        | Use             | Why                                                        |
| ------------------------------------- | --------------- | ---------------------------------------------------------- |
| Best overall quality                  | Qwen3.5-27B     | Leads on video, coding, instruction following              |
| High-throughput vision                | Qwen3.5-35B-A3B | Near-27B vision at a fraction of compute (MoE, 3B active)  |
| Strong vision at moderate size        | Qwen3.5-9B      | Beats last-gen 30B on all vision benchmarks                |
| Video understanding at lowest latency | Qwen3.5-4B      | 96% of 27B's video score at a fraction of the size         |
| OCR and document scanning             | Qwen3.5-2B      | OCR beats models 2x its size, fastest response time        |

For detailed benchmarks and per-model analysis, see our blog post: Qwen 3.5 on Overshoot.

Available Models

Large (27B+)

  • Qwen/Qwen3.5-35B-A3B — MoE, best for throughput-heavy vision and UI agents
  • Qwen/Qwen3.5-27B — dense, best all-rounder
  • Qwen/Qwen3-VL-32B-Instruct-FP8
  • Qwen/Qwen3-VL-30B-A3B-Instruct
  • OpenGVLab/InternVL3_5-30B-A3B

Medium (8-9B)

  • Qwen/Qwen3.5-9B — recommended starting point for most developers
  • Qwen/Qwen3-VL-8B-Instruct
  • allenai/Molmo2-8B
  • Kwai-Keye/Keye-VL-1_5-8B
  • openbmb/MiniCPM-V-4_5

Small (2-4B)

  • Qwen/Qwen3.5-4B — strong video understanding for its size
  • Qwen/Qwen3.5-2B — specialist for OCR and document scanning
  • Qwen/Qwen3-VL-4B-Instruct

Usage

const vision = new RealtimeVision({
  apiKey: 'your-api-key', // Get yours at platform.overshoot.ai/api-keys
  model: 'Qwen/Qwen3.5-9B',
  prompt: 'Read any visible text',
  source: { type: 'camera', cameraFacing: 'environment' },
  onResult: (result) => {
    console.log(result.result)
  }
})

Thinking Mode

Qwen 3.5 models support thinking mode (<think> tags), which allows the model to reason before responding. Thinking mode is disabled by default to ensure real-time performance. If you'd like thinking mode enabled for your use case, contact us.

Note: On the 2B and 4B models, thinking mode can get stuck in repetition loops. Non-thinking mode is more reliable in production and actually scores better on OCR tasks.

Performance Notes

Inference latency can be as low as 200ms. Latency grows slowly with input size (number of frames) and quickly with output size (number of output tokens). It does not scale linearly with model size.

Tip: Use maxOutputTokens (JS) or max_output_tokens (Python) to keep responses short and latency low. See Output Token Limits.
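For example, the Usage snippet above can be extended with an output cap. This is a sketch; the value `50` is illustrative, and the right limit depends on how long your prompt's answers need to be:

```javascript
const vision = new RealtimeVision({
  apiKey: 'your-api-key',
  model: 'Qwen/Qwen3.5-2B',
  prompt: 'Read any visible text. Reply with the text only.',
  source: { type: 'camera', cameraFacing: 'environment' },
  maxOutputTokens: 50, // caps output length; shorter responses mean lower latency
  onResult: (result) => {
    console.log(result.result)
  }
})
```

Since latency scales quickly with output tokens, a tight cap plus a prompt that asks for a terse answer is the most effective latency lever.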