Introduction

Overshoot SDK

Run any Vision Language Model on live video. Point it at a video source, describe what you want in plain English, and get results in real time.

How Overshoot works

  1. Point Overshoot at a video source (camera, video file, screen, HLS, RTSP, or LiveKit)
  2. Tell it what you want in plain English
  3. Get results continuously as the video plays — as fast as 200ms
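Each source in step 1 is selected through a `source` descriptor. A minimal sketch of what those descriptors might look like follows; only the camera shape appears on this page, so the other `type` strings and the `url` field name are assumptions for illustration, not confirmed API:

```typescript
// Hypothetical source descriptors. Only the camera shape is shown in the
// docs on this page; the other `type` strings and the `url` field are
// assumptions modeled on the camera example.
const cameraSource = { type: 'camera', cameraFacing: 'environment' }
const fileSource = { type: 'file', url: 'https://example.com/clip.mp4' } // assumed shape
const rtspSource = { type: 'rtsp', url: 'rtsp://10.0.0.5:554/stream' } // assumed shape

// Each descriptor is distinguished by its `type` field.
const sources = [cameraSource, fileSource, rtspSource]
console.log(sources.map((s) => s.type).join(', ')) // → camera, file, rtsp
```

Whatever the exact field names turn out to be, the pattern is the same: swap the `source` object and the rest of the configuration stays unchanged.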

Get Started

Quick Example

import { RealtimeVision } from 'overshoot'
 
const vision = new RealtimeVision({
  apiKey: 'your-api-key',
  model: 'Qwen/Qwen3.5-9B',
  prompt: 'Read any visible text',
  source: { type: 'camera', cameraFacing: 'environment' },
  onResult: (result) => console.log(result.result)
})
 
await vision.start()

Get your API key at platform.overshoot.ai/api-keys.

What People Build with Overshoot

Overshoot connects any video source to any vision model and streams results back in real time. You describe what you want in plain English — the prompt is the program.

  • OCR on camera feeds — read text, signs, labels, documents
  • Accessibility — real-time scene description for blind and low-vision users
  • Security and monitoring — detect events, people, anomalies on RTSP/HLS cameras
  • Sports and fitness — analyze form, count reps, commentate on live action
  • Robotics and computer use — agents that see and act on screen or camera input
  • Retail analytics — count customers, track inventory, monitor displays
  • Structured data extraction — pull JSON from video with outputSchema

Results arrive continuously as the video plays, as fast as 200ms.
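The structured-extraction use case above can be sketched like this. The JSON-Schema-style shape passed as `outputSchema` is an assumption for illustration, not a contract confirmed on this page:

```typescript
// Hypothetical outputSchema for the retail-analytics use case.
// The JSON-Schema format here is an assumption; the exact shape
// Overshoot expects is covered under Core Concepts.
const outputSchema = {
  type: 'object',
  properties: {
    customerCount: { type: 'number' },
    shelvesNeedingRestock: { type: 'array', items: { type: 'string' } }
  },
  required: ['customerCount']
}

// It would accompany the prompt in the constructor, e.g.
//   new RealtimeVision({ ..., prompt: 'Count customers and flag empty shelves', outputSchema })
// so each result parses as typed JSON instead of free text.
console.log(Object.keys(outputSchema.properties).length) // → 2
```

Constraining output this way turns the continuous result stream into structured records you can feed straight into dashboards or databases.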

Core Concepts