Overshoot SDK
Run any Vision Language Model on live video. Point Overshoot at a video source, describe what you want in plain English, and get results in real time.

- Point Overshoot at a video source (camera, video file, screen, HLS, RTSP, or LiveKit)
- Tell it what you want in plain English
- Get results continuously as the video plays — as fast as 200ms
Quick Example
```typescript
import { RealtimeVision } from 'overshoot'

const vision = new RealtimeVision({
  apiKey: 'your-api-key',
  model: 'Qwen/Qwen3.5-9B',
  prompt: 'Read any visible text',
  source: { type: 'camera', cameraFacing: 'environment' },
  onResult: (result) => console.log(result.result)
})

await vision.start()
```

Get your API key at platform.overshoot.ai/api-keys.
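The source types listed above suggest the `source` option takes different shapes per type. A hedged sketch of what non-camera sources might look like — the field names below are assumptions, not documented here; check the SDK reference for the real shapes:

```javascript
// Hypothetical source configs — 'url' and the exact type strings for
// non-camera sources are illustrative assumptions.
const rtspSource = { type: 'rtsp', url: 'rtsp://camera.local/stream' }
const hlsSource = { type: 'hls', url: 'https://example.com/stream.m3u8' }
```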
What People Build with Overshoot
Overshoot connects any video source to any vision model and streams results back in real time. You describe what you want in plain English — the prompt is the program.
- OCR on camera feeds — read text, signs, labels, documents
- Accessibility — real-time scene description for blind and low vision users
- Security and monitoring — detect events, people, anomalies on RTSP/HLS cameras
- Sports and fitness — analyze form, count reps, commentate live action
- Robotics and computer use — agents that see and act on screen or camera input
- Retail analytics — count customers, track inventory, monitor displays
- Structured data extraction — pull JSON from video with `outputSchema`
Results arrive continuously as the video plays, as fast as 200ms.
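Structured extraction with `outputSchema` might look like the sketch below. The schema shape, the field names, and the assumption that `onResult` receives schema-conforming JSON as a string in `result.result` are all illustrative guesses, not documented behavior:

```javascript
// Hypothetical JSON Schema for a retail shelf monitor.
// Field names here are illustrative assumptions.
const shelfSchema = {
  type: 'object',
  properties: {
    itemCount: { type: 'number' },
    emptySlots: { type: 'number' }
  },
  required: ['itemCount', 'emptySlots']
}

// Assuming the model returns schema-conforming JSON as a string,
// a result handler might parse it like this:
function handleResult(raw) {
  const data = JSON.parse(raw)
  return `${data.itemCount} items, ${data.emptySlots} empty slots`
}

console.log(handleResult('{"itemCount": 12, "emptySlots": 3}'))
// → 12 items, 3 empty slots
```

Pass `shelfSchema` as the `outputSchema` option alongside `prompt` and `source` when constructing `RealtimeVision`.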