Configuration
Most apps only need a few options:
apiKey -- your secret key from Overshoot. Get one at platform.overshoot.ai/api-keys.
prompt -- what you want the AI to do.
source -- where the video comes from (camera by default, or a video file). See Video Sources.
model -- the model to use. See Models.
onResult -- callback that receives results.
import { RealtimeVision } from 'overshoot'
const vision = new RealtimeVision({
  apiKey: 'your-api-key',
  model: 'Qwen/Qwen3.5-9B',
  prompt: 'Read any visible text',
  source: { type: 'camera', cameraFacing: 'environment' },
  onResult: (result) => {
    console.log(result.result)
  }
})

Updating the Prompt
You can change the prompt while the stream is running. This is useful when you want to ask different questions about the video without restarting.
vision.updatePrompt('Count the number of people')

The next result will use the new prompt.
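For instance, a minimal sketch that cycles between two prompts while the stream is running (the prompt list and the 5-second interval are illustrative, not SDK defaults):

const prompts = ['Count the number of people', 'Read any visible text']
let next = 0

// Swap the active prompt on a timer; each new result uses whichever prompt was set last
setInterval(() => {
  vision.updatePrompt(prompts[next % prompts.length])
  next++
}, 5000)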
Max Output Tokens
Use maxOutputTokens to cap how many tokens the model generates per inference. This is useful when you only need short responses (e.g., a single word or a small JSON object).
const vision = new RealtimeVision({
  apiKey: 'your-api-key',
  model: 'Qwen/Qwen3.5-9B',
  prompt: 'Is there a person? Answer yes or no.',
  source: { type: 'camera', cameraFacing: 'environment' },
  mode: 'frame',
  frameProcessing: { interval_seconds: 0.5 },
  maxOutputTokens: 10,
  onResult: (result) => {
    console.log(result.result)
  }
})

If you don't set maxOutputTokens, the server automatically picks the optimal value based on your processing interval. The default rate limit is 128 effective tokens per second per stream -- if you need more, reach out at founders@overshoot.ai.
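As a rough illustration of how that budget relates to the interval (the exact formula is in Output Token Limits): with a 128 tokens-per-second budget and frameProcessing.interval_seconds: 0.5, each inference has roughly 128 × 0.5 = 64 tokens to work with.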
If output gets truncated, the result's finish_reason will be "length" -- see Output.
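A minimal sketch of detecting truncation in the callback, assuming finish_reason is exposed on the result object passed to onResult as described above:

const handleResult = (result) => {
  // "length" means the model stopped because it hit maxOutputTokens
  if (result.finish_reason === 'length') {
    console.warn('Output truncated -- consider raising maxOutputTokens')
  }
  console.log(result.result)
}

// Pass handleResult as the onResult callback when constructing RealtimeVision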
For the full breakdown of how token limits work, including the formula, reference table, and examples, see Output Token Limits.
Processing Parameters
The SDK supports two processing modes: frame mode (default, analyzes individual frames as static images) and clip mode (analyzes video clips with temporal context). For detailed information about choosing the right mode, see Processing Modes.
Quick Reference
Frame mode (default) -- for static image analysis:
const vision = new RealtimeVision({
  apiKey: 'your-api-key',
  model: 'Qwen/Qwen3.5-9B',
  prompt: 'Read all visible text',
  source: { type: 'camera', cameraFacing: 'environment' },
  mode: 'frame',
  frameProcessing: {
    interval_seconds: 0.5 // Capture a frame every 0.5 seconds (default)
  },
  onResult: (result) => {
    console.log(result.result)
  }
})

Clip mode -- for motion and temporal understanding:
const vision = new RealtimeVision({
  apiKey: 'your-api-key',
  model: 'Qwen/Qwen3.5-9B',
  prompt: 'Describe what the person is doing',
  source: { type: 'camera', cameraFacing: 'environment' },
  mode: 'clip',
  clipProcessing: {
    clip_length_seconds: 1, // Duration of each clip
    delay_seconds: 1,       // Time between results
    target_fps: 6           // Frames per second to sample (1-30)
  },
  onResult: (result) => {
    console.log(result.result)
  }
})

Deprecated Parameters
fps and sampling_ratio are deprecated -- use target_fps instead. The processing parameter is also deprecated in favor of clipProcessing and frameProcessing. This is a JS SDK naming change; the API wire format still uses processing.
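For example, a sketch of migrating a clip-mode configuration to the new names (the commented-out deprecated shape is shown for illustration only; only target_fps, clipProcessing, and frameProcessing are confirmed by the note above):

const vision = new RealtimeVision({
  apiKey: 'your-api-key',
  model: 'Qwen/Qwen3.5-9B',
  prompt: 'Describe what the person is doing',
  source: { type: 'camera', cameraFacing: 'environment' },
  mode: 'clip',
  // Deprecated (illustrative shape): processing: { clip_length_seconds: 1, fps: 6 }
  clipProcessing: {
    clip_length_seconds: 1,
    target_fps: 6 // replaces fps / sampling_ratio
  },
  onResult: (result) => console.log(result.result)
})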
Processing Visualization
Play with the sliders below to see how processing parameters affect frame sampling.