Documentation Index

Fetch the complete documentation index at: https://docs.overshoot.ai/llms.txt

Use this file to discover all available pages before exploring further.

Overshoot enables developers to build realtime vision applications in two steps:
  • Create a /streams session and connect a live video source to it
  • Ask any model about any moment of the video stream via /chat/completions request
That’s it. Two stateless HTTP endpoints are all you need to build anything. If you’d rather not read the full documentation, here’s what you need to know:
  1. When you create a Stream, you get back a Stream ID and a LiveKit Room URL with a token. Use a LiveKit SDK to publish your video stream to the room.
  2. Once a Stream is connected, you can refer to parts or all of it inside your OpenAI-compatible Chat Completions request.
    • To refer to a single frame, pass it as an image_url inside the message content.
    // last frame
    {
      "type": "image_url",
      "image_url": {
        "url": "ovs://streams/{stream_id}?frame_index=-1"
      }
    }
    
    • Similarly, to refer to a segment of the live stream, pass it as a video_url.
    // last 5 seconds
    {
      "type": "video_url",
      "video_url": {
        "url": "ovs://streams/{stream_id}?start_offset_ms=-5000"
      }
    }
    
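The two reference forms above slot directly into an OpenAI-style `messages` payload. Here is a minimal sketch in Python that assembles such a request body; only the `/chat/completions` endpoint and the `ovs://` URL grammar come from this page, while the helper names and the model id are illustrative placeholders:

```python
def stream_ref(stream_id: str, **params: int) -> str:
    """Build an ovs:// reference to a live stream (hypothetical helper).

    Query parameters follow the URL grammar shown above, e.g.
    frame_index=-1 for the last frame, or start_offset_ms=-5000
    for the last 5 seconds.
    """
    query = "&".join(f"{k}={v}" for k, v in params.items())
    return f"ovs://streams/{stream_id}" + (f"?{query}" if query else "")


def build_request(stream_id: str, prompt: str) -> dict:
    """Assemble a /chat/completions body that asks about both the last
    frame and the last 5 seconds of the stream. The model name is a
    placeholder, not a real Overshoot model id."""
    return {
        "model": "your-model-here",
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    # last frame of the live stream
                    {"type": "image_url",
                     "image_url": {"url": stream_ref(stream_id, frame_index=-1)}},
                    # last 5 seconds of the live stream
                    {"type": "video_url",
                     "video_url": {"url": stream_ref(stream_id, start_offset_ms=-5000)}},
                ],
            }
        ],
    }


body = build_request("st_123", "What changed in the last five seconds?")
print(body["messages"][0]["content"][1]["image_url"]["url"])
# ovs://streams/st_123?frame_index=-1
```

POST this body to the `/chat/completions` endpoint with your API key as usual; the JSON fragments above map one-to-one onto the entries in `content`.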
Hope this makes sense. Enjoy!

Quickstart

Webcam to model in four steps.

The Stream

Lifecycle, leases, and how to keep a session alive.

Chat Completion

URL grammar for referencing frames and segments.

Models

What’s available, context limits, picking the right one.