
Video Watch MCP

Video Watch MCP gives AI agents the ability to watch and analyse video content. Submit a video link, and the server extracts transcripts, frames, or both — then returns them to the AI for discussion.

Available on GitHub under the MIT License.

Tools

Three specialised tools for different use cases:

Tool	Best For	What It Returns
`video_listen`	Podcasts, talks, interviews	Full transcript via Whisper
`video_see`	Dance, art, visual content	Extracted frames at configurable FPS
`watch_video`	When both audio and visuals matter	Frames + transcript combined
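The table above amounts to a simple decision rule. A minimal sketch of that rule follows; `choose_tool` is a hypothetical helper for illustration and is not part of the server, and only the three tool names come from this page.

```python
def choose_tool(needs_audio: bool, needs_visuals: bool) -> str:
    """Pick a tool name from the table based on what the task needs.

    Hypothetical helper: the server itself exposes only the three tools.
    """
    if needs_audio and needs_visuals:
        return "watch_video"   # frames + transcript combined
    if needs_visuals:
        return "video_see"     # frames only
    if needs_audio:
        return "video_listen"  # transcript only
    raise ValueError("request at least one of audio or visuals")


podcast_tool = choose_tool(needs_audio=True, needs_visuals=False)
dance_tool = choose_tool(needs_audio=False, needs_visuals=True)
print(podcast_tool, dance_tool)
```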

Supported Platforms

Over 1,000 platforms are supported through yt-dlp, including:

YouTube
TikTok
Instagram Reels
Twitter/X
Reddit
Facebook
Vimeo
And hundreds more

How It Works

Video URL  -->  Modal (cloud)  -->  yt-dlp downloads
                                     │
                         ┌───────────┴───────────┐
                         │                       │
                    FFmpeg frames          Whisper transcript
                         │                       │
                         └───────────┬───────────┘
                                     │
                              Returned to AI

Processing happens in Modal’s serverless cloud — no local GPU or heavy dependencies needed.
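To make the frame-extraction branch of the diagram concrete, here is a sketch of the kind of FFmpeg command it could run. This page does not show the server's actual invocation, so the helper name `build_frame_command`, the file names, and the choice of flags are assumptions; `-vf fps=...` and `-frames:v` are standard FFmpeg options.

```python
from typing import List


def build_frame_command(
    video_path: str,
    out_pattern: str,
    fps: float = 0.5,
    max_frames: int = 10,
) -> List[str]:
    """Build an ffmpeg argument list that samples frames from a video.

    Assumed approach, not the server's confirmed command.
    """
    if fps <= 0:
        raise ValueError("fps must be positive")
    if max_frames < 1:
        raise ValueError("max_frames must be at least 1")
    return [
        "ffmpeg",
        "-i", video_path,              # input file downloaded by yt-dlp
        "-vf", f"fps={fps}",           # sample this many frames per second
        "-frames:v", str(max_frames),  # stop after this many output frames
        out_pattern,                   # e.g. frame_%03d.jpg
    ]


cmd = build_frame_command("video.mp4", "frame_%03d.jpg")
print(" ".join(cmd))
```

The list form can be passed directly to `subprocess.run(cmd, check=True)` on a machine where FFmpeg is installed.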

Setup

1. Create a Modal Account

Sign up at modal.com. The free tier includes $30/month of compute credits, enough for roughly 15,000 short video analyses.
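Taking the two figures above at face value, the implied cost per short analysis is a simple division. The snippet below only restates that arithmetic; actual cost depends on video length and the Whisper model used.

```python
monthly_credit_usd = 30.0    # free-tier credit quoted above
analyses_per_month = 15_000  # rough estimate quoted above

# Implied average cost of one short video analysis.
cost_per_analysis = monthly_credit_usd / analyses_per_month
print(f"${cost_per_analysis:.4f} per analysis")
```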

2. Install and Deploy

pip install modal
git clone https://github.com/codependentai/video-watch-mcp.git
cd video-watch-mcp
modal deploy mcp_remote.py

Modal outputs a URL for your deployed server.

3. Connect to Claude

Add the server to your Claude Desktop config, replacing `https://your-modal-url` with the URL Modal printed:

{
  "mcpServers": {
    "video-watch": {
      "type": "url",
      "url": "https://your-modal-url/mcp"
    }
  }
}

Restart Claude Desktop.
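If the config file already lists other servers, the new entry has to be merged in rather than pasted over them. The sketch below shows one way to do that on the parsed JSON; `add_video_watch_server` is a hypothetical helper, and `other-server` with its URL is a made-up example entry.

```python
import copy
import json


def add_video_watch_server(config: dict, server_url: str) -> dict:
    """Return a copy of the config with the video-watch entry added.

    Existing entries under "mcpServers" are preserved.
    """
    updated = copy.deepcopy(config)
    servers = updated.setdefault("mcpServers", {})
    servers["video-watch"] = {"type": "url", "url": server_url}
    return updated


# Hypothetical existing config with one unrelated server.
existing = {
    "mcpServers": {
        "other-server": {"type": "url", "url": "https://example.com/mcp"}
    }
}
merged = add_video_watch_server(existing, "https://your-modal-url/mcp")
print(json.dumps(merged, indent=2))
```

In practice you would load the real file with `json.load`, pass the result through the helper, and write it back with `json.dump`.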

Configuration

Parameter	Default	Description
`fps`	0.5	Frames extracted per second of video (0.5 means one frame every 2 seconds)
`max_frames`	10	Maximum frames returned (cap: 20)
Whisper model	`base`	Options: `base`, `small`, `medium` — accuracy vs speed tradeoff
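The interaction between `fps`, `max_frames`, and the cap of 20 can be estimated with a little arithmetic. The sketch below assumes the server samples `duration * fps` frames and then truncates to the cap; this page does not say how the limit is applied or which frames are kept, so treat `expected_frames` as a hypothetical estimate.

```python
import math

HARD_CAP = 20  # upper bound on max_frames, from the table above


def expected_frames(duration_s: float, fps: float = 0.5, max_frames: int = 10) -> int:
    """Estimate how many frames come back for a video of the given length.

    Assumption: frames are sampled at `fps`, then limited to
    min(max_frames, HARD_CAP).
    """
    effective_cap = min(max_frames, HARD_CAP)
    sampled = math.floor(duration_s * fps)
    return max(0, min(sampled, effective_cap))


print(expected_frames(12))                 # short clip, below the cap
print(expected_frames(60))                 # limited by the default max_frames
print(expected_frames(60, max_frames=50))  # limited by the hard cap
```

Under these assumptions, a 12-second clip at the defaults yields 6 frames, while anything longer than 20 seconds is limited by `max_frames`.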