Video Watch MCP
Video Watch MCP gives AI agents the ability to watch and analyse video content. Submit a video link, and the server extracts transcripts, frames, or both — then returns them to the AI for discussion.
GitHub MIT License
Tools
Three specialised tools for different use cases:
| Tool | Best For | What It Returns |
|---|---|---|
| `video_listen` | Podcasts, talks, interviews | Full transcript via Whisper |
| `video_see` | Dance, art, visual content | Extracted frames at configurable FPS |
| `watch_video` | When both audio and visuals matter | Frames + transcript combined |
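To make the tool interface concrete, here is a minimal sketch of the JSON-RPC request an MCP client would send to invoke one of these tools. The `tools/call` method comes from the MCP specification; the `url` argument name is an assumption, since the tools' input schemas are not listed here, and `build_tool_call` is a hypothetical helper.

```python
import json

# Tool names are taken from the table above.
TOOLS = {"video_listen", "video_see", "watch_video"}


def build_tool_call(tool: str, video_url: str, request_id: int = 1) -> dict:
    """Build an MCP JSON-RPC 2.0 `tools/call` request for one of the tools."""
    if tool not in TOOLS:
        raise ValueError(f"unknown tool: {tool!r}")
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        # ASSUMPTION: the argument is named "url"; check the server's schema.
        "params": {"name": tool, "arguments": {"url": video_url}},
    }


if __name__ == "__main__":
    request = build_tool_call("video_listen", "https://example.com/talk")
    print(json.dumps(request, indent=2))
```

In practice Claude or another MCP client builds this request for you; the sketch only shows what choosing a tool amounts to on the wire.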
Supported Platforms
Over 1000 platforms via yt-dlp integration:
- YouTube
- TikTok
- Instagram Reels
- Twitter/X
- Vimeo
- And hundreds more
How It Works
Video URL --> Modal (cloud) --> yt-dlp downloads
                                       │
                           ┌───────────┴───────────┐
                           │                       │
                     FFmpeg frames        Whisper transcript
                           │                       │
                           └───────────┬───────────┘
                                       │
                                Returned to AI
Processing happens in Modal’s serverless cloud — no local GPU or heavy dependencies needed.
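The three stages in the diagram can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's actual code: the helper names are hypothetical, the output filename pattern is made up, and the transcription step assumes the `openai-whisper` package, which may differ from what the server uses.

```python
from pathlib import Path


def download_cmd(video_url: str, out_path: Path) -> list[str]:
    """yt-dlp command line; -o sets the output filename."""
    return ["yt-dlp", "-o", str(out_path), video_url]


def frames_cmd(video_path: Path, out_dir: Path,
               fps: float = 0.5, max_frames: int = 10) -> list[str]:
    """FFmpeg command line for sampling still frames from a video."""
    return [
        "ffmpeg",
        "-i", str(video_path),
        "-vf", f"fps={fps}",            # sample this many frames per second
        "-frames:v", str(max_frames),   # stop after max_frames images
        str(out_dir / "frame_%03d.jpg"),  # hypothetical naming pattern
    ]


def transcribe(video_path: Path, model_name: str = "base") -> str:
    """Transcribe the audio track (ASSUMPTION: openai-whisper is installed)."""
    import whisper  # imported lazily so the command builders work without it

    model = whisper.load_model(model_name)
    return model.transcribe(str(video_path))["text"]


if __name__ == "__main__":
    print(" ".join(download_cmd("https://example.com/v", Path("video.mp4"))))
    print(" ".join(frames_cmd(Path("video.mp4"), Path("frames"))))
```

The command lists can be passed to `subprocess.run(cmd, check=True)` on a machine that has yt-dlp and FFmpeg installed; on Modal these dependencies would live in the container image instead.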
Setup
1. Create a Modal Account
Sign up at modal.com. The free tier includes $30/month of compute credits — enough for roughly 15,000 short video analyses.
2. Install and Deploy
pip install modal
git clone https://github.com/codependentai/video-watch-mcp.git
cd video-watch-mcp
modal deploy mcp_remote.py
Modal outputs a URL for your deployed server.
3. Connect to Claude
Add to your Claude Desktop config:
{
  "mcpServers": {
    "video-watch": {
      "type": "url",
      "url": "https://your-modal-url/mcp"
    }
  }
}
Restart Claude Desktop.
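If your config file already lists other MCP servers, the new entry should be merged in rather than pasted over them. A minimal sketch of that merge, assuming you load and save the JSON file yourself (`add_video_watch` is a hypothetical helper, and the example URL is a placeholder):

```python
import json


def add_video_watch(config: dict, modal_url: str) -> dict:
    """Return a copy of a Claude Desktop config with the video-watch entry added.

    Existing entries under "mcpServers" are preserved.
    """
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))
    servers["video-watch"] = {"type": "url", "url": modal_url}
    merged["mcpServers"] = servers
    return merged


if __name__ == "__main__":
    existing = {"mcpServers": {"other-server": {"command": "example"}}}
    updated = add_video_watch(existing, "https://your-modal-url/mcp")
    print(json.dumps(updated, indent=2))
```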
Configuration
| Parameter | Default | Description |
|---|---|---|
| fps | 0.5 | Frames extracted per second of video (0.5 = one frame every 2 seconds) |
| max_frames | 10 | Maximum frames returned (cap: 20) |
| Whisper model | base | Options: base, small, medium — accuracy vs speed tradeoff |
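As a rough guide to how `fps` and `max_frames` interact, the sketch below estimates how many frames come back for a video of a given length. It assumes frames are sampled evenly and that the server clamps `max_frames` to the cap of 20; the exact rounding and clamping behaviour is an assumption, and `expected_frames` is a hypothetical helper.

```python
import math

MAX_FRAMES_CAP = 20  # hard cap from the table above


def expected_frames(duration_s: float, fps: float = 0.5,
                    max_frames: int = 10) -> int:
    """Estimate how many frames are returned for a video of duration_s seconds."""
    sampled = math.floor(duration_s * fps)       # frames the sampling rate yields
    limit = min(max_frames, MAX_FRAMES_CAP)      # ASSUMPTION: requests above the cap are clamped
    return min(sampled, limit)


if __name__ == "__main__":
    for seconds in (10, 30, 120):
        print(seconds, "s ->", expected_frames(seconds), "frames")
```

With the defaults, anything longer than 20 seconds hits the `max_frames` limit, so for long videos it may be worth lowering `fps` to spread the frames over the whole clip.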
Cost
Approximately $0.002 per 30-second video. The $30/month free tier covers casual to moderate use comfortably.
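The figures above can be checked with a little arithmetic. The sketch below assumes cost scales linearly with video length, which is a simplification: real Modal billing depends on compute time, cold starts, and the Whisper model chosen.

```python
COST_PER_30S_USD = 0.002   # approximate figure quoted above
FREE_TIER_USD = 30.0       # monthly free credits quoted above


def estimated_cost(duration_s: float) -> float:
    """Estimated cost in USD, ASSUMING linear scaling with duration."""
    return COST_PER_30S_USD * duration_s / 30


def videos_in_free_tier(duration_s: float = 30) -> int:
    """Roughly how many videos of this length the free credits cover."""
    return round(FREE_TIER_USD / estimated_cost(duration_s))


if __name__ == "__main__":
    print(f"5-minute video: ${estimated_cost(300):.3f}")
    print(f"30-second videos per month: {videos_in_free_tier():,}")
```

At 30 seconds per video this reproduces the estimate of roughly 15,000 analyses per month given in the setup section.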
Use Cases
- Research — transcribe talks and lectures for analysis
- Social media — understand TikTok trends, review video content
- Accessibility — generate descriptions of visual content
- Content creation — analyse competitor videos, extract quotes