
Video Watch MCP

Video Watch MCP gives AI agents the ability to watch and analyse video content. Submit a video link and the server extracts a transcript, frames, or both, then returns them to the AI for discussion.

Source on GitHub: https://github.com/codependentai/video-watch-mcp (MIT License)

Tools

Three specialised tools for different use cases:

Tool          Best For                            What It Returns
video_listen  Podcasts, talks, interviews         Full transcript via Whisper
video_see     Dance, art, visual content          Extracted frames at configurable FPS
watch_video   When both audio and visuals matter  Frames + transcript combined
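
As a rough illustration of how a client might invoke one of these tools, the sketch below builds a standard MCP `tools/call` request (JSON-RPC 2.0). The tool names come from the table above; the `url` argument name and the `build_tool_call` helper are assumptions made for illustration, so check the input schema the server actually advertises before relying on them.

```python
import json

# Tool names as listed in the table above.
KNOWN_TOOLS = {"video_listen", "video_see", "watch_video"}


def build_tool_call(tool: str, video_url: str, request_id: int = 1) -> dict:
    """Build a JSON-RPC 2.0 `tools/call` request for one of the three tools."""
    if tool not in KNOWN_TOOLS:
        raise ValueError(f"unknown tool: {tool!r}")
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": tool,
            # "url" is an assumed argument name, not confirmed by the docs.
            "arguments": {"url": video_url},
        },
    }


if __name__ == "__main__":
    request = build_tool_call("video_listen", "https://example.com/talk")
    print(json.dumps(request, indent=2))
```

In practice an MCP client such as Claude Desktop constructs these requests itself; the sketch only shows what selecting a tool amounts to on the wire.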

Supported Platforms

Over 1000 platforms via yt-dlp integration:

  • YouTube
  • TikTok
  • Instagram Reels
  • Twitter/X
  • Reddit
  • Facebook
  • Vimeo
  • And hundreds more

How It Works

Video URL  -->  Modal (cloud)  -->  yt-dlp downloads
                                     │
                         ┌───────────┴───────────┐
                         │                       │
                    FFmpeg frames          Whisper transcript
                         │                       │
                         └───────────┬───────────┘
                                     │
                              Returned to AI

All processing runs in Modal's serverless cloud, so no local GPU or heavy local dependencies are needed.
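
The three stages in the diagram can be approximated with the command lines below. This is a minimal sketch, not the server's actual code: the `build_pipeline_commands` helper, the file names, and the choice to call the `whisper` command-line tool are all assumptions. The function only builds the argument lists and does not run anything.

```python
def build_pipeline_commands(url, fps=0.5, max_frames=10, whisper_model="base"):
    """Return the three command lines that mirror the diagram's stages."""
    # Assumed local file name; the real container format depends on the source.
    video = "video.mp4"
    return {
        # Stage 1: download the video with yt-dlp.
        "download": ["yt-dlp", "-o", video, url],
        # Stage 2a: sample frames at `fps` and stop after `max_frames` images.
        "frames": [
            "ffmpeg", "-i", video,
            "-vf", f"fps={fps}",
            "-frames:v", str(max_frames),
            "frame_%03d.jpg",
        ],
        # Stage 2b: transcribe the audio track with the Whisper CLI.
        "transcript": [
            "whisper", video,
            "--model", whisper_model,
            "--output_format", "txt",
        ],
    }


if __name__ == "__main__":
    commands = build_pipeline_commands("https://example.com/clip")
    for stage, argv in commands.items():
        print(stage, "->", " ".join(argv))
```

Each list could be passed to `subprocess.run` on a machine with yt-dlp, FFmpeg, and Whisper installed, which is the work the deployed server offloads to Modal.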

Setup

1. Create a Modal Account

Sign up at modal.com. The free tier includes $30/month of compute credits, enough for roughly 15,000 analyses of 30-second videos at about $0.002 each.

2. Install and Deploy

pip install modal
git clone https://github.com/codependentai/video-watch-mcp.git
cd video-watch-mcp
modal deploy mcp_remote.py

When the deploy finishes, Modal prints the URL of your deployed server.

3. Connect to Claude

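Add the server to your Claude Desktop config file (claude_desktop_config.json), replacing the placeholder URL with the one Modal printed: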

{
  "mcpServers": {
    "video-watch": {
      "type": "url",
      "url": "https://your-modal-url/mcp"
    }
  }
}

Restart Claude Desktop.
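
If the config file already lists other MCP servers, the new entry has to be merged in rather than pasted over them. The sketch below shows one way to do that on an in-memory dictionary; the `add_video_watch_server` helper is hypothetical, and the entry it writes simply mirrors the JSON shown above.

```python
import json


def add_video_watch_server(config: dict, endpoint_url: str) -> dict:
    """Return a copy of `config` with the video-watch entry added."""
    merged = dict(config)
    # Copy the existing server map so the caller's dict is left untouched.
    servers = dict(merged.get("mcpServers", {}))
    servers["video-watch"] = {"type": "url", "url": endpoint_url}
    merged["mcpServers"] = servers
    return merged


if __name__ == "__main__":
    existing = {"mcpServers": {"other-server": {"command": "example"}}}
    updated = add_video_watch_server(existing, "https://your-modal-url/mcp")
    print(json.dumps(updated, indent=2))
```

Loading the real file with `json.load`, passing it through the helper, and writing it back with `json.dump` would complete the round trip.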

Configuration

Parameter      Default  Description
fps            0.5      Frames extracted per second
max_frames     10       Maximum frames returned (cap: 20)
Whisper model  base     Options: base, small, medium (accuracy vs speed tradeoff)
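
To see how `fps` and `max_frames` might interact, the sketch below estimates how many frames come back for a video of a given length. It assumes frames are sampled evenly at `fps` and then limited by `max_frames`, which is itself clamped to the cap of 20; the server's exact sampling rule is not documented here, so treat `frames_returned` as an illustration only.

```python
import math

HARD_CAP = 20  # upper bound on max_frames, per the table above


def frames_returned(duration_s: float, fps: float = 0.5, max_frames: int = 10) -> int:
    """Estimate the number of frames returned for a video of `duration_s` seconds."""
    # max_frames cannot exceed the documented cap.
    limit = min(max_frames, HARD_CAP)
    # Assumed rule: one frame every 1/fps seconds across the whole video.
    sampled = math.floor(duration_s * fps)
    return max(0, min(sampled, limit))


if __name__ == "__main__":
    print(frames_returned(30))                 # defaults: 15 sampled, limited to 10
    print(frames_returned(30, max_frames=50))  # limit clamps to 20, 15 sampled
```

Under these assumptions, the defaults only return every sampled frame for videos of 20 seconds or less; longer videos need a lower `fps` or a higher `max_frames` to cover their full length.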

Cost

Approximately $0.002 per 30-second video. The $30/month free tier covers casual to moderate use comfortably.
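
The arithmetic behind these figures can be checked in a few lines. The sketch below uses exact fractions to avoid floating-point rounding; the constant and function names are illustrative, and the per-video cost is the approximate figure quoted above, not a guaranteed price.

```python
from fractions import Fraction

COST_PER_30S_VIDEO = Fraction("0.002")  # approximate cost in dollars, from the text
MONTHLY_CREDIT = Fraction("30")         # free-tier credit in dollars per month


def videos_per_month(credit=MONTHLY_CREDIT, cost=COST_PER_30S_VIDEO) -> int:
    """Number of whole 30-second videos the monthly credit covers."""
    return credit // cost


def monthly_cost(videos: int, cost=COST_PER_30S_VIDEO) -> float:
    """Estimated dollar cost of analysing `videos` 30-second clips."""
    return float(videos * cost)


if __name__ == "__main__":
    print(videos_per_month())  # 15000
    print(monthly_cost(100))   # 0.2
```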

Use Cases

  • Research — transcribe talks and lectures for analysis
  • Social media — understand TikTok trends, review video content
  • Accessibility — generate descriptions of visual content
  • Content creation — analyse competitor videos, extract quotes