Auto-Generating Meeting Minutes from Video with a Gemini CLI Skill

Tadashi Shigeoka · Fri, February 13, 2026

I created a Gemini CLI skill called video-to-minutes that auto-generates meeting minutes from video files.

Overview of the video-to-minutes Skill

The skill takes a video file as input, runs the following steps in order, and produces meeting minutes as a Markdown file.

Step | Process | Tool Used
1 | Check and install prerequisite tools | ffmpeg, whisper
2 | Get video path and capture interval from user (default 60s) | -
3 | Extract audio from video | ffmpeg
4 | Transcribe audio (automated) | whisper (turbo model)
5 | Capture images from video at regular intervals | ffmpeg
6 | Collect proper nouns from user | -
7 | Analyze transcript and generate meeting minutes | Gemini
8 | Save as Markdown file | -
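The table names the tools for Steps 1 and 3 but not the commands. Here is a minimal sketch of those two steps, assuming a POSIX-style shell with `ffmpeg` and the openai-whisper `whisper` CLI on `PATH`. The function names `check_prerequisites` and `extract_audio` are hypothetical. The audio settings (`-vn -ac 1 -ar 16000`) are also an assumption, not the skill's actual options.

```shell
# Sketch of Steps 1 and 3 only; not the skill's actual implementation.

# Step 1: report each missing tool on stderr; non-zero status if any is missing.
check_prerequisites() {
  local tool missing=0
  for tool in ffmpeg whisper; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing prerequisite: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Step 3: write the audio track to meeting_audio.wav, the file Step 4 transcribes.
# -vn drops the video stream; -ac 1 -ar 16000 (mono, 16 kHz) are assumed settings.
extract_audio() {
  local video="$1" out="${2:-meeting_audio.wav}"
  if [ ! -f "$video" ]; then
    echo "video not found: $video" >&2
    return 1
  fi
  ffmpeg -y -i "$video" -vn -ac 1 -ar 16000 "$out"
}
```

For example, `check_prerequisites && extract_audio meeting.mp4` would produce the `meeting_audio.wav` that the Whisper command in the next section consumes. Whisper resamples its input to 16 kHz mono internally, so downmixing here mainly keeps the intermediate file small.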

Why I Turned This into a Skill

In my previous article “Gemini CLI’s Default Capabilities Are So Powerful That I Reconsidered My Approach to Creating Skills”, I introduced the DCAP cycle, an approach in which you first ask the AI directly and turn only repeated operations into skills.

The video-to-minutes workflow is a textbook case for turning into a skill:

Criteria for Skill Creation | Why It Applies
Repeated execution | The same workflow runs for every meeting
Fine-tuning has stabilized | ffmpeg options and Whisper model settings are settled
Want to share with others | Team members can create minutes using the same process
Want to ensure quality | Standardize the meeting minutes format

Key Points of the Workflow

Automated Whisper Transcription

The agent automatically runs Whisper transcription using the turbo model.

whisper meeting_audio.wav --language ja --model turbo

The skill auto-detects the generated meeting_audio.txt and prompts the user for its path only if the file is not found.
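The detect-then-prompt behavior might be sketched as follows. `resolve_transcript` is a hypothetical helper, not code from the skill. It prints the default transcript path if that file exists. Otherwise it reads a path from standard input and writes its prompt to standard error, so standard output carries only the result.

```shell
# Hypothetical helper: print the transcript path, asking the user only
# when the default file (meeting_audio.txt unless given) does not exist.
resolve_transcript() {
  local default_path="${1:-meeting_audio.txt}" answer
  if [ -f "$default_path" ]; then
    printf '%s\n' "$default_path"
    return 0
  fi
  # Prompt on stderr so command substitution captures only the path.
  printf 'Transcript not found at %s. Enter its path: ' "$default_path" >&2
  read -r answer || return 1
  if [ ! -f "$answer" ]; then
    echo "transcript not found: $answer" >&2
    return 1
  fi
  printf '%s\n' "$answer"
}
```

A caller would typically write `transcript=$(resolve_transcript) || exit 1`, which keeps the interactive fallback out of the captured value.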

Collecting Proper Nouns to Improve Accuracy

Before generating the minutes, the skill asks the user for proper nouns (names of people, companies, products, etc.). Whisper can misrecognize such names, so supplying the correct spellings lets Gemini fix them when it analyzes the transcript, which improves the overall accuracy of the minutes.
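To illustrate one way the collected names might be used, the hypothetical `build_minutes_prompt` below places them as a glossary ahead of the transcript. The model can then correct names that Whisper misheard. The prompt wording is an assumption, because the article does not show the skill's actual prompt.

```shell
# Hypothetical helper: build a minutes-generation prompt that lists the
# user's proper nouns before the transcript text.
build_minutes_prompt() {
  if [ "$#" -lt 1 ]; then
    echo "usage: build_minutes_prompt TRANSCRIPT [PROPER_NOUN...]" >&2
    return 2
  fi
  local transcript="$1" noun
  shift
  if [ ! -f "$transcript" ]; then
    echo "transcript not found: $transcript" >&2
    return 1
  fi
  echo "Create meeting minutes in Markdown from the transcript below."
  echo "Use these spellings for proper nouns and fix any misrecognized variants:"
  for noun in "$@"; do
    printf '%s\n' "- $noun"
  done
  echo
  echo "Transcript:"
  cat "$transcript"
}
```

For example, `build_minutes_prompt meeting_audio.txt "Gemini CLI" "oh-my-skills" > minutes_prompt.txt` writes a prompt whose glossary comes first. That ordering means the model reads the correct spellings before it encounters the misheard versions.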

Capturing Images to Preserve Visual Information

In addition to audio transcription, the skill captures images from the video at regular intervals, allowing slide and whiteboard content to be included in the minutes.

ffmpeg -i "<VIDEO_FILE_PATH>" -vf fps=1/<INTERVAL_IN_SECONDS> captures/capture_%03d.png

The capture interval is user-configurable (60 seconds by default), so it can be shortened for slide-heavy meetings or lengthened for discussion-heavy ones.
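Step 2's default interval and the capture command above can be combined as in the following sketch. Both `normalize_interval` and `capture_frames` are hypothetical helpers. The first falls back to 60 seconds when no value is given and rejects anything other than a positive whole number without leading zeros. The second creates the `captures` directory before running ffmpeg, because ffmpeg's image output does not create missing directories.

```shell
# Hypothetical helper: print a validated interval in seconds (default 60).
normalize_interval() {
  local value="${1:-60}"
  case "$value" in
    ''|*[!0-9]*|0*)
      echo "invalid interval: $value (expected a positive whole number of seconds)" >&2
      return 1
      ;;
  esac
  printf '%s\n' "$value"
}

# Hypothetical helper: save one frame every N seconds into captures/.
capture_frames() {
  local video="$1" interval
  interval=$(normalize_interval "${2:-}") || return 1
  if [ ! -f "$video" ]; then
    echo "video not found: $video" >&2
    return 1
  fi
  mkdir -p captures || return 1
  # fps=1/N keeps one frame per N seconds; %03d numbers the files 001, 002, ...
  ffmpeg -y -i "$video" -vf "fps=1/$interval" captures/capture_%03d.png
}
```

For example, `capture_frames meeting.mp4 30` captures twice as often as the default, while `capture_frames meeting.mp4` uses 60 seconds.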

Summary

Generating meeting minutes from video means running several tools in sequence, and doing that by hand for every meeting is tedious. Repeated, multi-step workflows like this one are where turning a process into a skill pays off most.

The skill is published in the oh-my-skills repository.

That’s all from the Gemba, reporting on creating a skill that auto-generates meeting minutes from video.

References