Migrating from openai-whisper to faster-whisper for Faster Transcription
I’ve been running a skill that automatically generates meeting minutes from video files. The audio transcription component was using openai-whisper (OpenAI’s official Whisper CLI), but after migrating to faster-whisper, processing speed and memory efficiency improved significantly.
This article covers the motivation for the migration, the specific changes made, and the results.
Motivation
openai-whisper is OpenAI’s official implementation, and it’s appealing for its simplicity—you can transcribe audio with a single whisper command. However, in practice, I had the following concerns:
- Slow processing: Transcription of long videos (over 1 hour) takes a considerable amount of time
- High memory usage: Being PyTorch-based, model loading consumes significant memory
faster-whisper is a reimplementation based on CTranslate2, claiming up to 4x speedup and lower memory consumption.
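The "up to 4x" figure is the upstream project's claim, so it is worth measuring on your own audio. A minimal stdlib timing harness like the following can wrap either backend; the `time.sleep` calls are stand-ins for real transcription calls:

```python
import time
from contextlib import contextmanager

@contextmanager
def timed(label, results):
    # Record wall-clock time for the enclosed block under the given label.
    start = time.perf_counter()
    try:
        yield
    finally:
        results[label] = time.perf_counter() - start

results = {}
with timed("baseline", results):
    time.sleep(0.01)  # stand-in for an openai-whisper run
with timed("faster", results):
    time.sleep(0.01)  # stand-in for a faster-whisper run

for label, elapsed in results.items():
    print(f"{label}: {elapsed:.2f}s")
```

Running each backend a few times on the same file and comparing the recorded durations gives a fairer picture than a single run, since model loading dominates short inputs.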
Changes Made
Dependency Change
```bash
# Before
pip install openai-whisper

# After
pip install faster-whisper
```

openai-whisper depends on PyTorch, which means large download sizes during installation. faster-whisper is CTranslate2-based with lighter dependencies.
From CLI to Python Script
With openai-whisper, I was calling the whisper CLI command directly. Since faster-whisper is provided as a Python library, I created a wrapper script:
```python
import argparse


def format_time(seconds):
    h = int(seconds // 3600)
    m = int((seconds % 3600) // 60)
    s = seconds % 60
    return f"{h:02d}:{m:02d}:{s:05.2f}"


def main():
    parser = argparse.ArgumentParser(
        description="Transcribe audio using faster-whisper."
    )
    parser.add_argument("audio_path", help="Path to the audio file to transcribe.")
    parser.add_argument("--language", default="ja", help="Language code (default: ja)")
    parser.add_argument("--model", default="large-v3", help="Whisper model name (default: large-v3)")
    parser.add_argument("--output", default="meeting_audio.txt", help="Output text file path")
    parser.add_argument("--device", default="cpu", help="Device to use (default: cpu)")
    parser.add_argument("--compute-type", default="int8", help="Compute type for quantization (default: int8)")
    parser.add_argument("--beam-size", type=int, default=5, help="Beam size (default: 5)")
    args = parser.parse_args()

    # Imported after argument parsing so --help stays fast.
    from faster_whisper import WhisperModel

    print(f"Loading model '{args.model}' on {args.device} ({args.compute_type})...", flush=True)
    model = WhisperModel(args.model, device=args.device, compute_type=args.compute_type)

    print(f"Transcribing '{args.audio_path}' (language={args.language})...", flush=True)
    segments, info = model.transcribe(
        args.audio_path, language=args.language, beam_size=args.beam_size
    )
    print(f"Detected language: {info.language} (probability {info.language_probability:.2f})", flush=True)

    lines = []
    for segment in segments:
        line = segment.text.strip()
        print(f"[{format_time(segment.start)} -> {format_time(segment.end)}] {line}", flush=True)
        lines.append(line)

    with open(args.output, "w", encoding="utf-8") as f:
        f.write("\n".join(lines) + "\n")
    print(f"\nTranscription saved to '{args.output}'", flush=True)


if __name__ == "__main__":
    main()
```

Key points:
- `int8` quantization: INT8 quantization is enabled by default, keeping memory usage low even on CPU
- `large-v3` model: Switched from openai-whisper's `turbo` model to `large-v3`. Thanks to faster-whisper's optimizations, the larger model still runs at practical speed
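As a quick sanity check, the `format_time` helper from the wrapper script can be exercised standalone; it converts a second count into an `HH:MM:SS.ss` timestamp for the segment log lines:

```python
def format_time(seconds):
    # Same helper as in the wrapper script: split seconds into h/m/s parts.
    h = int(seconds // 3600)
    m = int((seconds % 3600) // 60)
    s = seconds % 60
    return f"{h:02d}:{m:02d}:{s:05.2f}"

print(format_time(3725.5))  # 1 h, 2 min, 5.5 s -> "01:02:05.50"
```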
Usage Change
```bash
# Before
whisper meeting_audio.wav --language ja --model turbo > meeting_audio.log 2>&1

# After
python scripts/transcribe.py meeting_audio.wav --language ja --model large-v3 > meeting_audio.log 2>&1
```

The workflow of running the job in the background and monitoring it with `tail -f meeting_audio.log` remains the same. However, the completion detection method has changed:
```bash
# Before: complete when the log stops updating (ambiguous)
ps aux | grep whisper

# After: complete when "Transcription saved to" appears in the log (explicit)
ps aux | grep transcribe.py
```

Summary
| Item | openai-whisper | faster-whisper |
|---|---|---|
| Engine | PyTorch | CTranslate2 |
| Speed | Baseline | Up to 4x faster |
| Memory | High | Low (INT8 quantization) |
| Interface | CLI command | Python library |
| Model | turbo | large-v3 |
The migration itself was straightforward: including writing the wrapper script, it took very little time. If you're dissatisfied with openai-whisper's processing speed or memory consumption, migrating to faster-whisper is well worth considering.
That’s all for today—faster transcription is always better. That’s all from the Gemba.