AutoCut：基于 Whisper 的视频剪辑工具

OpenAI’s Whisper

安装

pip install -U openai-whisper

同时还需要安装 ffmpeg 工具：

# on Ubuntu or Debian
sudo apt update && sudo apt install ffmpeg

# on Arch Linux
sudo pacman -S ffmpeg

# on MacOS using Homebrew (https://brew.sh/)
brew install ffmpeg

# on Windows using Chocolatey (https://chocolatey.org/)
choco install ffmpeg

# on Windows using Scoop (https://scoop.sh/)
scoop install ffmpeg

支持的模型和语言

Python 用法

import whisper

# 加载模型，这里是 base 模型
model = whisper.load_model("base")

# 转录
# audio.mp3
result = model.transcribe("./九转大肠最全前因后果（补档）.mp4")

# 打印转录结果
print(result["text"])

transcribe() 方法读取整个视频文件，并且使用 大小为 30 秒的滑动窗口 来处理音频，对每个窗口执行自回归的序列到序列（sequence-to-sequence）的预测。

AutoCut：通过字幕来剪切视频

AutoCut 对你的视频自动生成字幕。然后你选择需要保留的句子，AutoCut 将对你视频中对应的片段裁切并保存。你无需使用视频编辑软件，只需要编辑文本文件即可完成剪切。

安装

pip install git+https://github.com/mli/autocut.git

AutoCut 默认需要使用 Python 3.9 及以上版本，因为只有 Python 3.9 及以上版本才支持 argparse.BooleanOptionalAction。如果要使用 Python 3.9 以下的版本（这里我尝试的是 python 3.8），需要进行如下调整：

在 setup.cfg 中对 python_requires = >= 3.9 进行修改
在 main.py 中将所有的 argparse.BooleanOptionalAction 进行注释

在进行 AutoCut 的安装时，还可能会遇到无法安装 whisper，进而无法完成 AutoCut 安装的情况，下面是报错的输出：

WARNING: Generating metadata for package whisper produced metadata for project name openai-whisper. Fix your #egg=whisper fragments.

Discarding git+https://github.com/openai/whisper.git: Requested openai-whisper from git+https://github.com/openai/whisper.git (from autocut==0.0.3) has inconsistent name: expected 'whisper', but metadata has 'openai-whisper'

ERROR: Could not find a version that satisfies the requirement whisper (unavailable) (from autocut) (from versions: 0.9.5, 0.9.6, 0.9.7, 0.9.8, 0.9.9, 0.9.10, 0.9.11, 0.9.12, 0.9.13, 0.9.14, 0.9.15, 0.9.16, 1.0.0, 1.0.1, 1.0.2, 1.1.0, 1.1.1, 1.1.2, 1.1.3, 1.1.4, 1.1.5, 1.1.6, 1.1.7, 1.1.8, 1.1.9, 1.1.10)

ERROR: No matching distribution found for whisper (unavailable)

根据报错输出的信息，我们可以看到主要的原因是：whisper 库的 metadata name 不匹配（应该是被 OpenAI 更改了），只需要在 setup.py 中进行如下的更改：

# 更改前
requirements = [
    "whisper @ git+https://github.com/openai/whisper.git",
]

# 更改后
requirements = [
    "openai-whisper @ git+https://github.com/openai/whisper.git",
]

经过上述的修改之后，我们可以按照以下命令对 AutoCut 进行本地安装，如下所示：

# 本地安装
git clone https://github.com/mli/autocut
cd autocut
pip install .

Transcribe（转录）

主要使用 OpenAI 的 Whisper 模型对（视频中的）音频进行读取和转录，并将转录后的结果输出为 SRT 字幕文件和 Markdown 文件。

为了识别未出声的片段（Voice Activity Detect，VAD），可以调用现有的 VAD 方法（这里调用 Silero VAD 方法），参见 transcribe.py 中的 _detect_voice_activity() 函数：

# torch load limit https://github.com/pytorch/vision/issues/4156
torch.hub._validate_not_a_forked_repo = lambda a, b, c: True
self.vad_model, funcs = torch.hub.load(
    repo_or_dir="snakers4/silero-vad", model="silero_vad", trust_repo=True
)

self.detect_speech = funcs[0]

speeches = self.detect_speech(
    audio, self.vad_model, sampling_rate=self.sampling_rate
)

使用方法

!autocut -t inputs 九转大肠最全前因后果（补档）.mp4

!autocut -t inputs 九转大肠最全前因后果（补档）.mp4 --whisper-model medium

关键代码

代码具体参考：AutoCut 中的 transcribe.py 脚本：

# 读取音频/视频文件
audio = whisper.load_audio(input, sr=self.sampling_rate)

# 加载 Whisper 语言模型
self.whisper_model = whisper.load_model(
    self.args.whisper_model, self.args.device
)

# 进行转录
# 对于 CPU，使用多线程进行加速（默认使用双线程）
r = self.whisper_model.transcribe(
    audio[int(seg["start"]) : int(seg["end"])],
    task="transcribe",
    language=self.args.lang,
    initial_prompt=self.args.prompt,
    verbose=False if len(speech_timestamps) == 1 else None,
)

结果展示

!autocut -t inputs 九转大肠最全前因后果（补档）.mp4

[autocut:driver.py:L120] INFO   Generating grammar tables from /usr/lib/python3.8/lib2to3/Grammar.txt
[autocut:driver.py:L120] INFO   Generating grammar tables from /usr/lib/python3.8/lib2to3/PatternGrammar.txt
[autocut:utils.py:L160] INFO   NumExpr defaulting to 2 threads.
[autocut:transcribe.py:L37] INFO   Transcribing 九转大肠最全前因后果（补档）.mp4
Using cache found in /root/.cache/torch/hub/snakers4_silero-vad_master
[autocut:transcribe.py:L86] INFO   Done voice activity detection in 28.9 sec
  0% 0/42 [00:00<?, ?it/s]/usr/local/lib/python3.8/dist-packages/whisper/transcribe.py:79: UserWarning: FP16 is not supported on CPU; using FP32 instead
  warnings.warn("FP16 is not supported on CPU; using FP32 instead")
100% 42/42 [19:15<00:00, 27.51s/it]
[autocut:transcribe.py:L138] INFO   Done transcription in 1164.9 sec
[autocut:srt.py:L303] INFO   Skipped subtitle at index 0: Subtitle start time >= end time
[autocut:transcribe.py:L55] INFO   Transcribed 九转大肠最全前因后果（补档）.mp4 to 九转大肠最全前因后果（补档）.srt
[autocut:transcribe.py:L57] INFO   Saved texts to 九转大肠最全前因后果（补档）.md to mark sentences

Cut

代码具体参考：AutoCut 中的 cut.py 脚本

主要使用 srt 和 moviepy 这两个开源库来对视频/音频进行剪辑。srt 库负责对 SRT 字幕文件的读取，moviepy 库负责对视频/音频文件进行读写和剪辑。

使用方法

!autocut -c 九转大肠最全前因后果（补档）.mp4 九转大肠最全前因后果（补档）.srt

关键代码

srt.parse()：解析 SRT 字幕文件
editor.VideoFileClip()、editor.AudioFileClip()
subclip()：根据 SRT 字幕文件，将视频/音频剪切为多个片段（clips/segments）
editor.concatenate_videoclips()、editor.concatenate_audioclips()：将多个片段进行拼接
editor.VideoClip.write_videofile()、editor.AudioClip.write_audiofile()：将多个片段写入到一个视频/音频文件中

import srt
from moviepy import editor

with open(fns["srt"], encoding=self.args.encoding) as f:
    subs = list(srt.parse(f.read()))

# 重新编排 SRT 字幕文件的顺序
segments = []
# Avoid disordered subtitles
subs.sort(key=lambda x: x.start)

if is_video_file:
    media = editor.VideoFileClip(fns["media"])
else:
    media = editor.AudioFileClip(fns["media"])

clips = [media.subclip(s["start"], s["end"]) for s in segments]
if is_video_file:
    final_clip: editor.VideoClip = editor.concatenate_videoclips(clips)
    final_clip.write_videofile()
else:
    final_clip: editor.AudioClip = editor.concatenate_audioclips(clips)
    final_clip.write_audiofile()

问题

在进行剪辑时，出现以下错误：

Traceback (most recent call last):
  File "/usr/local/bin/autocut", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.8/dist-packages/autocut/main.py", line 113, in main
    from .cut import Cutter
  File "/usr/local/lib/python3.8/dist-packages/autocut/cut.py", line 6, in <module>
    from moviepy import editor
  File "/usr/local/lib/python3.8/dist-packages/moviepy/editor.py", line 26, in <module>
    imageio.plugins.ffmpeg.download()
  File "/usr/local/lib/python3.8/dist-packages/imageio/plugins/ffmpeg.py", line 37, in download
    raise RuntimeError(
RuntimeError: imageio.ffmpeg.download() has been deprecated. Use 'pip install imageio-ffmpeg' instead.'

根据 StackOverflow 中的方法，只需要对 imageio 进行降版本，如下所示：

!pip install imageio==2.4.1

结果展示

!autocut -c inputs 九转大肠最全前因后果（补档）.mp4 九转大肠最全前因后果（补档）.srt

[autocut:cut.py:L110] INFO   Cut 九转大肠最全前因后果（补档）.mp4 based on 九转大肠最全前因后果（补档）.srt
[autocut:cut.py:L143] INFO   Reduced duration from 796.3 to 567.8
[MoviePy] >>>> Building video 九转大肠最全前因后果（补档）_cut.mp4
[MoviePy] Writing audio in 九转大肠最全前因后果（补档）_cutTEMP_MPY_wvf_snd.mp4
100% 12520/12520 [00:43<00:00, 288.93it/s]
[MoviePy] Done.
[MoviePy] Writing video 九转大肠最全前因后果（补档）_cut.mp4
100% 17030/17034 [03:18<00:00, 80.45it/s][autocut:warnings.py:L109] WARNING /usr/local/lib/python3.8/dist-packages/moviepy/video/io/ffmpeg_reader.py:123: UserWarning: Warning: in file 九转大肠最全前因后果（补档）.mp4, 552960 bytes wanted but 0 bytes read,at frame 23885/23888, at time 796.17/796.26 sec. Using the last valid frame instead.
  warnings.warn("Warning: in file %s, "%(self.filename)+

[autocut:warnings.py:L109] WARNING /usr/local/lib/python3.8/dist-packages/moviepy/video/io/ffmpeg_reader.py:123: UserWarning: Warning: in file 九转大肠最全前因后果（补档）.mp4, 552960 bytes wanted but 0 bytes read,at frame 23886/23888, at time 796.20/796.26 sec. Using the last valid frame instead.
  warnings.warn("Warning: in file %s, "%(self.filename)+

[autocut:warnings.py:L109] WARNING /usr/local/lib/python3.8/dist-packages/moviepy/video/io/ffmpeg_reader.py:123: UserWarning: Warning: in file 九转大肠最全前因后果（补档）.mp4, 552960 bytes wanted but 0 bytes read,at frame 23887/23888, at time 796.23/796.26 sec. Using the last valid frame instead.
  warnings.warn("Warning: in file %s, "%(self.filename)+

100% 17034/17034 [03:18<00:00, 85.76it/s]
[MoviePy] Done.
[MoviePy] >>>> Video ready: 九转大肠最全前因后果（补档）_cut.mp4 

[autocut:cut.py:L167] INFO   Saved media to 九转大肠最全前因后果（补档）_cut.mp4

faster-whisper

参考：
github: faster-whisper

MoviePy 文档
srt 文档
github: srt
基于 Whisper 的开源应用程序：buzz

参考

文档信息

本文作者：Bookstall
本文链接：https://bookstall.github.io/2023/02/06/whisper-and-autocut/
版权声明：自由转载-非商用-非衍生-保持署名（创意共享3.0许可证）

Bookstall

AutoCut：基于 Whisper 的视频剪辑工具

OpenAI’s Whisper

安装

支持的模型和语言

Python 用法

AutoCut：通过字幕来剪切视频

安装

Transcribe（转录）

使用方法

关键代码

结果展示

Cut

使用方法

关键代码

问题

结果展示

faster-whisper

更多

参考

文档信息

Search

Table of Contents