On-device by default.
Audio leaves your Mac only if you point UtterPad at a cloud model. The default pipeline is Whisper.cpp through Metal, or Apple's Speech framework. No network, no telemetry.
A native macOS workshop for turning audio and video into SRT, VTT, TXT, ASS and JSON subtitles — fast, batch-friendly, with the engine you pick. No SaaS round-trips. No usage caps.
Batch queue with per-row model, language, output-format and folder overrides. Drag-reorderable, drag-resizable columns. Phased progress bars that reset between extract and transcribe. Built for editors, not consumers.
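One way to model per-row overrides is sketched below in Swift. Each queue row stores optional values that fall back to queue-wide defaults. All type and property names (`QueueDefaults`, `JobRow`, `resolved`) are hypothetical and are not UtterPad's actual types.

```swift
import Foundation

// Hypothetical queue-wide defaults; names are illustrative only.
struct QueueDefaults {
    var model: String
    var language: String
    var formats: Set<String>
    var outputFolder: URL?   // nil = write beside the source file
}

// Hypothetical queue row: every nil property inherits the queue default.
struct JobRow {
    var source: URL
    var model: String? = nil
    var language: String? = nil
    var formats: Set<String>? = nil
    var outputFolder: URL? = nil

    // Effective settings for this row after applying its overrides.
    func resolved(with defaults: QueueDefaults) -> QueueDefaults {
        QueueDefaults(
            model: model ?? defaults.model,
            language: language ?? defaults.language,
            formats: formats ?? defaults.formats,
            outputFolder: outputFolder ?? defaults.outputFolder
        )
    }
}
```

Keeping overrides optional means a change to a queue default reaches every row that has not been customized.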
Each completed job auto-exports .srt, .vtt, .txt, .ass and .json, either beside the source or to a folder you pick per file.
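SRT and WebVTT cues differ in a small detail that is easy to get wrong. SRT puts a comma before the milliseconds, and WebVTT uses a period. A WebVTT file also begins with a `WEBVTT` header line. Below is a minimal Swift sketch of the timestamp formatting. The function names are hypothetical and do not come from UtterPad's code.

```swift
// SRT uses a comma before milliseconds; WebVTT uses a period.
enum SubtitleFormat { case srt, vtt }

// Left-pads a non-negative integer with zeros to `width` digits.
func zeroPad(_ n: Int, _ width: Int) -> String {
    let s = String(n)
    return String(repeating: "0", count: max(0, width - s.count)) + s
}

// Formats seconds as HH:MM:SS,mmm (SRT) or HH:MM:SS.mmm (VTT).
func timestamp(_ seconds: Double, for format: SubtitleFormat) -> String {
    // Round to whole milliseconds once, then split into fields,
    // so 59.9996 s becomes 00:01:00 rather than 59 s plus 1000 ms.
    let totalMs = Int((max(seconds, 0) * 1000).rounded())
    let h = totalMs / 3_600_000
    let m = (totalMs / 60_000) % 60
    let s = (totalMs / 1000) % 60
    let ms = totalMs % 1000
    let sep = format == .srt ? "," : "."
    return "\(zeroPad(h, 2)):\(zeroPad(m, 2)):\(zeroPad(s, 2))\(sep)\(zeroPad(ms, 3))"
}

// Renders one numbered SRT cue: index, time range, text, blank line.
func srtCue(index: Int, start: Double, end: Double, text: String) -> String {
    "\(index)\n\(timestamp(start, for: .srt)) --> \(timestamp(end, for: .srt))\n\(text)\n\n"
}
```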
AVAssetReader + AVAudioConverter decode any video or audio into 16 kHz mono Float32 PCM. No ffmpeg, no GPL.
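The target format also sets the memory cost of a decoded file. At 16,000 samples per second and 4 bytes per Float32 sample, mono audio takes 64,000 bytes per second, which comes to about 230 MB for a one-hour recording. The small Swift sketch below covers only that arithmetic. The function name is hypothetical, and the actual decode still goes through AVAssetReader and AVAudioConverter.

```swift
// Whisper-style input: 16 kHz, mono, 32-bit float samples.
let sampleRate = 16_000                            // samples per second
let channels = 1
let bytesPerSample = MemoryLayout<Float32>.size    // 4 bytes

// PCM frames and bytes needed to hold `seconds` of decoded audio.
func pcmFootprint(seconds: Double) -> (frames: Int, bytes: Int) {
    let frames = Int((seconds * Double(sampleRate)).rounded(.up))
    return (frames, frames * channels * bytesPerSample)
}
```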
The selected engine receives the PCM. Whisper.cpp streams segments back via delegate; Apple Speech is internally chunked; cloud APIs return verbose JSON.
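The Swift sketch below turns a cloud engine's verbose JSON response into segments with `Codable`. It assumes an OpenAI-style `verbose_json` body, meaning a `segments` array whose items carry `start` and `end` in seconds plus `text`. The exact fields UtterPad reads are not stated, and the type names are hypothetical.

```swift
import Foundation

// Assumed response shape (OpenAI-style verbose_json). Fields not
// declared here, such as "id" or "duration", are ignored by the decoder.
struct VerboseTranscript: Decodable {
    struct Segment: Decodable {
        let start: Double   // seconds from the start of the file
        let end: Double
        let text: String
    }
    let text: String
    let segments: [Segment]
}

// Decodes the response body and returns only the timed segments.
func decodeVerbose(_ data: Data) throws -> [VerboseTranscript.Segment] {
    try JSONDecoder().decode(VerboseTranscript.self, from: data).segments
}
```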
Every newly-finalized segment is appended to the transcript pane in real time.
Real progress comes from lastSegment.endTime / total, not an estimate.
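The Swift sketch below shows segment-driven progress. Each finalized segment is appended, and progress is `lastSegment.endTime / total` rather than a time-based guess. `Segment` and `TranscriptCollector` are hypothetical names. In the app, appends would come from the engine's delegate callbacks, and a real UI would also move to the main actor before touching view state, which this sketch leaves out.

```swift
// Hypothetical segment type; times are seconds from the start of the file.
struct Segment {
    let startTime: Double
    let endTime: Double
    let text: String
}

// Hypothetical collector: stores finalized segments and derives progress
// from the end time of the most recent one.
final class TranscriptCollector {
    private(set) var segments: [Segment] = []
    let totalDuration: Double

    init(totalDuration: Double) { self.totalDuration = totalDuration }

    // Called once per newly finalized segment.
    func append(_ segment: Segment) {
        segments.append(segment)
    }

    // Fraction in 0...1, clamped in case a segment's end time
    // overshoots the decoded duration.
    var progress: Double {
        guard totalDuration > 0, let last = segments.last else { return 0 }
        return min(max(last.endTime / totalDuration, 0), 1)
    }
}
```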
SRT, VTT, TXT, ASS, JSON written atomically to the configured destination. BOM-aware. Speaker tags optional.
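The Swift sketch below covers an atomic write. It assumes "BOM-aware" means an optional UTF-8 byte-order mark on output. `writeSubtitle` is a hypothetical name. `Data.write(to:options:)` with `.atomic` is Foundation's own option for writing to an auxiliary file and then replacing the destination.

```swift
import Foundation

// UTF-8 byte-order mark; whether to emit it is left to the caller.
let utf8BOM: [UInt8] = [0xEF, 0xBB, 0xBF]

// Writes subtitle text atomically. With `.atomic`, Foundation writes to an
// auxiliary file first and then replaces the destination, so readers never
// see a half-written file.
func writeSubtitle(_ text: String, to url: URL, includeBOM: Bool) throws {
    var data = Data()
    if includeBOM { data.append(contentsOf: utf8BOM) }
    data.append(Data(text.utf8))
    try data.write(to: url, options: .atomic)
}
```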
SFSpeechRecognizer. No download, no network. Chunked internally for long files.
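The page doesn't say how the Apple Speech path splits long files. This hypothetical Swift sketch shows one common approach: fixed-length windows over the 16 kHz sample buffer, with optional overlap so speech near a boundary reaches both neighbouring chunks. The chunk length, the overlap and the function name are all assumptions. A 60-second window at 16 kHz would be `chunkFrames = 960_000`.

```swift
// Splits `totalFrames` samples into consecutive ranges of at most
// `chunkFrames`; neighbouring ranges share `overlapFrames` samples.
func chunkRanges(totalFrames: Int, chunkFrames: Int, overlapFrames: Int = 0) -> [Range<Int>] {
    precondition(chunkFrames > 0 && overlapFrames >= 0 && overlapFrames < chunkFrames)
    var ranges: [Range<Int>] = []
    var start = 0
    while start < totalFrames {
        let end = min(start + chunkFrames, totalFrames)
        ranges.append(start..<end)
        if end == totalFrames { break }   // last chunk reached the end
        start = end - overlapFrames       // step back to create the overlap
    }
    return ranges
}
```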
generateContent.
/v1/audio/transcriptions endpoint. API key in Keychain.
Free. Native. On your machine.
Download UtterPad