UtterPad Download
01 / 06 — INDEX
For macOS

Subtitles, decoded on your machine.

A native macOS workshop for turning audio and video into SRT, VTT, TXT, ASS and JSON subtitles — fast, batch-friendly, with the engine you pick. No SaaS round-trips. No usage caps.

Download for macOS
02 / 06 — PILLARS

Three certainties.

P · 01

On-device by default.

Audio leaves your machine only if you point UtterPad at a cloud model. The default pipeline is Whisper.cpp through Metal, or Apple's Speech framework. No network, no telemetry.

P · 02

Pro-tool surface.

Batch queue with per-row model, language, output-format and folder overrides. Drag-reorderable, drag-resizable columns. Phased progress bars that reset between extract and transcribe. Built for editors, not consumers.

P · 03

Every transcript, every format.

Each completed job auto-exports .srt, .vtt, .txt, .ass and .json — beside the source or to a folder you pick per file.

03 / 06 — PIPELINE

Four phases. Each one resets.

  1. step 01

    Extract

    AVAssetReader + AVAudioConverter decode any video or audio that AVFoundation can open into 16 kHz mono Float32. No ffmpeg, no GPL.

    extracting audio · 92%
  2. step 02

    Transcribe

    The selected engine receives the PCM. Whisper.cpp streams segments back via delegate; Apple Speech is internally chunked; cloud APIs return verbose JSON.

    transcribing · 47%
  3. step 03

    Live segments

    Every newly finalized segment is appended to the transcript pane in real time. Progress is measured as lastSegment.endTime / total duration, not estimated.

    transcribing · 71%
  4. step 04

    Export

    SRT, VTT, TXT, ASS, JSON written atomically to the configured destination. BOM-aware. Speaker tags optional.

    done · 100%
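
The segment-based progress and the export phase described above can be sketched in a few lines. This is an illustrative Python sketch, not UtterPad's source: the `Segment` type and the helper names `progress`, `srt_timestamp`, `to_srt` and `write_atomically` are hypothetical, and the atomic write shown here (temp file in the same folder, then rename) is one common way to do it, assumed rather than confirmed.

```python
import os
import tempfile
from dataclasses import dataclass


@dataclass
class Segment:  # hypothetical shape of a finalized segment
    start: float  # seconds
    end: float    # seconds
    text: str


def progress(segments, total_seconds):
    """Fraction transcribed so far: last segment's end time / total duration."""
    if not segments or total_seconds <= 0:
        return 0.0
    return min(segments[-1].end / total_seconds, 1.0)


def srt_timestamp(seconds):
    """Format seconds as HH:MM:SS,mmm (SRT puts a comma before milliseconds)."""
    ms = int(round(seconds * 1000))
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def to_srt(segments):
    """Render numbered SRT cues separated by blank lines."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        times = f"{srt_timestamp(seg.start)} --> {srt_timestamp(seg.end)}"
        blocks.append(f"{index}\n{times}\n{seg.text.strip()}\n")
    return "\n".join(blocks)


def write_atomically(path, text):
    """Write to a temp file beside the target, then rename over it."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8", newline="\n") as handle:
            handle.write(text)
        os.replace(tmp, path)  # readers never observe a half-written file
    except BaseException:
        if os.path.exists(tmp):
            os.remove(tmp)
        raise
```

Because the rename happens only after the temp file is fully written, a crash mid-export leaves either the previous file or no file, never a truncated one.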
04 / 06 — ENGINE MATRIX

Pick the right tool for the recording.

Local · on-device

  • Whisper.cpp: Tiny · Base · Small · Medium · Large v3 · Turbo. GGUF downloaded from Hugging Face into Application Support. Metal accelerated.
  • Apple Speech: on-device · macOS 14+. Native SFSpeechRecognizer. No download, no network. Chunked internally for long files.
  • Parakeet: V2 (en) · V3 (en + 25 EU). NVIDIA Parakeet via the FluidAudio SDK. CoreML accelerated, realtime-capable.

Cloud · bring your own key

  • Groq: Whisper Large v3 Turbo. ~10× realtime; lowest-latency cloud option.
  • ElevenLabs: Scribe v1 · v2. Word-level timestamps, multilingual.
  • Deepgram: Nova-3 · Nova-3 Medical. Speaker diarization out of the box.
  • Mistral: Voxtral. Verbose JSON with segment timing.
  • Google Gemini: 2.5 / 3.1 · Pro · Flash. Audio understanding via generateContent.
  • Custom endpoint: OpenAI-compatible. Plug in any /v1/audio/transcriptions endpoint. API key in Keychain.
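
For the custom endpoint option, an OpenAI-compatible transcription call is a multipart POST to /v1/audio/transcriptions with a bearer token. The sketch below only builds such a request and does not send it. The function name `build_transcription_request`, the host, and the default model name are placeholders chosen for illustration; a given server may expect a different model identifier or extra fields.

```python
import uuid
import urllib.request


def build_transcription_request(base_url, api_key, filename, audio_bytes,
                                model="whisper-1",  # placeholder model name
                                response_format="verbose_json"):
    """Build (but do not send) a multipart POST for an OpenAI-compatible server."""
    boundary = uuid.uuid4().hex
    parts = []
    # Plain text form fields.
    for name, value in (("model", model), ("response_format", response_format)):
        parts.append(
            (f"--{boundary}\r\n"
             f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
             f"{value}\r\n").encode("utf-8")
        )
    # The audio payload goes in the "file" field.
    parts.append(
        (f"--{boundary}\r\n"
         f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
         "Content-Type: application/octet-stream\r\n\r\n").encode("utf-8")
        + audio_bytes + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/audio/transcriptions",
        data=b"".join(parts),
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

Passing the result to `urllib.request.urlopen` would perform the upload; with `response_format` set to `verbose_json`, OpenAI-style servers typically return segment timing alongside the text.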
05 / 06 — PRIVACY

What leaves your machine?

audio
Stays on disk for local engines. Only uploaded if you explicitly pick a cloud model.
api keys
Stored in the macOS Keychain. Never written to disk in plaintext, never sent anywhere except the matching provider.
telemetry
None. No analytics, no crash reporters, no opt-in dialogs. The app does not phone home.
microphone
UtterPad does not request microphone access. It processes files you point it at.
12+
models
5+
export formats
9+
transcription engines
14+
supported a/v containers
0
network requests by default
06 / 06 — DOWNLOAD

Get your subtitles ready.

Free. Native. On your machine.

Download UtterPad