Enterprise-grade closed caption translation and natural voice dubbing. Upload media, get perfectly timed translations and AI-cloned voice dubs in minutes.
The Problem
Localization pipelines at studios like Disney, Netflix, and Amazon involve 5+ fragmented tools, weeks of turnaround, and expensive manual linguist work for isometric dubbing — rewriting translations to match original speech timing. Voice casting alone can take days. The result: content launches in 1-2 languages and takes months to reach global audiences.
How Echō Works
01
Upload
Drop in any video or audio file. Echō accepts MP4, MOV, MKV, WAV, MP3, and more.
02
Transcribe & Translate
ElevenLabs Scribe extracts speech with timestamps and speaker identification. Claude translates with cultural adaptation and timing constraints.
03
Dub & Export
ElevenLabs clones the original voice and synthesizes the translation. Export captions or dubbed audio.
The Technology Stack
ElevenLabs Scribe
High-accuracy speech-to-text with word-level timestamps, speaker diarization, and 99+ language support. Powers the transcription pipeline.
Claude (Anthropic)
Context-aware translation that preserves idioms, tone, and cultural nuance. Handles isometric adaptation — rewriting translated text to fit original timing windows.
ElevenLabs
Voice cloning from audio samples. Multilingual synthesis with prosody, emotion, and pacing control that sounds natural.
Isometric Dubbing
The secret sauce. Claude rewrites translated text to match the duration of each original speech segment — the same thing Disney pays linguists to do manually.
What Disney Could Do Better
Studios currently use 5+ separate tools for transcription, translation agencies, voice casting, recording studios, and manual QC. Echō consolidates everything into two APIs — Claude for all intelligence (transcription, translation, adaptation) and ElevenLabs for voice synthesis. Voice cloning eliminates casting and recording for 80% of use cases, isometric adaptation replaces weeks of manual linguist work, and real-time preview eliminates the back-and-forth between translation and audio teams.
MP3, WAV, AAC — individual segments or full mixed track
Languages
English ↔ Spanish (launch). More languages coming.
Step-by-Step Guide
Follow these steps to go from raw media to fully translated captions and natural-sounding dubbed audio.
0Configure Your API Keys
Echō connects to two AI services. Enter your keys in the sub-bar at the top of the page. Each key lights up green when configured.
Claude
Powers translation and isometric adaptation — rewriting translations to match original speech timing. Get your key at console.anthropic.com. Uses Claude Sonnet for fast, high-quality results.
ElevenLabs
Powers voice cloning and speech synthesis. Get your key at elevenlabs.io/app/settings/api-keys. Free tier includes limited characters; paid plans unlock more.
Your keys are stored in your browser's localStorage only. They are never sent to BuilderBias servers — each key is sent directly to its respective API (Anthropic or ElevenLabs) over HTTPS.
1Upload Your Media
Drag and drop a video or audio file onto the upload zone, or click to browse your files. Echō accepts all major formats:
Once uploaded, you'll see the file name, size, and type. Select your source language and target language from the dropdowns. Currently supports English ↔ Spanish with more languages coming.
2Transcribe with ElevenLabs Scribe
Click "Start Transcription" to send your audio to ElevenLabs Scribe. Scribe will:
Extract all spoken words from the audio track with high accuracy
Generate precise start and end timestamps for each word and segment
Identify different speakers (speaker diarization)
Detect the source language automatically if set to "Auto-detect"
When complete, you'll see the full transcript in a table with timecodes, speaker labels, and the original text. A stats bar shows total segments, duration, speakers detected, and word count. Video files have their audio automatically extracted before transcription.
Tip: For best results, use audio with minimal background noise. Scribe handles accents and multiple speakers well, but heavy music or sound effects can reduce accuracy.
3Translate & Isometric Adaptation
Click "Translate & Adapt" to send all segments to Claude. This is a two-part process:
Translation
Claude translates each segment with cultural context — preserving idioms, humor, tone, and register rather than doing a word-for-word literal translation. A joke stays funny, a formal address stays formal.
Adaptation
Claude then rewrites the translation to fit the original segment's time window. This is isometric dubbing — the same process Disney pays linguists to do manually. If a 3-second English phrase translates to a 5-second Spanish phrase, Claude finds a shorter way to say it that still sounds natural.
The transcript table updates with translations shown in green and adapted versions in amber. The "Fit" column shows whether the adapted text fits the timing window — OK means it fits, a percentage shows how much longer it runs.
4Generate Voice Dub
Click "Generate Voice Dub" to synthesize the translated audio using ElevenLabs. The process:
Voice selection — Echō selects a multilingual voice from your ElevenLabs account (or uses a cloned voice if available)
Segment-by-segment synthesis — Each adapted text segment is synthesized individually for precise timing control
Prosody matching — ElevenLabs' Multilingual v2 model preserves natural speech patterns, emotion, and pacing
Progress tracking — Watch each segment go from "Pending" to "Processing" to "Done" in real-time
This step is optional — if you only need translated captions, click "Skip to Export" to jump ahead.
Tip: For the best voice cloning results, upload a voice sample to ElevenLabs first (elevenlabs.io/voice-cloning). Echō will automatically use your cloned voice for synthesis, making the dubbed audio sound like the original speaker.
5Preview Your Results
The preview player appears after translation completes. Use it to review your work before exporting:
Original
Play back with original-language captions overlaid on the waveform timeline.
Dubbed
Play back with translated/adapted captions. If voice dubbing is complete, hear the synthesized audio.
Side by Side
See original and translated captions simultaneously — ideal for QC review and comparing translations.
Use the scrubber to jump to any point in the timeline. Captions update in real-time as you scrub through the waveform.
6Export
Click any export card to download your translated content. Available formats:
SRT
The universal subtitle format. Works with VLC, Premiere Pro, DaVinci Resolve, Final Cut, and virtually every video player and editor.
WebVTT
Web-native format with CSS styling support. Ideal for HTML5 video players, web apps, and streaming platforms.
SBV
YouTube's native caption format. Upload directly to YouTube Studio for instant localized captions.
TTML / DFXP
Broadcast-grade XML format used by Netflix, Disney+, and broadcast networks. Required for many content delivery platforms.
JSON
Full pipeline data including original text, translations, adapted text, timing, and speaker info. Use this for API integrations or custom workflows.
Dubbed Audio
Download the AI-generated dubbed audio track (available after voice dubbing is complete). Ready to mix with original video in your editor.
Pro Tips
⚡
Caption-Only Workflow
If you only need translated captions (no voice dubbing), skip step 4 entirely. After translation, go straight to export. Both API keys are still needed — ElevenLabs for transcription and Claude for translation.
🎤
Better Voice Cloning
For the most natural dubbing, create a custom voice clone in ElevenLabs first using a clean sample of the original speaker. Echō will use it automatically.
📊
Timing Fit Indicators
Watch the "Fit" column after translation. Green "OK" means the adapted text fits the timing. A percentage like "+15%" means it runs slightly long — the voice synthesis will speak slightly faster to compensate, but you may want to manually shorten the text for the most natural result.
🔒
Security
Your API keys and media files never touch BuilderBias servers. All API calls go directly from your browser to Anthropic and ElevenLabs over encrypted HTTPS connections. Keys are saved in localStorage so you don't have to re-enter them.
1 Upload
→
2 Transcribe
→
3 Translate
→
4 Dub
→
5 Export
🎬
Drop your media file here
or click to browse — video, audio, or existing caption files