Voice-Pro: The Open-Source ElevenLabs Alternative That Runs Entirely on Your Machine
If you’ve ever wanted to take a YouTube video, clone the speaker’s voice, and dub it into another language — all without sending a single byte to the cloud — Voice-Pro is exactly that tool.
Recently open-sourced and made completely free, Voice-Pro is a Gradio-based web application that chains together every step of the multilingual dubbing pipeline into a single local interface. Think of it as an ElevenLabs alternative that respects your privacy and your wallet.
The End-to-End Dubbing Pipeline
What makes Voice-Pro compelling isn’t any single feature — it’s the complete pipeline running locally:
- Download — Grab any YouTube video via yt-dlp
- Separate — Isolate vocals from background audio using Demucs
- Transcribe — Convert speech to text with Whisper, Faster-Whisper, or WhisperX
- Translate — Translate the transcript into 100+ languages via Deep-Translator
- Clone & Dub — Re-synthesize the translated text in the original speaker’s voice using zero-shot voice cloning
Each step feeds into the next through the Gradio UI. No copy-pasting between tools, no API keys to manage, no cloud round-trips.
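To make the hand-off between stages concrete, here is a minimal sketch of how the first three steps could be chained with the standalone command-line tools. This is not Voice-Pro's internal code: the function names, the `work` directory layout, and the choice of the `htdemucs` separation model and `small` Whisper model are assumptions made for illustration. The commands are only built, not executed.

```python
from pathlib import Path


def build_stage_commands(url: str, workdir: str = "work", model: str = "small") -> list[list[str]]:
    """Return CLI invocations for the download, separate, and transcribe stages."""
    work = Path(workdir)
    audio = work / "source.wav"
    stems = work / "stems"
    # Assumption: Demucs writes stems to <out>/<model>/<track name>/vocals.wav.
    vocals = stems / "htdemucs" / audio.stem / "vocals.wav"

    # Stage 1: extract the audio track as WAV with yt-dlp.
    download = ["yt-dlp", "-x", "--audio-format", "wav",
                "-o", (work / "source.%(ext)s").as_posix(), url]
    # Stage 2: split the track into vocals and accompaniment with Demucs.
    separate = ["demucs", "-n", "htdemucs", "--two-stems", "vocals",
                "-o", stems.as_posix(), audio.as_posix()]
    # Stage 3: transcribe the isolated vocals to an SRT file with Whisper.
    transcribe = ["whisper", vocals.as_posix(), "--model", model,
                  "--output_format", "srt",
                  "--output_dir", (work / "transcript").as_posix()]
    return [download, separate, transcribe]


def translate_lines(lines: list[str], target: str = "es") -> list[str]:
    """Stage 4: translate transcript lines with Deep-Translator (needs network access)."""
    # Imported lazily so the rest of the sketch runs without the package installed.
    from deep_translator import GoogleTranslator

    translator = GoogleTranslator(source="auto", target=target)
    return [translator.translate(line) for line in lines]
```

Each command list can be passed to `subprocess.run(cmd, check=True)` in order. The point of the sketch is that every stage's output path is the next stage's input, which is the bookkeeping Voice-Pro's interface takes off your hands.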
Zero-Shot Voice Cloning with F5-TTS and CosyVoice
The voice cloning component is where Voice-Pro gets interesting. It supports three models for zero-shot cloning:
- F5-TTS — A flow-matching based model that produces natural-sounding clones from short reference audio. It also supports fine-tuned models for even higher quality on specific voices.
- E2-TTS — An end-to-end approach that handles cloning with minimal preprocessing.
- CosyVoice — Alibaba’s multilingual voice cloning model, particularly strong for Chinese and cross-lingual synthesis.
“Zero-shot” means no training required. Give any of these models a few seconds of reference audio, and they’ll synthesize new speech in that voice. If you’re exploring other voice cloning options, LuxTTS achieves 150x realtime on just 1GB VRAM with a different approach.
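Since zero-shot models only need a short reference clip, a common preparation step is cutting a few seconds out of a longer recording. The sketch below does that with Python's standard `wave` module. It is an illustration rather than part of Voice-Pro: the `trim_reference` name and the 10-second default are assumptions, and it only handles uncompressed PCM WAV files.

```python
import wave


def trim_reference(src: str, dst: str, seconds: float = 10.0) -> float:
    """Copy the first `seconds` of a PCM WAV file to `dst`; return the clip length in seconds."""
    with wave.open(src, "rb") as reader:
        params = reader.getparams()
        # Never request more frames than the file actually contains.
        n_frames = min(reader.getnframes(), int(seconds * reader.getframerate()))
        frames = reader.readframes(n_frames)
    with wave.open(dst, "wb") as writer:
        # Reuse channel count, sample width, and sample rate from the source;
        # the frame count in the header is corrected when the file is closed.
        writer.setparams(params)
        writer.writeframes(frames)
    return n_frames / params.framerate
```

In practice you would pick a stretch of clean speech (ideally the Demucs vocal stem rather than the raw mix) instead of blindly taking the first seconds of the file.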
For cases where cloning isn’t needed, Voice-Pro also includes Edge-TTS (Microsoft’s free neural TTS) and kokoro for standard multilingual text-to-speech.
Why Local Matters
Cloud TTS services like ElevenLabs charge per character, gate features behind subscription tiers, and process your audio on remote servers. Voice-Pro flips all of that:
- No API costs — Run unlimited generations. No per-character billing, no monthly caps.
- Complete privacy — Your audio, transcripts, and cloned voices never leave your machine. Critical for sensitive content, legal recordings, or proprietary material.
- No rate limits — Process a hundred videos back-to-back without hitting throttles.
- Offline capable — Once models are downloaded, the entire pipeline works without internet.
This aligns with the broader shift toward local voice AI tools that give creators full control over their data and workflow.
Technical Stack
Voice-Pro is built on:
- Python 3.10 with PyTorch 2.5 (CUDA support)
- Gradio 5 for the web interface
- Whisper, Faster-Whisper, or WhisperX for speech recognition
- Demucs for vocal separation
- Deep-Translator for multilingual translation
- Cross-platform support for Windows, macOS, and Linux
Installation involves cloning the repo and running the setup script. Models download automatically on first use.
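If something fails after setup, a quick sanity check is to confirm which parts of the stack are importable from the active Python environment. The sketch below is not part of Voice-Pro's setup script. It assumes the usual import names for each package (`torch`, `gradio`, `whisper`, `demucs`, `deep_translator`), and the `check_environment` helper is hypothetical.

```python
import importlib.util
import sys


def check_environment(
    modules: tuple[str, ...] = ("torch", "gradio", "whisper", "demucs", "deep_translator"),
) -> dict:
    """Report the running Python version and whether each module can be found."""
    report = {"python": f"{sys.version_info.major}.{sys.version_info.minor}"}
    for name in modules:
        # find_spec locates a top-level module without importing it.
        report[name] = importlib.util.find_spec(name) is not None
    return report


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```

Run it with the same interpreter that launches Voice-Pro. A `False` entry or a Python version other than 3.10 suggests you are in the wrong environment rather than that the install is broken.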
Who Is This For?
Voice-Pro hits a sweet spot for several use cases:
- Content creators dubbing their videos into new languages while preserving their voice
- Podcasters producing multilingual versions of episodes
- Researchers processing multilingual speech datasets locally
- Developers building voice pipelines who need an open-source foundation
The voice processing ecosystem has exploded with specialized tools, but Voice-Pro’s value is integration — one app that handles the full journey from source video to dubbed output.
The Catch
Voice-Pro requires decent hardware. Whisper and the voice cloning models are GPU-intensive, so you’ll want an NVIDIA card with at least 6GB VRAM for comfortable use. Mac users can run it on Apple Silicon, but performance will vary.
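To see where your own machine lands, PyTorch can report the detected GPU and its memory. The following is a minimal sketch, assuming PyTorch is installed. The helper names are hypothetical and the 6 GiB default simply mirrors the guideline above, not an official requirement.

```python
def meets_vram_requirement(total_bytes: int, min_gib: float = 6.0) -> bool:
    """Return True if the reported memory is at least `min_gib` GiB."""
    return total_bytes / 2**30 >= min_gib


def detect_gpu() -> str:
    """Describe the accelerator PyTorch can see, if any."""
    try:
        import torch
    except ImportError:
        return "PyTorch not installed"
    if torch.cuda.is_available():
        props = torch.cuda.get_device_properties(0)
        gib = props.total_memory / 2**30
        verdict = "ok" if meets_vram_requirement(props.total_memory) else "below guideline"
        return f"{props.name}: {gib:.1f} GiB ({verdict})"
    if torch.backends.mps.is_available():
        return "Apple Silicon (MPS) backend available"
    return "CPU only"


if __name__ == "__main__":
    print(detect_gpu())
```

Cards marketed as 6GB can report slightly less than 6 GiB of usable memory, so treat a near miss as a prompt to lower `min_gib` a little, not as a hard failure.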
The Gradio interface is functional but not polished — this is an open-source tool built by a small team, not a SaaS product with a design department. That said, for a free tool that replaces hundreds of dollars in monthly API costs, it’s hard to complain.
Getting Started
git clone https://github.com/abus-aikorea/voice-pro.git
cd voice-pro
# Follow platform-specific setup instructions in the README
Voice-Pro represents what happens when the building blocks of voice AI — recognition, separation, cloning, synthesis — become good enough to chain together locally. The result is a pipeline that would have cost thousands in API calls just two years ago, now running for free on your own hardware.