feat: add Voicebox as open-source ElevenLabs alternative

Free, local-first voice synthesis studio (MIT license) powered by Qwen3-TTS.
Voice cloning from short samples, local REST API, no per-character costs.
4-5x faster on Apple Silicon via MLX.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Corey Haines 2026-02-18 10:23:33 -08:00
parent f2a755f750
commit d4e6ef796c


@@ -381,6 +381,46 @@ Ultra-low latency voice generation built for real-time applications.
---
### Voicebox (Open Source)
Free, local-first voice synthesis studio powered by Qwen3-TTS. The open-source alternative to ElevenLabs.
**Best for:** Free voice cloning, local/private generation, zero-cost batch production
**API:** Local REST API at `http://localhost:8000`
**Pricing:** Free (MIT license). Runs entirely on your machine.
**Stack:** Tauri (Rust) + React + FastAPI (Python)
**Capabilities:**
- Voice cloning from short audio samples via Qwen3-TTS
- Multi-language support (English, Chinese, more planned)
- Multi-track timeline editor for composing conversations
- 4-5x faster inference on Apple Silicon via MLX Metal acceleration
- Local REST API for programmatic generation
- No cloud dependency — all processing on-device
**Ad creative use cases:**
- Free voice cloning for brand spokesperson across all ad variations
- Batch generate voiceovers without per-character costs
- Private/local generation when ad content is sensitive or pre-launch
- Prototype voice variations before committing to a paid service
**API example:**
```bash
curl -X POST http://localhost:8000/generate \
-H "Content-Type: application/json" \
-d '{"text": "Stop wasting hours on manual reporting.", "profile_id": "abc123", "language": "en"}'
```
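For batch production, the same request can be scripted. The following is a minimal Python sketch, not part of Voicebox itself: it reuses only the `/generate` endpoint and the `text`, `profile_id`, and `language` fields from the curl example above. The helper names `build_request` and `generate_batch` are hypothetical, and because the response format is not described here, the sketch returns the raw response bytes without parsing them.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # local API address from the section above


def build_request(text, profile_id, language="en"):
    """Build (but do not send) a POST to /generate.

    Hypothetical helper: the JSON fields mirror the curl example.
    """
    payload = {"text": text, "profile_id": profile_id, "language": language}
    return urllib.request.Request(
        f"{BASE_URL}/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate_batch(lines, profile_id, language="en"):
    """Send one request per ad line and collect the raw response bodies.

    Assumption: the response format is unknown, so bytes are returned as-is.
    """
    results = []
    for text in lines:
        request = build_request(text, profile_id, language)
        with urllib.request.urlopen(request) as response:
            results.append(response.read())
    return results


if __name__ == "__main__":
    # Requires a Voicebox instance listening on localhost:8000.
    variations = [
        "Stop wasting hours on manual reporting.",
        "Get your reports done in minutes.",
    ]
    for body in generate_batch(variations, "abc123"):
        print(len(body), "bytes received")
```

Keeping request construction separate from sending makes it possible to inspect or log each payload before any audio is generated.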
**Install:** Desktop apps for macOS and Windows at [voicebox.sh](https://voicebox.sh), or build from source:
```bash
git clone https://github.com/jamiepine/voicebox.git
cd voicebox && make setup && make dev
```
**Docs:** [GitHub](https://github.com/jamiepine/voicebox)
---
### Other Voice Tools
| Tool | Best For | Differentiator | API |
@@ -405,6 +445,7 @@ Ultra-low latency voice generation built for real-time applications.
| **PlayHT** | Very good | Yes | 140+ | <300ms | ~$0.10-0.20 |
| **Fish Audio** | Good | Yes | 13+ | ~200ms | ~$0.05-0.10 |
| **WellSaid** | Very good | No (actor voices) | English | ~300ms | Custom pricing |
| **Voicebox** | Good | Yes (local) | 2+ | Local | Free (open source) |
### Choosing a Voice Tool
@@ -417,6 +458,8 @@ Need voiceover for ads?
├── Need multilingual (same ad, many languages)?
│ ├── Most languages → PlayHT (140+)
│ └── Best quality → ElevenLabs (29+)
├── Need free / open source / local?
│ └── Voicebox (MIT, runs on your machine)
├── Need cheap, fast, good-enough?
│ └── OpenAI TTS ($0.015/min)
├── Need commercially-safe licensing?