ElevenLabs Launches v3: Most Expressive Text-to-Speech Model Yet

ElevenLabs unveils v3 (alpha), its most expressive TTS model to date, supporting 70+ languages, emotional cues, dialogue mode, and next-level speech realism.

CIOL Bureau

ElevenLabs has announced the launch of Eleven v3, the alpha version of its next-generation text-to-speech model, bringing a breakthrough in vocal realism and emotional expressiveness. Designed not just to read, but to perform, Eleven v3 sets a new standard for AI-generated speech by responding naturally to non-verbal prompts, shifting tone mid-sentence, and seamlessly switching between characters and moods.

Unprecedented Expressiveness with New Architecture

Built on a brand-new architecture, Eleven v3 introduces capabilities that were previously considered out of reach for synthetic speech. Users can now guide performance using audio tags such as [whispers], [angry], [laughs], and even sound prompts like [door creaks], enabling nuanced storytelling and immersive experiences.
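
To illustrate, a short v3 script combining the tags mentioned above might look like the following (the script itself is an invented example, not an official ElevenLabs sample):

    [whispers] I never thought we'd find it. [door creaks]
    [angry] Who's there? Show yourself!
    [laughs] Relax, it's only me.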

This alpha release requires more prompt engineering than previous models, but the reward is a leap in realism, control, and humanlike delivery. It enables speech synthesis that mirrors the rhythm and emotional depth of real human conversation.

Expanding Global Reach with 70+ Languages

ElevenLabs has also expanded its multilingual support from 33 to over 70 languages, increasing global population coverage from 60% to 90%. This aligns with the company’s mission of making high-quality synthetic speech universally accessible.

Key Features of Eleven v3 (Alpha):

  • 70+ Languages Supported: Significantly expanded from 33, covering 90% of the world’s population.

  • Dialogue Mode: Handles interruptions, emotional transitions, and natural flow across multiple speakers.

  • Audio Tags: Guide tone and style with tags like [whispers], [laughs], [sad], or [cheerful].

  • Streaming Support: Coming soon—ideal for real-time agents and call center applications.

  • Public API: API access for Eleven v3 will be available soon; interested developers and enterprises can contact sales for early access (an illustrative request is sketched below).
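
For developers planning ahead, requests will likely resemble ElevenLabs' existing text-to-speech endpoint. Below is a minimal sketch in Python against the documented v1 route; the "eleven_v3" model_id, the placeholder voice ID, and v3 accepting inline audio tags through this route are assumptions until the public API ships.

    import requests

    API_KEY = "YOUR_ELEVENLABS_API_KEY"   # from your ElevenLabs account
    VOICE_ID = "YOUR_VOICE_ID"            # placeholder; any voice in your library

    # Dialogue-style script using the inline audio tags described above.
    script = (
        "[whispers] I never thought we'd find it. [door creaks] "
        "[angry] Who's there? "
        "[laughs] Relax, it's only me."
    )

    # Existing v1 text-to-speech endpoint; "eleven_v3" as model_id is an
    # assumption, since the public v3 API had not shipped at time of writing.
    url = f"https://api.elevenlabs.io/v1/text-to-speech/{VOICE_ID}"
    response = requests.post(
        url,
        headers={"xi-api-key": API_KEY, "Content-Type": "application/json"},
        json={"text": script, "model_id": "eleven_v3"},
    )
    response.raise_for_status()

    # The endpoint returns audio bytes (MP3 by default).
    with open("v3_sample.mp3", "wb") as f:
        f.write(response.content)

Until the v3 API opens up, the sales-assisted early access noted above remains the only route to the model over the API.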

Who It’s For

Eleven v3 is tailored for creators, developers, and enterprises building expressive content—ranging from audiobooks, character-driven storytelling, and video game dialogue to educational tools and voiceovers. While v3 is more suited for expressive, pre-rendered applications, real-time low-latency needs are still best served by ElevenLabs’ v2.5 Turbo and Flash models—for now.

Why It Matters

With Eleven v3, AI speech moves from synthetic reading to emotional performance. The model enables deep character work, tone shifts, and dynamic interactions, which unlock new possibilities in immersive media and multilingual storytelling.

Mati Staniszewski, Co-Founder & CEO of ElevenLabs, summed up the release by saying, “Eleven v3 is the most expressive text-to-speech model ever—offering full control over emotions, delivery, and nonverbal cues. With audio tags, you can prompt it to whisper, laugh, change accents, or even sing. You can control the pacing, emotion, and style to match any script. And with our global mission, we are happy to extend the model with support for over 70 languages.”

"This release is the result of the vision and leadership of my co-founder Piotr, and the incredible research team he’s built. Creating a good product is hard—creating an entirely new paradigm is almost impossible. I, and all of us at ElevenLabs, feel lucky to witness the magic this team brings to life—and with this release, we're excited to push the frontier once again.”
