ElevenLabs and its proprietary speech-to-text model

The ability to accurately transcribe speech into text has become a cornerstone for numerous applications, from content creation to accessibility solutions. ElevenLabs, a company renowned for its advancements in AI-driven voice technologies, has recently unveiled its proprietary speech-to-text model, Scribe. This development promises to set new benchmarks in transcription accuracy and efficiency.

Elevating Transcription Standards

Launched in February 2025, Scribe is designed to convert spoken language into written text with high accuracy. It supports 99 languages, including several that are often underserved in AI applications, such as Serbian, Cantonese, and Malayalam, with the aim of making high-quality transcription universally accessible. This breadth means users across diverse linguistic backgrounds can benefit from its capabilities.

Key Features of Scribe

Industry-Leading Accuracy: Scribe has demonstrated superior performance on benchmarks such as FLEURS and Common Voice, achieving word error rates as low as 3.4% in English.
Speaker Diarisation: Scribe can distinguish and label up to 32 individual speakers within a single audio file, ensuring clarity in multi-participant recordings.
Word-Level Timestamps: Each transcribed word is accompanied by precise timing information, facilitating seamless integration with multimedia content and aiding in tasks like subtitle synchronisation.
Non-Speech Event Tagging: Beyond words, Scribe identifies and labels non-verbal sounds such as laughter, applause, and background noises, providing a comprehensive understanding of the audio context.
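
To make the accuracy figure concrete: word error rate (WER) is the number of word substitutions, insertions, and deletions needed to turn a transcript into the reference text, divided by the number of words in the reference. The sketch below is a minimal, generic implementation of that definition and is not ElevenLabs' evaluation code. The function name is illustrative, and real benchmark pipelines usually normalise punctuation and casing more carefully than the simple lowercasing assumed here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    # Assumption: lowercasing and whitespace splitting is enough normalisation
    # for illustration; benchmark suites typically do more.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # reference word missing from hypothesis
                curr[j - 1] + 1,     # extra word in hypothesis
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One wrong word out of six reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sat on a mat"))
```

A WER of 3.4% therefore corresponds to roughly 3 to 4 word-level errors per 100 reference words.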

Developers can integrate Scribe into their applications through ElevenLabs’ Speech-to-Text API, which returns structured JSON transcripts with detailed metadata. Individual users and businesses can access Scribe through the ElevenLabs dashboard, uploading audio or video files to obtain formatted transcripts (source: elevenlabs.io).
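
As a rough illustration of what can be done with such a transcript, the sketch below merges word-level entries into speaker turns and renders them as SRT subtitle cues. It works on a hand-written sample rather than a live API response: the field names (`words`, `text`, `start`, `end`, `type`, `speaker_id`) and the helper names are assumptions made for this example, so check the API reference for the actual response schema before relying on them.

```python
from itertools import groupby

# Hypothetical transcript. The structure and field names are assumptions
# for illustration, not a confirmed copy of the API's response schema.
sample = {
    "text": "Hello there. (laughter) Hi!",
    "words": [
        {"text": "Hello", "start": 0.00, "end": 0.42, "type": "word", "speaker_id": "speaker_0"},
        {"text": "there.", "start": 0.48, "end": 0.90, "type": "word", "speaker_id": "speaker_0"},
        {"text": "(laughter)", "start": 1.10, "end": 1.80, "type": "audio_event", "speaker_id": "speaker_1"},
        {"text": "Hi!", "start": 2.00, "end": 2.30, "type": "word", "speaker_id": "speaker_1"},
    ],
}


def speaker_turns(transcript):
    """Merge consecutive spoken words by the same speaker into turns.

    Returns a list of (speaker, start_seconds, end_seconds, text) tuples.
    Non-speech events are skipped.
    """
    spoken = [w for w in transcript["words"] if w["type"] == "word"]
    turns = []
    for speaker, group in groupby(spoken, key=lambda w: w["speaker_id"]):
        words = list(group)
        text = " ".join(w["text"] for w in words)
        turns.append((speaker, words[0]["start"], words[-1]["end"], text))
    return turns


def srt_time(seconds):
    """Format a time in seconds as an SRT timestamp: HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02}:{m:02}:{s:02},{ms:03}"


def to_srt(transcript):
    """Render one SRT cue per speaker turn, prefixed with the speaker label."""
    lines = []
    for i, (speaker, start, end, text) in enumerate(speaker_turns(transcript), start=1):
        lines += [str(i), f"{srt_time(start)} --> {srt_time(end)}", f"[{speaker}] {text}", ""]
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_srt(sample))
```

Grouping by speaker keeps each cue attributable to one person. For long monologues a real subtitle pipeline would also split turns by duration or character count, which this sketch leaves out.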

Pricing and Future Developments

Scribe is priced at $0.40 per hour of input audio, with an introductory 50% discount for the first six weeks after launch. Recognising the demand for real-time applications, ElevenLabs has announced plans to release a low-latency version of Scribe in the near future, further expanding its utility across various use cases (source: VentureBeat).
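
For budgeting, the quoted rate translates into a simple calculation. The sketch below assumes cost scales linearly with audio duration and rounds to the nearest cent. The billing granularity is not stated in this article, so treat the per-second proration and the function name as assumptions.

```python
from decimal import Decimal

PRICE_PER_HOUR = Decimal("0.40")  # USD per hour of input audio, as quoted above
INTRO_DISCOUNT = Decimal("0.50")  # 50% introductory discount


def transcription_cost(audio_seconds: int, discounted: bool = False) -> Decimal:
    """Estimate the cost in USD for a given amount of input audio.

    Assumption: billing is prorated linearly by duration.
    """
    hours = Decimal(audio_seconds) / Decimal(3600)
    cost = hours * PRICE_PER_HOUR
    if discounted:
        cost *= 1 - INTRO_DISCOUNT
    return cost.quantize(Decimal("0.01"))


if __name__ == "__main__":
    ten_hours = 10 * 3600
    print(transcription_cost(ten_hours))                   # standard rate
    print(transcription_cost(ten_hours, discounted=True))  # introductory rate
```

`Decimal` is used instead of floats so that currency amounts round predictably.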

The advent of advanced speech-to-text and voice cloning technologies raises ethical considerations, particularly the potential for misuse in creating deepfakes or unauthorised voice replicas. ElevenLabs is aware of these challenges and has implemented safeguards: the company monitors usage patterns to detect and prevent malicious activity, so that its technology serves beneficial purposes without compromising security (source: Time).

ElevenLabs’ introduction of Scribe marks a significant milestone in the realm of speech-to-text technologies. By combining exceptional accuracy with comprehensive language support and user-friendly integration, Scribe is poised to become an invaluable tool for developers, content creators, and enterprises alike. As AI continues to advance, innovations like Scribe exemplify how technology can bridge communication gaps and enhance accessibility on a global scale.

North Atlantic

Victor A. Lausas
