ElevenLabs is launching its own speech-to-text model – TechCrunch

ElevenLabs, an AI startup that just raised a $180 million mega-funding round, has been known primarily for its audio-generation prowess. The company has now taken a step in another technological direction by launching Scribe, its first stand-alone speech-to-text model.

The startup, valued at $3.3 billion, has helped many other companies provide text-to-speech services through its vast library of voices. Now it is looking to get into speech detection and compete with the likes of Gladia, Speechmatics, AssemblyAI, Deepgram, and OpenAI’s Whisper models.

ElevenLabs’ Scribe model supports over 99 languages at launch. The company places more than 25 of them in the model’s “excellent” accuracy category, where the word error rate is less than 5%. This list includes English (claimed accuracy rate of 97%), French, German, Hindi, Indonesian, Japanese, Kannada, Malayalam, Polish, Portuguese, Spanish, and Vietnamese. Other languages fall into the high (5% to 10% word error rate), good (10% to 20% word error rate), and moderate (25% to 50% word error rate) categories.

The company said the model outperformed Google Gemini 2.0 Flash and Whisper Large V3 across multiple languages in FLEURS and Common Voice benchmark tests.

ElevenLabs previously developed a speech-to-text component for its AI conversational agent platform, which was released last year. However, this is the first time the company is releasing a stand-alone speech detection model. In a conversation with TechCrunch last month, CEO Mati Staniszewski talked about improving speech detection models.

“We want to understand what’s being said by you in a conversation better. We are working on ways to move away from only generating content and understanding and transcribing speech,” Staniszewski said at that time. “Many people say that speech-to-text is a solved problem. But for many languages, it is pretty bad. We think we can build better speech detection models because we have in-house teams to annotate data and give us quick feedback.”

The model also offers smart speaker diarization to tell you who is speaking, word-level timestamps for accurate subtitles, and auto-tagging of sound events such as audience laughter. The startup is also giving customers a way to transcribe video content directly in its studio to add subtitles or captions.

Scribe currently works only with pre-recorded audio, so it is not yet effective for meeting transcription or voice note-taking. The company said it will release a low-latency, real-time version of the model soon.

ElevenLabs is pricing Scribe at $0.40 per hour of transcribed audio. While the rate is competitive, some rivals currently offer lower prices for audio transcription, with some differences in features.
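For readers unfamiliar with the metric behind those accuracy categories, word error rate (WER) is conventionally the number of word substitutions, deletions, and insertions needed to turn a transcript into the reference text, divided by the number of reference words. The Python below is a minimal sketch of that standard definition, not ElevenLabs' evaluation code. The function names `word_error_rate` and `accuracy_tier` are hypothetical, the handling of the exact 5%, 10%, and 20% boundaries is an assumption, and values the article's ranges do not cover (such as 20% to 25%) are labeled "unclassified".

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1] / len(ref)


def accuracy_tier(wer: float) -> str:
    """Map a WER (as a fraction) to the categories named in the article.

    Boundary handling is an assumption; the article gives only ranges.
    """
    if wer < 0.05:
        return "excellent"
    if wer <= 0.10:
        return "high"
    if wer <= 0.20:
        return "good"
    if 0.25 <= wer <= 0.50:
        return "moderate"
    return "unclassified"


if __name__ == "__main__":
    wer = word_error_rate("the cat sat on the mat", "the cat sat on a mat")
    print(f"WER = {wer:.3f}, tier = {accuracy_tier(wer)}")
```

In the usage example, one substituted word out of six reference words gives a WER of about 17%, which would land in the "good" category. Real benchmarks typically normalize casing and punctuation before scoring, which this sketch does not do.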

Source: https://techcrunch.com/2025/02/26/elevenlabs-is-launching-its-own-speech-to-text-model/
