DeepL, famous for text translation, now wants to translate your voice

DeepL, the translation company best known for its text tools, today launched a speech-to-speech translation suite that addresses use cases such as meetings, mobile and web conversations, and group conversations for frontline workers through a custom app. The company is also releasing an API that will allow external developers and enterprises to build on DeepL’s technology for custom use cases such as call centers.

“After spending years translating text, voice was a natural step for us,” DeepL CEO Jarek Kutylowski told TechCrunch in an interview. “We’ve made a lot of progress in text translation and document translation, but we didn’t think there was a great product for real-time speech translation.”

Kutylowski said the hard part of building a real-time translation product is balancing low latency (the delay between a person speaking and the translated audio playing) against accuracy.

DeepL is releasing additional features for platforms like Zoom and Microsoft Teams. Listeners can hear real-time translation while others speak in their native language, or they can follow the live-translated text on their screen. The program is currently in early access, and the company is inviting organizations to join the waitlist. The company also has products for mobile and web-based conversations that can take place in person or remotely.

DeepL also supports group conversations in settings such as training sessions or workshops, where participants can join by scanning a QR code.

DeepL said its voice-to-voice technology can also learn and adapt to custom vocabularies, such as industry-specific terms and company and personal names.

Kutylowski said AI is reshaping what customer service will look like in the future. He noted that a translation layer helps companies provide support in languages where qualified employees are scarce and expensive to hire.

The company said it controls the entire voice-to-voice stack. For now, however, its system converts speech to text, translates that text, and then converts the result back to speech. Because DeepL has worked on text translation for many years, the company believes it has the upper hand on translation quality. In the future, it wants to develop an end-to-end speech translation model that skips the text step entirely.

DeepL faces competition from several well-funded startups working in adjacent corners of the space. Sanas, which received $65 million in investment from Quadrille Capital and Teleperformance last year, uses AI to correct speakers’ accents in real time, a tool primarily aimed at call center agents.

Dubai-based Camb.AI focuses on speech synthesis and translation for media and entertainment companies, working with Amazon Web Services to help dub and localize video content at scale.

Palabra, backed by Reddit co-founder Alexis Ohanian’s venture firm Seven Seven Six, is building a real-time speech translation engine designed to preserve both meaning and the speaker’s original voice, putting it in more direct competition with what DeepL is building.