AI on Demand powered by OpenAI

Whisper Large V3 by OpenAI: the world’s most widely used open-source model for automatic speech recognition, trained on over 5 million hours of audio data. Supports around 100 languages, and performs reliably with accents and technical jargon — powered by stepping stone on Swiss infrastructure.

OpenAI’s Whisper Large V3 is the world’s most widely used model for automatic speech recognition. It transcribes spoken language into text — reliably, in around 100 languages, and with support for timestamps at word and sentence level. In addition, Whisper translates spoken language directly into English.

The model has been trained on over 5 million hours of audio data and is particularly robust against background noise, accents and technical jargon. stepping stone runs Whisper Large V3 entirely on Swiss infrastructure — your audio data remains in Switzerland.

The service is aimed at companies and organisations that wish to convert audio content into text without transferring data to US cloud services. It is particularly suitable for multilingual environments and wherever accurate transcription at scale is required.

Typical use cases: transcription of meetings, interviews and customer conversations; subtitling of videos and media files; automatic translation of spoken content into English; and accessibility through speech-to-text in applications and platforms.

Open source (Apache 2.0). Swiss data centres. No data stored with US providers.

The world’s most widely used ASR model — tried and tested in thousands of production environments. Around 100 languages, robust against background noise and accents. Timestamps at word and sentence level for precise alignment. Personalised advice and operation provided by stepping stone in Bern.

Scope of services

On-demand speech recognition

Access to Whisper Large V3 for transcribing and translating audio files. Around 100 languages with automatic speech recognition and optional timestamps at word or sentence level.

GPU performance on demand

Scalable computing power for individual recordings or large audio archives. From single transcriptions to bulk processing — you pay as you go.

Managed service

Deployment, monitoring, maintenance and support on Swiss infrastructure, with personalised advice. stepping stone takes care of the day-to-day running so that you can focus on the benefits.

Areas of application

Transcription

Whisper Large V3 is the industry standard for automatic speech recognition — tried and tested in thousands of production environments worldwide.

Teams use it to transcribe meetings, interviews, customer conversations and telephone calls. With timestamps at word and sentence level, content can be referenced precisely and fed into downstream processes.
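As an illustration of how word-level timestamps let content be referenced precisely, the following minimal sketch finds every point in a recording where a given term is spoken. It assumes the transcript is available as a list of entries with "word", "start" and "end" fields (times in seconds), which is the shape the open-source Whisper tooling uses for word timestamps; the response format of the hosted service may differ. The function name find_mentions and the sample data are hypothetical.

```python
def find_mentions(words, term):
    """Return the start times (in seconds) of every word equal to `term`.

    `words` is assumed to be a list of dicts with "word", "start" and "end"
    keys. Matching ignores case, surrounding whitespace and trailing
    punctuation.
    """
    needle = term.casefold()
    hits = []
    for entry in words:
        token = entry["word"].strip().strip(".,;:!?").casefold()
        if token == needle:
            hits.append(entry["start"])
    return hits


# Hypothetical sample data, for illustration only.
sample_words = [
    {"word": " The", "start": 0.0, "end": 0.2},
    {"word": " budget", "start": 0.2, "end": 0.7},
    {"word": " is", "start": 0.7, "end": 0.9},
    {"word": " approved.", "start": 0.9, "end": 1.6},
    {"word": " Budget", "start": 3.1, "end": 3.6},
    {"word": " review", "start": 3.6, "end": 4.1},
]

mentions = find_mentions(sample_words, "budget")
```

The returned start times can then be used to jump to the relevant position in the recording or to link a meeting note to its source.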

Subtitling & Translation

Whisper transcribes reliably across around 100 languages, even with background noise, accents or technical jargon.

Media producers and platform operators use it for subtitling, automatic translation of spoken content into English, and accessible speech-to-text solutions. All of this is hosted on Swiss infrastructure, without any data being transferred to US services.
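For the subtitling use case, sentence-level timestamps map naturally onto the SubRip (SRT) subtitle format. The sketch below is a minimal conversion under the assumption that each transcript segment carries "start", "end" and "text" fields (times in seconds), as in the open-source Whisper output; the field names returned by the hosted service may differ. The helper names srt_timestamp and segments_to_srt are hypothetical.

```python
def srt_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def segments_to_srt(segments):
    """Render segments (dicts with "start", "end", "text") as SRT text.

    Each subtitle block consists of a running index, a time range and the
    segment text; blocks are separated by a blank line.
    """
    blocks = []
    for index, seg in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    return "\n".join(blocks)


# Hypothetical sample data, for illustration only.
sample_segments = [
    {"start": 0.0, "end": 2.5, "text": " Welcome to the meeting."},
    {"start": 2.5, "end": 5.0, "text": " Let us begin."},
]

srt_text = segments_to_srt(sample_segments)
```

Writing srt_text to a file with the .srt extension yields subtitles that common video players can load alongside the media file.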

Price

Model               Price (CHF/MTok)
whisper-large-v3    0.0020

All prices are in CHF/MTok, excluding VAT.
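To make the pay-as-you-go pricing concrete, the following sketch estimates the cost of a job from a token count at the listed rate of CHF 0.0020 per million tokens (MTok), excluding VAT. How the service counts tokens for a given audio file is not stated here, so the token count is treated as a given input; the function name estimate_cost_chf is hypothetical.

```python
from decimal import Decimal

# Listed price for whisper-large-v3, in CHF per million tokens, excl. VAT.
PRICE_CHF_PER_MTOK = Decimal("0.0020")


def estimate_cost_chf(tokens, price_per_mtok=PRICE_CHF_PER_MTOK):
    """Return the estimated cost in CHF (excl. VAT) for `tokens` tokens."""
    if tokens < 0:
        raise ValueError("tokens must not be negative")
    return price_per_mtok * Decimal(tokens) / Decimal(1_000_000)


# Example: a job billed at 5 million tokens.
cost = estimate_cost_chf(5_000_000)
```

Decimal is used instead of floating point so that small per-token amounts add up without rounding drift across many jobs.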