AI on Demand powered by Mistral AI

Voxtral Mini by Mistral AI: 3 billion parameters that combine speech recognition and speech understanding in a single model. Transcription and content analysis in 8 languages — powered by stepping stone on Swiss infrastructure.

Voxtral Mini by Mistral AI combines speech recognition and natural language understanding in a single model. It transcribes audio, answers questions about the content and generates structured summaries — without the need for separate systems for transcription and analysis.

With just 3 billion parameters, Voxtral Mini is particularly resource-efficient. It processes audio files of up to 30 minutes for transcription and 40 minutes for content analysis, in 8 languages: German, English, French, Spanish, Portuguese, Italian, Dutch and Hindi. stepping stone runs Voxtral Mini entirely on Swiss infrastructure — your audio data remains in Switzerland.
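
The limits above can be checked before a file is submitted. The following is a minimal sketch in Python, assuming the duration of the recording is already known in seconds. The function name `check_audio` and the task labels `"transcription"` and `"analysis"` are hypothetical and are not part of any stepping stone or Mistral AI interface.

```python
# Limits and languages as stated above; all names here are illustrative only.
MAX_MINUTES = {"transcription": 30, "analysis": 40}
SUPPORTED_LANGUAGES = {
    "German", "English", "French", "Spanish",
    "Portuguese", "Italian", "Dutch", "Hindi",
}


def check_audio(duration_seconds: float, task: str, language: str) -> list[str]:
    """Return a list of problems; an empty list means the file is within the stated limits."""
    problems = []
    if task not in MAX_MINUTES:
        problems.append(f"unknown task: {task}")
    elif duration_seconds > MAX_MINUTES[task] * 60:
        problems.append(f"audio exceeds {MAX_MINUTES[task]} minutes for {task}")
    if language not in SUPPORTED_LANGUAGES:
        problems.append(f"unsupported language: {language}")
    return problems
```

With this sketch, a 35-minute German recording would pass for content analysis but be flagged for transcription.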


Voxtral Mini is aimed at companies and organisations that wish to process audio content efficiently without transferring data to US-based providers. It is particularly suitable for multilingual teams and regulated sectors where the confidentiality of conversations and recordings is crucial.

Typical use cases: transcription of meetings, interviews and customer conversations; automated summaries of audio conferences; analysis and evaluation of voice recordings; and voice-controlled workflows with function calling directly from audio.

Open source (Apache 2.0). European model. Swiss data centres. No data stored with US providers.

Speech recognition and speech understanding in a single step, with no separate transcription tool required. Automatic speech recognition in 8 languages. Particularly efficient and cost-effective with just 3 billion parameters. Personalised advice and support provided by stepping stone in Bern.

Scope of services

On-demand audio AI

Access to Voxtral Mini for the transcription, analysis and summarisation of audio files. Up to 40 minutes per file, in 8 languages with automatic speech recognition.

GPU performance on demand

Scalable computing power for individual recordings or large audio archives. Because the model is compact, it is particularly cost-effective to run, and you pay only for what you use.

Managed service

Deployment, monitoring, maintenance and support on Swiss infrastructure, with personalised advice. stepping stone takes care of the day-to-day running so that you can focus on the benefits.

Areas of application

Meetings & Minutes

Voxtral Mini transcribes and analyses recorded conversations in a single step — no separate transcription tool is required.

Teams use it for automated meeting minutes, summaries of customer conversations and interview analysis. Available in 8 languages with automatic speech recognition, for audio files up to 40 minutes long.

Media processing

With Voxtral Mini, audio content can not only be transcribed, but also analysed for its content and processed in a structured manner.

Companies and media producers use it for subtitling, multilingual processing and voice-controlled workflows with function calling directly from audio. Compact, cost-effective and operated entirely on Swiss infrastructure.

Price

Model                  MTok
Voxtral-Mini-3B-2507   0.0002
All prices are in CHF/MTok, excluding VAT.
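
The listed price can be turned into a rough cost estimate. The following is a minimal sketch in Python that takes a token count as input. How audio minutes map to tokens is not stated here, so the token count is an assumption the caller has to supply, and the function name `estimate_cost_chf` is hypothetical.

```python
PRICE_CHF_PER_MTOK = 0.0002  # price from the table above, excluding VAT


def estimate_cost_chf(tokens: int) -> float:
    """Estimated cost in CHF (excluding VAT) for the given number of tokens."""
    # MTok means one million tokens, so scale the count before applying the price.
    return tokens / 1_000_000 * PRICE_CHF_PER_MTOK
```

For example, `estimate_cost_chf(5_000_000)` multiplies 5 MTok by the listed price per MTok.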