Products
Products

From the cloud to support – everything from a single source, tailored to your needs.

Discover now
Onboarding
Onboarding

A secure transition to the cloud. Step by step. Our seven onboarding steps provide guidance, minimise risks and lay a solid foundation.

Discover now
About us
About us

A Swiss cloud partner with a strong ethos.

Discover now
- Team
- Careers
- Infrastructure
- Contact

$TYPO3\CMS\Extbase\Domain\Model\FileReference:390$

AI on Demandpowered by MiniMaxAI

MiniMax-M2.5: a mixture-of-experts model with 230 billion parameters — of which only 10 billion are active per query. Trained on over 200,000 real-world development environments and more than 10 programming languages, and operated by stepping stone on Swiss infrastructure.

MiniMax-M2.5 is an open-weight language model based on a Mixture-of-Experts (MoE) architecture. Of its 230 billion parameters, only around 10 billion are active per query — this makes the model fast and cost-effective without compromising on performance.

The model has been specifically trained for coding and agent-based tasks: across over 200,000 real-world development environments and more than 10 programming languages. It plans like a software architect, breaks down complex tasks into manageable steps and works independently with tools. stepping stone runs MiniMax-M2.5 on Swiss infrastructure — no data leaves the country.

Development teams and companies wishing to use AI to support programming, automation or the creation of agent-based workflows — without relying on US providers. Particularly suitable for organisations that require cutting-edge performance whilst keeping an eye on costs and data sovereignty.

Typical use cases: AI-powered software development and code reviews; agent-based workflows with tool integration and web search; automated document generation (Word, Excel, PowerPoint); complex, multi-stage tasks with autonomous planning.

Open Weights (Modified MIT). Swiss data centres. No vendor lock-in.

Thanks to its MoE architecture, MiniMax-M2.5 delivers cutting-edge performance at a fraction of the usual cost — 10 to 20 times cheaper than comparable proprietary models. The model supports over 10 programming languages and works independently with tools, search engines and files. Personalised advice and operation provided by stepping stone in Bern.

Your Consultant

Yannick Denzer

CDO & System Engineer

+41 77 450 53 58

+41 31 332 53 63

yannick.denzer@stepping-stone.ch

Make Appointment

Scope of services

AI model on demand

Access to MiniMax-M2.5 for coding, agent-based workflows and complex text tasks. Frontier-level performance on a par with the best proprietary models — at a fraction of the cost.

GPU performance on demand

Scalable computing power for Swiss infrastructure. Particularly efficient thanks to MoE architecture: 230 billion parameters, with only 10 billion active per query.

Managed service

Deployment, monitoring, maintenance and support on Swiss infrastructure, with personalised advice. stepping stone takes care of the day-to-day running so that you can focus on the benefits.

Areas of application

Software development

MiniMax-M2.5 has been specifically trained for professional software development — in more than 10 programming languages and real-world production environments.

It generates, analyses and reviews code at a cutting-edge level: 10 to 20 times cheaper than comparable proprietary models. Teams use it as a coding assistant, for automated code reviews and to speed up development cycles.

Agent-based workflows

The model plans like a software architect: it breaks down complex tasks, works independently with tools and delivers structured results.

MiniMax-M2.5 supports function calling, web searches and file operations — ideal for multi-step automation. It automatically creates Word, Excel and PowerPoint documents and is suitable for agent-based set-ups that do not rely on US cloud services.

Benchmark

The benchmarks were measured using the vllm bench tool against the production API gateway. The standard input sizes were 1,024 tokens for input and 256 tokens for output, which corresponds to 2–3 book pages or 500–750 words.

If necessary, higher input sizes can be set.

Call

# Set your personal key:
STONEY_KEY=sk-...

# Make key visible for vllm bench:
export OPENAI_API_KEY=$STONEY_KEY

# Start the benchmark
vllm bench serve \
 --backend openai-chat \
 --model "MiniMaxAI/MiniMax-M2.5" \
 --base-url llm.stoney-cloud.com \
 --endpoint /v1/chat/completions \
 --dataset-name random \
 --random-input-len 1024 \
 --random-output-len 256 \
 --num-prompts 50 \
 --max-concurrency 1 \
--tokenizer "Qwen/Qwen2.5-7B-Instruct" \
 --percentile-metrics ttft

Result

============ Serving Benchmark Result ============
Successful requests:                     48        
Failed requests:                         2         
Maximum request concurrency:             1         
Benchmark duration (s):                  187.20    
Total input tokens:                      73622     
Total generated tokens:                  12288     
Request throughput (req/s):              0.26      
Output token throughput (tok/s):         65.64     
Peak output token throughput (tok/s):    257.00    
Peak concurrent requests:                2.00      
Total token throughput (tok/s):          458.92    
---------------Time to First Token----------------
Mean TTFT (ms):                          3740.68   
Median TTFT (ms):                        3756.77   
P99 TTFT (ms):                           3871.80   
==================================================

Legend

Successful requests: Successful prompt requests
Failed requests: Unsuccessful prompts
Maximum request concurrency: How many requests the model processes simultaneously.
Benchmark duration (s): The duration of the benchmark run in seconds.
Total input tokens: The total number of input tokens.
Total generated tokens: The total number of tokens generated by the model.
Request throughput (req/s): The number of requests processed per second.
Output token throughput (tok/s): The average number of tokens generated per second.
Peak output token throughput (tok/s): The maximum measured number of output tokens per second.
Peak concurrent requests: The maximum measured number of requests processed simultaneously.
Total token throughput (tok/s): The average of all tokens processed during the measurement.
Mean Time to First Token (TTFT) (ms): The average time elapsed between input and the first visible output.
Median TTFT (ms): The expected time between input and the first visible output. Also known as TTFT p50.
p99 TTFT (ms): The time elapsed in the “worst case” scenario until the first token is generated.
Tokenizer: The tokenizer is used to send queries to the evaluated model during a benchmark. These are typically small, publicly available models, such as Qwen/Qwen2.5-7B-Instruct.

Price

Model	Context length	Input/MTok	Output/MTok
MiniMax-M2.5	196k	1.9400	9.7000

All prices are in CHF/MTok, excluding VAT.

AI on Demandpowered by MiniMaxAI

Your Consultant

Yannick Denzer

Scope of services

AI model on demand

GPU performance on demand

Managed service

Areas of application

Software development

Agent-based workflows

Benchmark

Price

Product enquiry

Conditions:

AI on Demandpowered by MiniMaxAI

Product description

Areas of application

Advantages

Yannick Denzer

AI model on demand

GPU performance on demand

Managed service

Software development

Agent-based workflows

Price

Generic model

Product enquiry

Conditions: