AI on Demand powered by OpenDataLab

MinerU2.5 by OpenDataLab: a specialised vision-language model with 1.2 billion parameters for two-step document parsing — layout analysis followed by content recognition at native resolution. Powered by stepping stone on Swiss infrastructure.

MinerU2.5 from OpenDataLab is a specialised vision-language model for document parsing. Unlike traditional OCR systems, it first analyses the entire page layout and then recognises content such as text, tables and equations in native resolution — in two separate steps to ensure maximum precision.

With just 1.2 billion parameters, MinerU2.5 is particularly resource-efficient. It delivers structured Markdown with correct mapping of all page elements: headings, lists, code blocks, references, headers and footers. The model also handles special cases such as rotated tables, borderless tables and complex mathematical formulae. stepping stone runs MinerU2.5 entirely on Swiss infrastructure — your documents remain in Switzerland.

Companies and organisations that wish not only to digitise documents but also to preserve their structure — without transferring data to US providers. Particularly suitable for document pipelines where layout is important: reports, academic papers, technical documentation.

Typical applications: structured extraction from PDF documents for RAG pipelines; processing of reports and studies with complex layouts; table and formula recognition in technical and academic documents; preparation of document collections for knowledge databases and archives.

Open source (AGPL-3.0). Swiss data centres. No data stored with US providers.

Not just text recognition, but genuine document parsing: MinerU2.5 understands the structure of a page and returns it as clean Markdown. Particularly effective with complex layouts — rotated tables, borderless tables, multilingual formulae. Efficient in operation with just 1.2 billion parameters. Can be integrated directly into RAG pipelines via LangChain and LlamaIndex. Personalised support and operation provided by stepping stone in Bern.

Scope of services

On-demand document parsing

Access to MinerU2.5 for the structured extraction of documents. Layout analysis and content recognition in two steps — from PDF to clean Markdown with correct element mapping.

GPU performance on demand

Scalable computing power for individual documents or entire archives. Particularly cost-effective thanks to the compact model — you pay as you go.

Betreuter Betrieb

Deployment, monitoring, maintenance and support on Swiss infrastructure, with personalised advice. stepping stone takes care of the day-to-day running so that you can focus on the benefits.

Areas of application

Structured extraction

MinerU2.5 understands the structure of a page — and returns it as clean, structured Markdown.

Tables, formulas, headers and footers, references and code blocks are correctly recognised and mapped. Even complex special cases such as rotated tables, borderless tables and multilingual formulas are processed reliably.

Knowledge databases

Documents that are in a structured format can be fed directly into knowledge databases and search systems.

MinerU2.5 output is natively compatible with LangChain and LlamaIndex — ideal for building RAG pipelines from existing PDF archives. Organisations use it to make reports, studies and technical documentation accessible for their AI applications.

Benchmark

The benchmark processes 50 CVs (100 pages total). Step-by-step instructions and the required Python script can be downloaded from GitHub.

If necessary, higher concurrency and page limits can be set.

 

Call

# Set your personal key:
STONEY_KEY=sk-...

# Make key visible for bench script:
export OPENAI_API_KEY=$STONEY_KEY

# Start the benchmark
python cv_bench_endpoint.py \
 --endpoint llm.stoney-cloud.com/v1/chat/completions \
 --data cv_bench_data \
 --model "MinerU2.5-2509-1.2B" \
 --api-key $STONEY_KEY \
 --concurrency 1 \
 --limit 100

 

Result

concurrency   : 1
requested     : 50
ok            : 50
failed        : 0
duration_s    : 136.42
pages_s       : 0.36
pages_min     : 23.2
out_tok_s     : 480.8
latency_p50_s : 1.54
latency_p99_s : 8.52

 

Legend

  • concurrency: How many requests the model processes simultaneously.
  • requested: How many requests were sent.
  • ok: Number of accepted requests (in this case, CVs).
  • failed: Number of rejected requests.
  • duration_s: The duration of the benchmark run.
  • pages_s: The average number of pages that can be processed per second.
  • pages_min: The average number of pages that can be processed per minute.
  • out_tok_s: The number of tokens generated per second.
  • latency_p50_s: The average response time in seconds.
  • latency_p99_s: The response time required in the “worst case” scenario, in seconds.

Price

ModelContext lengthInput/MTokOutput/MTok
MinerU2.5-2509-1.2B16k0.02000.0600
All prices are in CHF/MTok, excluding VAT.