Hosted Models & AI APIs

The KDKP hosts a range of open-source models behind standardized, OpenAI-compatible APIs so that cantonal departments can integrate them in a consistent way.

Getting Started

All KDKP Large Language Models, embeddings, and rerankers are exposed via OpenAI-compatible endpoints. We provide templates, credential setup, and notes on cantonal network truststore integration to help developers connect.

TIP

API Starter Code Repository Client libraries, connection scripts, and usage examples for Python and other runtimes are available in the public DCC-BS kdkp-api-starter-code repository.

How We Select Models

Before we deploy a model, we evaluate it against a fixed set of criteria. The goal is to pick models that fit the hardware, the use case, and the available context budget — not just the highest-ranked model on a leaderboard.

We ask the following questions for every candidate model:

Infrastructure fit — Can the model run on the available KDKP infrastructure?
Modalities — Which modalities (text, image, video, audio) does it need to process or generate?
Efficiency — How efficient is the model? (tokens generated relative to benchmark results)
Context size — How long can the input texts be at most?
Speed — How fast is the model on the given hardware?
Benchmark quality — How well does the model score in benchmarks?
Censorship & bias — Is the model censored or biased, and is that relevant for the specific use case?

As external reference points for benchmark and efficiency comparisons we use the Artificial Analysis leaderboards and the OpenRouter leaderboard. These are inputs to the decision, not the decision itself — final selection always depends on the criteria above and our own tests on KDKP hardware.

Model Status

Each model below is marked with a status:

🟢 Active — in production use.
🟡 Deactivated — temporarily turned off, may return.
⚪ Retired — no longer hosted.

Model Catalog

Models are grouped by function: Large Language Models (LLM), text embeddings and rerankers, audio processing, Optical Character Recognition (OCR), and document parsing.

1. Large Language Models (LLM)

Gemma 4 31B — 🟢 Active

General-purpose LLM from Google. Supports reasoning, tool-use, and vision (image/video) input.

Hugging Face Model: RedHatAI/gemma-4-31B-it-NVFP4
Applications: TextMate, BS-Übersetzer, RAG Bot, Berichtgenerator.

Qwen 3 32B — 🟡 Deactivated

Reasoning LLM from Alibaba with a large context window and strong multilingual support. Scheduled for retirement in favor of Gemma 4.

Hugging Face Model: Qwen/Qwen3-32B-AWQ
Applications: TextMate, BS-Übersetzer, Berichtgenerator, RAG Bot, Berichtgenerator.

Qwen 3.5 27B — 🟡 Deactivated

Multimodal reasoning model from Alibaba. Temporarily deactivated due to high token generation overhead during reasoning tasks.

Hugging Face Model: Qwen/Qwen3.5-27B-FP8
Applications: Under evaluation.

Gemma 3 27B — ⚪ Retired

Vision-language model from Google. Previously used for experimental visual tasks.

Hugging Face Model: ISTA-DASLab/gemma-3-27b-it-GPTQ-4b-128g

Llama 3.3 70B — ⚪ Retired

Generative text model from Meta AI.

Hugging Face Model: cortecs/Llama-3.3-70B-Instruct-FP8-Dynamic

2. Embeddings & Rerankers

Embeddings and rerankers power the retrieval pipelines behind semantic search and RAG assistants.

Qwen 3 0.6B Embedding — 🟢 Active

Lightweight text embedding model that turns documents and queries into dense vectors.

Hugging Face Model: Qwen/Qwen3-Embedding-0.6B-GGUF
Applications: RAG Bot.

Qwen 3 0.6B Reranker — 🟢 Active

Lightweight cross-encoder that re-scores and re-orders search results.

Hugging Face Model: Qwen/Qwen3-Reranker-0.6B
Applications: RAG Bot.

3. Audio Processing (Speech-to-Text)

FasterWhisper — 🟢 Active

Speech-to-text runtime for transcription and speaker indexing.

Description: OpenAI's Whisper model optimized with CTranslate2, combined with PyAnnote diarization for speaker indexing.
Hugging Face Model: openai/whisper-large-v2 (Transcription) and pyannote/speaker-diarization-community-1 (Speaker Diarization)
Applications: Transcribo, Kantonspolizei Berichtgenerator, DCC Berichtgenerator.

4. Optical Character Recognition (OCR)

GLM-OCR — 🟢 Active

OCR engine for multi-lingual document layouts; extracts text from scanned images and forms.

Hugging Face Model: zai-org/GLM-OCR
Applications: RPA (Robotic Process Automation) workloads in the GD department, StatA survey questionnaires.

Dots.ocr — ⚪ Retired

Alternative OCR engine.

Hugging Face Model: rednote-hilab/dots.ocr

5. Document Processing

Docling — 🟢 Active

Layout-aware parser that converts PDF, DOCX, and PPTX files into Markdown or JSON.

Official Documentation: Docling Core
Applications: TextMate, BS-Übersetzer, OGD GR Geschäfte.

Composables

Components

Hosted Models & AI APIs

Getting Started

How We Select Models

Model Status

Model Catalog

1. Large Language Models (LLM)

Gemma 4 31B — 🟢 Active

Qwen 3 32B — 🟡 Deactivated

Qwen 3.5 27B — 🟡 Deactivated

Gemma 3 27B — ⚪ Retired

Llama 3.3 70B — ⚪ Retired

2. Embeddings & Rerankers

Qwen 3 0.6B Embedding — 🟢 Active

Qwen 3 0.6B Reranker — 🟢 Active

3. Audio Processing (Speech-to-Text)

FasterWhisper — 🟢 Active

4. Optical Character Recognition (OCR)

GLM-OCR — 🟢 Active

Dots.ocr — ⚪ Retired

5. Document Processing

Docling — 🟢 Active

Hosted Models & AI APIs ​

Getting Started ​

How We Select Models ​

Model Status ​

Model Catalog ​

1. Large Language Models (LLM) ​

Gemma 4 31B — 🟢 Active ​

Qwen 3 32B — 🟡 Deactivated ​

Qwen 3.5 27B — 🟡 Deactivated ​

Gemma 3 27B — ⚪ Retired ​

Llama 3.3 70B — ⚪ Retired ​

2. Embeddings & Rerankers ​

Qwen 3 0.6B Embedding — 🟢 Active ​

Qwen 3 0.6B Reranker — 🟢 Active ​

3. Audio Processing (Speech-to-Text) ​

FasterWhisper — 🟢 Active ​

4. Optical Character Recognition (OCR) ​

GLM-OCR — 🟢 Active ​

Dots.ocr — ⚪ Retired ​

5. Document Processing ​

Docling — 🟢 Active ​

Hosted Models & AI APIs

Getting Started

How We Select Models

Model Status

Model Catalog

1. Large Language Models (LLM)

Gemma 4 31B — 🟢 Active

Qwen 3 32B — 🟡 Deactivated

Qwen 3.5 27B — 🟡 Deactivated

Gemma 3 27B — ⚪ Retired

Llama 3.3 70B — ⚪ Retired

2. Embeddings & Rerankers

Qwen 3 0.6B Embedding — 🟢 Active

Qwen 3 0.6B Reranker — 🟢 Active

3. Audio Processing (Speech-to-Text)

FasterWhisper — 🟢 Active

4. Optical Character Recognition (OCR)

GLM-OCR — 🟢 Active

Dots.ocr — ⚪ Retired

5. Document Processing

Docling — 🟢 Active