Mycelis routes every request to the cheapest model capable of solving it — local GPUs, DeepSeek, Claude, GPT and Gemini behind one OpenAI-compatible endpoint.


Gemma / Qwen

simple tasks · cheapest

DeepSeek R1

coding · reasoning

Claude / GPT-4o

complex only · on escalation

~60%

lower API cost

< 60s

to first request

code changes needed

Drop-in replacement

Change one line. Spend ~60% less on API costs.

Point your existing OpenAI SDK at Mycelis. Routing, cost optimization, and model selection happen automatically.

agent.py one line change

# before — direct OpenAI
from openai import OpenAI
client = OpenAI(api_key="sk-...")

# after — Mycelis (one line changed)
client = OpenAI(
  base_url="https://mycelis.ai/api/proxy/v1",
  api_key="my-pat-key"
)

response = client.chat.completions.create(
  model="mycelis/coding-agent",
  messages=[{"role": "user", "content": "..."}]
)

Unified agent runtime architecture

~60% lower API cost on mixed workloads

< 60 second deployment

OpenAI-compatible endpoint

Local + cloud hybrid routing

Zero vendor lock-in

Cost-aware routing

Simple tasks are cheap.
Complex tasks escalate automatically.

Every request is evaluated against your routing rules. Only when complexity, context length, or tool requirements demand it does Mycelis escalate to a frontier model.

REQUEST → MODEL ROUTING

Request type	Characteristics	Model	Cost
autocomplete / fill-in	short context · pattern matching	Gemma 4 26B	$0.002 / 1k tok
medium coding task	refactor · unit tests · explain	DeepSeek V3	$0.006 / 1k tok
complex reasoning	architecture · long context · debug	Claude Sonnet	escalated
research / multi-step planning	tool-use heavy · very long context	Claude / GPT-4o	escalated
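
The escalation logic in the routing overview can be pictured with a small sketch. It is not Mycelis's actual router: the `Request` fields, the thresholds, and the `pick_tier` function are hypothetical, and only the model names are taken from the overview.

```python
from dataclasses import dataclass

# Hypothetical thresholds; the real router's cut-offs are not documented here.
SHORT_CONTEXT_TOKENS = 2_000
LONG_CONTEXT_TOKENS = 32_000


@dataclass
class Request:
    prompt_tokens: int             # size of the context sent with the request
    tool_calls: int = 0            # number of tools the request needs
    needs_reasoning: bool = False  # flagged as architecture/debug-style work


def pick_tier(req: Request) -> str:
    """Return the cheapest tier that plausibly fits the request."""
    # Tool-heavy or very long-context work goes straight to a frontier model.
    if req.tool_calls >= 3 or req.prompt_tokens > LONG_CONTEXT_TOKENS:
        return "Claude / GPT-4o"
    # Complex reasoning escalates to Claude Sonnet.
    if req.needs_reasoning:
        return "Claude Sonnet"
    # Short pattern-matching work stays on the cheapest model.
    if req.prompt_tokens <= SHORT_CONTEXT_TOKENS and req.tool_calls == 0:
        return "Gemma 4 26B"
    # Everything else is treated as a medium coding task.
    return "DeepSeek V3"


print(pick_tier(Request(prompt_tokens=500)))
print(pick_tier(Request(prompt_tokens=8_000, needs_reasoning=True)))
```

Checking the most expensive conditions first keeps the cheap tiers as the default outcome, so a request only escalates when one of the explicit triggers fires.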

One endpoint, transparent logic

Your app always calls the same URL. Routing decisions are logged and visible — no black box.

Rules you control

Define routing by token count, keywords, tool calls, or regex. Or use the auto-router and let Mycelis decide.

Quality is preserved

Complex tasks still go to the best model. You never trade quality for cost — only unnecessary spend is cut.

62% less

Typical cost reduction for a mixed coding workload (80% routine tasks, 20% complex).

Comparison

Why not just use LiteLLM?

LiteLLM is a great library. Mycelis is a managed platform. Here's what you get beyond what you'd build yourself.

Feature	LiteLLM	DIY / CCR
Dedicated GPU provisioning	—	—
Managed frontier API keys	—	—
Local + cloud hybrid routing	partial	partial
Hosted OpenWebUI (chat UI)	—	—
One unified OpenAI-compatible endpoint
Cost-aware auto routing	limited	limited
MCP tool integration	—	—
Knowledge bases / RAG	—	—
Zero infrastructure to manage	—	—

Getting started

You don't need to configure everything.

Start with a single coding agent preset. Advanced routing, MCP tools, and RAG can be added later — when you actually need them.

Connect your client

Point any OpenAI-compatible tool at mycelis.ai/api/proxy/v1. Works with Cursor, Claude Code, OpenCode, or any OpenAI SDK.
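
For tools that do not use an OpenAI SDK, the same call can be made over plain HTTP. The sketch below only builds the request and does not send it. It assumes the proxy follows the usual OpenAI conventions (a `/chat/completions` path under the base URL and Bearer-token authentication); `build_chat_request` is a hypothetical helper name.

```python
import json
import urllib.request

BASE_URL = "https://mycelis.ai/api/proxy/v1"  # base URL from the text above


def build_chat_request(api_key: str, model: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style chat completion request."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/chat/completions",  # assumed OpenAI-compatible path
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed Bearer auth
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_chat_request("my-pat-key", "mycelis/coding-agent", "Explain this function.")
print(req.get_method(), req.full_url)
```

Passing `req` to `urllib.request.urlopen` would perform the actual call.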

Use default routing

The default coding agent preset already routes simple tasks to cheap models and escalates automatically. Costs drop immediately, no configuration needed.

Optimize when ready

Once you see your routing traces and cost breakdown, tune rules, add MCP tools, attach knowledge bases, or fine-tune a local model.


When you need more

Expandable power. Not mandatory complexity.

These features are ready when your use case demands them. Start simple, add layers as you grow.

Dedicated GPU Instances

Run DeepSeek, Gemma, Qwen or custom models on your own GPU — fully private, low latency, billed per minute.

MCP Tool Integration

Attach tools to your agents — web search, file access, APIs — without changing your client code.

Knowledge Bases / RAG

Upload docs, codebases, or PDFs. Mycelis retrieves relevant context before each inference call automatically.

Fine-Tuning / LoRA

Train domain-specific adapters on your data. Deploy a fine-tuned model that routes to the right base automatically.

Hosted OpenWebUI

One-click chat interface for your workspace. Share with teammates, auto-billed per minute of usage.

Semantic Cache

Skip identical or near-identical requests. Cache semantically equivalent prompts to cut costs further.

Privacy-first routing

Keep sensitive code local.

Route private workloads to self-hosted models. Use Claude or GPT only when necessary — and only for requests that don't contain sensitive IP.

Classify by sensitivity

Route requests with proprietary code patterns to local GPU instances. External providers never see that data.

EU-hosted infrastructure

All Mycelis infrastructure runs in the EU. Your data stays in jurisdiction without any enterprise contract.

BYOK or managed keys

Use your own Claude/OpenAI API keys or let Mycelis manage them. No lock-in, your choice.

PRIVACY ROUTING EXAMPLE
contains internal codebase pattern
                            → local GPU
generic question, no IP
                            → DeepSeek
matches /INTERNAL keyword
                            → local GPU
Rules defined by you. Evaluated per request, before any external call is made.
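
The three example rules could be expressed roughly as follows. This is a sketch under assumptions: the regex patterns, the `acme_internal` package name, and the target labels are made up for illustration and are not Mycelis's rule syntax.

```python
import re

# Hypothetical rules mirroring the example above; patterns are illustrative.
LOCAL_RULES = [
    re.compile(r"/INTERNAL\b"),               # explicit keyword marker
    re.compile(r"\bfrom\s+acme_internal\b"),  # made-up internal package name
]


def route_by_sensitivity(prompt: str) -> str:
    """Pick a target before any external call is made."""
    for pattern in LOCAL_RULES:
        if pattern.search(prompt):
            return "local-gpu"  # sensitive: never leaves self-hosted models
    return "deepseek"           # generic question, no IP detected


print(route_by_sensitivity("/INTERNAL fix the auth flow"))
print(route_by_sensitivity("What is a monad?"))
```

Because the check runs locally and returns before any provider is contacted, a matching prompt never reaches an external API.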

Transparent pricing

Pay only for what you use.

No base fee. No minimum term. Transparent billing for all services.

Open-source models

Scalable & dynamic

DeepSeek, Gemma 4 and more. Scaling and runtime are configurable in the wizard. Billed hourly.

Commercial models

0% markup

Managed or BYOK. Connect Claude 4.7, GPT-4o and others with 0% markup on provider prices.

Knowledge Bases

€0.05 / h + €0.01 / GB / h

Secure RAG for your data. Fixed base price plus storage-based billing.
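
As a worked example of the knowledge base price, the sketch below adds the fixed hourly base to the per-GB hourly storage charge. It assumes the storage component scales linearly with stored gigabytes and ignores billing granularity, which is not stated here; `knowledge_base_cost` is a hypothetical helper.

```python
from decimal import Decimal

BASE_EUR_PER_HOUR = Decimal("0.05")        # fixed base price
STORAGE_EUR_PER_GB_HOUR = Decimal("0.01")  # storage-based component


def knowledge_base_cost(hours: int, stored_gb: int) -> Decimal:
    """Cost in euros for one knowledge base kept online for `hours`."""
    return hours * (BASE_EUR_PER_HOUR + STORAGE_EUR_PER_GB_HOUR * stored_gb)


# Example: 5 GB kept online for a 30-day month (720 hours).
print(knowledge_base_cost(720, 5))  # 720 h x (0.05 + 0.05) EUR/h = 72 EUR
```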

OpenWebUI

0,05 € / h

The best interface for your team. Fully managed and connected to all models.

Real workload example

100k coding tokens / day · mixed complexity

Claude / GPT-4o only

no routing, all premium models

~$420/mo

Mycelis smart routing

65% Gemma · 20% DeepSeek · 15% Claude

~$169/mo

Same workflow. Same endpoint. Lower cost.

~60%

cost reduction

Calculated from the workload example above: $420 → $169 per month.
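
The arithmetic behind the headline figure is short enough to check by hand. A minimal sketch using the two monthly totals from the workload example (`cost_reduction` is a hypothetical helper name):

```python
def cost_reduction(before: float, after: float) -> float:
    """Fractional saving when monthly spend drops from `before` to `after`."""
    return (before - after) / before


saving = cost_reduction(420, 169)
print(f"{saving:.1%}")  # 251 / 420, which rounds to the quoted ~60%
```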

View the full pricing overview

Frequently asked questions

Everything you need to know.

Your AI. Your data. Your costs.

Start for free with managed keys, or deploy your first open-source model in under 60 seconds. No base fee.



Cut Claude and GPT costs by ~60% automatically.