Cut Claude and GPT costs by ~60% automatically.

Mycelis routes every request to the cheapest model capable of solving it — local GPUs, DeepSeek, Claude, GPT and Gemini behind one OpenAI-compatible endpoint.

Gemma / Qwen
simple tasks · cheapest
DeepSeek R1
coding · reasoning
Claude / GPT-4o
complex only · on escalation

~60%

lower API cost

< 60s

to first request

0

code changes needed

Drop-in replacement

Change one line. Spend ~60% less on API costs.

Point your existing OpenAI SDK at Mycelis. Routing, cost optimization, and model selection happen automatically.

agent.py one line change
# before — direct OpenAI
from openai import OpenAI
client = OpenAI(api_key="sk-...")

# after — Mycelis (one line changed)
client = OpenAI(
  base_url="https://mycelis.ai/api/proxy/v1",
  api_key="my-pat-key"
)

response = client.chat.completions.create(
  model="mycelis/coding-agent",
  messages=[{"role": "user", "content": "..."}]
)

Unified Agent Runtime Architecture

Your Application OpenAI SDK · Cursor · Continue.dev Mycelis Proxy https://mycelis.ai/api/proxy/v1 Smart Router IF · Regex · Tool-Call · Auto GPU Instances DeepSeek · Gemma 4 · Qwen · Custom MCP Tools · RAG · Fine-Tuning Managed Keys Claude 4.7 · GPT-4o · Gemini BYOK · OpenWebUI
~60% lower API cost on mixed workloads
< 60 second deployment
OpenAI-compatible endpoint
Local + cloud hybrid routing
Zero vendor lock-in

Cost-aware routing

Simple tasks are cheap.
Complex tasks escalate automatically.

Every request is evaluated against your routing rules. Only when complexity, context length, or tool requirements demand it does Mycelis escalate to a frontier model.

REQUEST → MODEL ROUTING

autocomplete / fill-in
short context · pattern matching
Gemma 4 26B
$0.002 / 1k tok
medium coding task
refactor · unit tests · explain
DeepSeek V3
$0.006 / 1k tok
complex reasoning
architecture · long context · debug
Claude Sonnet
escalated
research / multi-step planning
tool-use heavy · very long context
Claude / GPT-4o
escalated

One endpoint, transparent logic

Your app always calls the same URL. Routing decisions are logged and visible — no black box.

Rules you control

Define routing by token count, keywords, tool calls, or regex. Or use the auto-router and let Mycelis decide.

Quality is preserved

Complex tasks still go to the best model. You never trade quality for cost — only unnecessary spend is cut.

62% less

Typical cost reduction for a mixed coding workload (80% routine tasks, 20% complex).

Comparison

Why not just use LiteLLM?

LiteLLM is a great library. Mycelis is a managed platform. Here's what you get beyond what you'd build yourself.

Feature Mycelis LiteLLM DIY / CCR
Dedicated GPU provisioning
Managed frontier API keys
Local + cloud hybrid routing partial partial
Hosted OpenWebUI (chat UI)
One unified OpenAI-compatible endpoint
Cost-aware auto routing limited limited
MCP tool integration
Knowledge bases / RAG
Zero infrastructure to manage

Getting started

You don't need to configure everything.

Start with a single coding agent preset. Advanced routing, MCP tools, and RAG can be added later — when you actually need them.

01

Connect your client

Point any OpenAI-compatible tool at mycelis.ai/api/proxy/v1. Works with Cursor, Claude Code, OpenCode, or any OpenAI SDK.

02

Use default routing

The default coding agent preset already routes simple tasks to cheap models and escalates automatically. Costs drop immediately, no configuration needed.

03

Optimize when ready

Once you see your routing traces and cost breakdown, tune rules, add MCP tools, attach knowledge bases, or fine-tune a local model.

When you need more

Expandable power. Not mandatory complexity.

These features are ready when your use case demands them. Start simple, add layers as you grow.

Dedicated GPU Instances

Run DeepSeek, Gemma, Qwen or custom models on your own GPU — fully private, low latency, billed per minute.

MCP Tool Integration

Attach tools to your agents — web search, file access, APIs — without changing your client code.

Knowledge Bases / RAG

Upload docs, codebases, or PDFs. Mycelis retrieves relevant context before each inference call automatically.

Fine-Tuning / LoRA

Train domain-specific adapters on your data. Deploy a fine-tuned model that routes to the right base automatically.

Hosted OpenWebUI

One-click chat interface for your workspace. Share with teammates, auto-billed per minute of usage.

Semantic Cache

Skip identical or near-identical requests. Cache semantically equivalent prompts to cut costs further.

Privacy-first routing

Keep sensitive code local.

Route private workloads to self-hosted models. Use Claude or GPT only when necessary — and only for requests that don't contain sensitive IP.

Classify by sensitivity

Route requests with proprietary code patterns to local GPU instances. External providers never see that data.

EU-hosted infrastructure

All Mycelis infrastructure runs in the EU. Your data stays in jurisdiction without any enterprise contract.

BYOK or managed keys

Use your own Claude/OpenAI API keys or let Mycelis manage them. No lock-in, your choice.

PRIVACY ROUTING EXAMPLE

contains internal codebase pattern → local GPU
generic question, no IP → DeepSeek
matches /INTERNAL keyword → local GPU

Rules defined by you. Evaluated per request, before any external call is made.

Transparent Pricing

Only pay for what you use.

No base fee. No minimum term. Transparent billing for all services.

Open Source Models

Scalable & Dynamic

Deploy models like DeepSeek or Gemma 4. Scale up or down instantly. Billed per hour of runtime.

Commercial Models

0% Markup

Managed or BYOK. Connect Claude 4.7, GPT-4o or Gemini with 0% markup on provider prices.

Knowledge Bases

$0.05 / h + $0.01 / GB / h

Secure RAG for your data. Fixed base price plus storage-based billing.

OpenWebUI

$0.05 / h

The world's best interface for your team. Fully managed and connected to all your models.

Real workload example

100k coding tokens / day · mixed complexity

Claude / GPT-4o only
no routing, all premium models
~$420/mo
Mycelis smart routing
65% Gemma 20% DeepSeek 15% Claude
~$169/mo

Same workflow. Same endpoint. Lower cost.

~60%

cost reduction

Calculated from the workload example on the left: $420 → $169 per month.

Common Questions

Everything you need to know.

Your AI. Your data. Your costs.

Start for free with Managed Keys — or deploy your first open-source model in under 60 seconds. No base fee.

UnhandledError Reload 🗙