Platform
Private AI infrastructure — from compute to agents.
Target groups
For enterprise, SMBs, and individual developers.
Knowledge & Support
Everything you need to succeed with Mycelis.
Intelligence
Define rules — Mycelis automatically chooses the most cost-effective matching model. Average 80% cost savings without changing a single line of code.
The VirtualModel concept
A VirtualModel is a named endpoint in your workspace — e.g. my-assistant. Instead of calling a model directly, you only provide this name. Mycelis evaluates your routing rules and decides for every request which deployment solves the task most cost-effectively.
Your code stays identical — the VirtualModel simply switches between model deployments internally.
Smart routing rules
Token-based: Requests with fewer than 500 input tokens → small model (e.g. Llama 8B); more than 500 tokens → high-performance model (e.g. GPT-4o).
Latency-based: Time-critical requests (stream=true, short prompts) → fastest available model; background jobs → cheapest model.
Content-based: Prompts containing 'code' or 'SQL' → specialized coding model; general questions → standard deployment.
Traffic split: 70% of requests → model A, 30% → model B, for quality comparisons without code changes.
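The rules above can be sketched in plain Python. The model names and the 500-token threshold come from the examples in the text; the rule structure itself is illustrative and is not the actual Mycelis configuration format.

```python
import random

# Illustrative router implementing the rules described above.
# Rule order matters: content-based checks run before the token threshold.

def route(prompt_tokens: int, prompt: str, stream: bool = False) -> str:
    # Content-based rule: coding prompts go to a specialized model
    if "code" in prompt.lower() or "sql" in prompt.lower():
        return "coding-model"
    # Token-based rule: short requests go to the small model
    if prompt_tokens < 500:
        return "llama-3.1-8b"
    # Everything else goes to the high-performance model
    return "gpt-4o"

def ab_route(prompt_tokens: int, prompt: str) -> str:
    # Traffic split: 70% of requests to model A, 30% to model B
    return "model-a" if random.random() < 0.7 else "model-b"
```

In the platform itself this decision happens server-side per request; your client only ever sends the VirtualModel name.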
In typical production workloads, 60–80% of all requests are short, simple tasks (classification, extraction, short summaries). These can be handled by small, cost-effective models like Llama 3.1 8B — with the same output quality.
GPT-4o only: ~890 € / month
With smart routing: ~178 € / month
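The back-of-envelope behind these figures: if roughly 80% of requests are short, simple tasks whose cost on a small model is negligible next to GPT-4o's, the monthly bill shrinks to about a fifth. The 80/20 split is an assumption taken from the range quoted above, not a measured workload.

```python
# Illustrative cost estimate; the split and baseline are assumptions.
gpt4o_only = 890.0   # € / month with everything on GPT-4o

share_small = 0.80   # short, simple requests (60-80% per the text)
# Assume the small model's cost per request is negligible vs GPT-4o
routed = gpt4o_only * (1 - share_small)

savings = 1 - routed / gpt4o_only
print(f"{routed:.0f} € / month, {savings:.0%} saved")  # 178 € / month, 80% saved
```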
Code example — use a VirtualModel
from openai import OpenAI
client = OpenAI(
base_url="https://api.mycelis.io/proxy/v1",
api_key="pat_..."
)
# Only change the model parameter to your VirtualModel name
response = client.chat.completions.create(
model="my-assistant", # Mycelis routes automatically
messages=[{"role": "user", "content": "Summarize the contract."}]
)
# Routing: short request → Llama 3.1 8B (€0.39/h)
# instead of → GPT-4o (€0.005/1k tokens)

Connect your knowledge bases directly to agents. For each request, Mycelis automatically searches relevant documents and adds them as context — without your own vector search implementation.
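The retrieval step described above follows the standard pattern: find relevant documents, then prepend them as context before the user message. Mycelis does this server-side; the sketch below is purely illustrative, with a toy keyword-overlap score standing in for real vector search.

```python
# Toy sketch of retrieval-augmented context injection. The function
# names and the keyword-overlap scoring are illustrative, not the
# Mycelis API.

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    # Score each document by how many query words it shares (toy metric)
    q = set(query.lower().split())
    scored = sorted(docs, key=lambda d: len(q & set(d.lower().split())),
                    reverse=True)
    return scored[:k]

def build_messages(query: str, docs: list[str]) -> list[dict]:
    # Inject the top-k documents as system context before the user turn
    context = "\n\n".join(retrieve(query, docs))
    return [
        {"role": "system", "content": f"Use this context:\n{context}"},
        {"role": "user", "content": query},
    ]
```

The resulting message list can be passed unchanged to `client.chat.completions.create` as in the example above.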
Frequently asked questions