

Agents & Smart Routing — One slug, all models.

Define rules — Mycelis automatically chooses the most cost-effective matching model. Average 80% cost savings without changing a single line of code.

The VirtualModel concept

A VirtualModel is a named endpoint in your workspace — e.g. my-assistant. Instead of calling a model directly, you reference only this name; for every request, Mycelis evaluates your routing rules and picks the deployment that handles the task most cost-effectively.

Your code stays identical — only the model behind the VirtualModel switches internally.

Smart routing rules

Token budget

Requests with fewer than 500 input tokens → small model (e.g. Llama 3.1 8B). More than 500 tokens → high-performance model (e.g. GPT-4o).

Latency optimization

Time-critical requests (stream=true, short prompts) → fastest available model. Background jobs → cheapest model.

Keyword matching

Prompts containing 'code' or 'SQL' → specialized coding model. General questions → standard deployment.

A/B Routing

70% of requests → model A, 30% → model B. For quality comparisons without code changes.
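Taken together, rules like these amount to a simple decision function evaluated per request. A minimal sketch for illustration — the thresholds, token estimate, and model names here are assumptions, not the actual Mycelis rule engine:

```python
import random

def route(prompt: str, stream: bool = False) -> str:
    """Pick a deployment for a request; all thresholds and names are illustrative."""
    tokens = len(prompt) // 4  # rough estimate: ~4 characters per token

    # Keyword matching: coding prompts go to a specialized model
    if "code" in prompt.lower() or "sql" in prompt.lower():
        return "coding-model"

    # Latency optimization: streamed short prompts get the fastest model
    if stream and tokens < 100:
        return "fastest-model"

    # Token budget: short requests are served by a small, cheap model
    if tokens < 500:
        return "llama-3.1-8b"

    # A/B routing for the remainder: 70% to model A, 30% to model B
    return "model-a" if random.random() < 0.7 else "model-b"

print(route("Write SQL to count orders"))  # coding-model
print(route("Classify this ticket"))       # llama-3.1-8b
```

In the real product these rules are configured in the dashboard rather than in code; the sketch only shows how the four rule types compose into one routing decision.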

Average 80% cost savings

In typical production workloads, 60–80% of all requests are short, simple tasks (classification, extraction, short summaries). These can be handled by small, cost-effective models like Llama 3.1 8B — with comparable output quality.

~890 € / month
GPT-4o only

~178 € / month
with smart routing
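The two figures follow from simple blended-cost arithmetic. A sketch under the assumptions stated on this page — an all-GPT-4o bill of ~890 €/month, 80% of requests routable to a small model, and the simplification that the small model's cost is negligible next to GPT-4o:

```python
# Assumed figures: all-GPT-4o bill and the share of requests a small model can serve
gpt4o_only = 890.0       # €/month when every request goes to GPT-4o
routed_to_small = 0.80   # share of requests handled by the small model

# Simplification: the small model's cost is treated as negligible,
# so the remaining bill is just the GPT-4o share of traffic.
with_routing = gpt4o_only * (1 - routed_to_small)
savings = 1 - with_routing / gpt4o_only

print(f"{with_routing:.0f} € / month")  # 178 € / month
print(f"{savings:.0%} savings")         # 80% savings
```

Actual savings depend on the real small-model cost and on how much of your traffic the rules can divert.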

Code example — use a VirtualModel

Python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.mycelis.io/proxy/v1",
    api_key="pat_..."
)

# Only change the model parameter to your VirtualModel name
response = client.chat.completions.create(
    model="my-assistant",  # Mycelis routes automatically
    messages=[{"role": "user", "content": "Summarize the contract."}]
)
# Routing: short request → Llama 3.1 8B (€0.39/h)
# instead of → GPT-4o (€0.005/1k tokens)

Direct RAG integration

Connect your knowledge bases directly to agents. For each request, Mycelis automatically retrieves relevant documents and adds them to the prompt as context — no vector-search implementation of your own required.
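Under the hood, this pattern amounts to prepending retrieved documents to the chat messages before the model is called. A minimal sketch of the injection step — the helper function and the example document are hypothetical, not the Mycelis API:

```python
def build_rag_messages(question: str, documents: list[str]) -> list[dict]:
    """Prepend retrieved documents as context, keeping the user question intact."""
    context = "\n\n".join(documents)
    return [
        {"role": "system",
         "content": "Answer using only the following documents:\n\n" + context},
        {"role": "user", "content": question},
    ]

# Hypothetical retrieval result for illustration
docs = ["Contract §3: Either party may terminate with 30 days notice."]
messages = build_rag_messages("What is the notice period?", docs)
# The resulting messages list can be passed to any chat-completions model
```

Because the documents travel as ordinary message content, this works with every model that supports chat completions — which is why RAG in Mycelis is model-agnostic.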

Frequently asked questions

What is a VirtualModel?

A VirtualModel is a named endpoint (e.g. 'my-assistant') that internally forwards requests to different models based on configured rules. Your code stays unchanged.

Can I define custom routing rules?

Yes. In the dashboard you define rules based on token count, latency requirements, prompt keywords, or A/B split percentages.

Are there extra costs for smart routing?

No. Smart routing is included in every deployment. You only pay for the model resources actually used (GPU hours or tokens).

Does RAG work with all models?

Yes. RAG documents are injected as context into the prompt — this works with all models that support chat completions.

80% cheaper — same quality.

Configure your first VirtualModel in under 5 minutes.

Start for free