Cost Optimization · 5 min min read

Cut AI Model Costs by 60% with Smart Routing

When every request hits your most capable — and most expensive — model, costs compound fast. Most workloads are a mix: simple classification or formatting tasks that a fast $0.10/M-token model can handle, alongside complex reasoning that genuinely needs a top-tier model.

Mycelis smart routing lets you configure rules that decide which model handles each request. Your application sends all calls to the same endpoint with the same agent slug — the gateway handles the routing transparently.

How it works

Each Mycelis agent has a routing configuration. Requests are evaluated top-to-bottom against your rules. The first matching rule determines which underlying model receives the request. A fallback model handles anything that doesn't match.

Example: keyword-based routing

This is the simplest approach. Add keywords that signal a "simple" task to a rule that routes to a fast, cheap model:

Open your agent in the Mycelis dashboard
Go to Routing in the agent settings
Add a rule:
- Match: Request contains any of — summarize, classify, format, translate, extract, list
- Route to: Fast model (e.g. gpt-4o-mini, claude-haiku-4-5, or a self-hosted open-source model)
Set the fallback to your high-capability model (e.g. claude-sonnet-4-6, gpt-4o)

Now, a request like "Classify this support ticket as bug, feature request, or question" routes to the cheap model. A complex analytical prompt hits the capable model.

Example: cost-optimized round-robin

For workloads where all tasks are roughly similar in complexity but you want to minimize average cost:

Add multiple models to your agent's model list
Set the routing strategy to Cost-Optimized
Mycelis will prefer the cheapest model that returns a response within your latency threshold

This works well for embedding generation, batch summarization, and content tagging pipelines.

Measuring the impact

After deploying routing rules, check Workspace → Usage → Cost Breakdown:

Filter by agent
Compare cost per request before and after (use the date range selector)
The Model Distribution chart shows what percentage of requests went to each model

A well-tuned routing config typically saves 40–70% on token costs for mixed workloads.

Combining with budget controls

To prevent unexpected spikes, add a budget cap on top of routing:

Go to Workspace Settings → Budget
Set a monthly limit in credits
Choose the behavior when the limit is hit: block requests, route to fallback, or alert admin

This gives you both cost optimization (routing) and cost protection (budget cap) in the same workspace.

Tips

Start permissive. Add a broad catch-all rule first and monitor which requests match. Tighten the routing rules based on real traffic patterns.
Log model selection. Enable request logging in the agent settings to see which model handled each request — useful for debugging unexpected routing behavior.
Test before deploying. Use the Test Prompt panel in the agent settings to verify that a sample prompt routes to the model you expect.

← Previous Build a RAG Agent with Mycelis Knowledge Bases Next → Drop-In Replace the OpenAI SDK with Mycelis

Products

Compute

Intelligence

Integration

Use Cases

Developers & Individuals

SMB

Enterprise

Resources

Learn

Community & Updates