Cut AI Model Costs by 60% with Smart Routing
When every request hits your most capable — and most expensive — model, costs compound fast. Most workloads are a mix: simple classification or formatting tasks that a fast $0.10/M-token model can handle, alongside complex reasoning that genuinely needs a top-tier model.
Mycelis smart routing lets you configure rules that decide which model handles each request. Your application sends all calls to the same endpoint with the same agent slug — the gateway handles the routing transparently.
How it works
Each Mycelis agent has a routing configuration. Requests are evaluated top-to-bottom against your rules. The first matching rule determines which underlying model receives the request. A fallback model handles anything that doesn't match.
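The first-match behavior described above can be sketched in a few lines of Python. This is an illustrative model only, not the Mycelis implementation: the `Rule` class, the `pick_model` function, the predicates, and the model names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Rule:
    name: str
    matches: Callable[[str], bool]  # predicate over the request text
    model: str                      # model that receives matching requests


def pick_model(prompt: str, rules: List[Rule], fallback: str) -> str:
    """Return the model of the first matching rule, else the fallback."""
    for rule in rules:  # evaluated top-to-bottom
        if rule.matches(prompt):
            return rule.model
    return fallback


# Hypothetical rules and model names, for illustration only.
rules = [
    Rule("short-prompt", lambda p: len(p) < 40, "fast-model"),
    Rule("mentions-code", lambda p: "code" in p.lower(), "code-model"),
]

# Both rules match this prompt; the first one in the list wins.
print(pick_model("Fix this code", rules, "capable-model"))
```

Because evaluation stops at the first match, rule order matters: a broad rule placed near the top can shadow the more specific rules below it.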
Example: keyword-based routing
This is the simplest approach. Add keywords that signal a "simple" task to a rule that routes to a fast, cheap model:
- Open your agent in the Mycelis dashboard
- Go to Routing in the agent settings
- Add a rule:
  - Match: Request contains any of: summarize, classify, format, translate, extract, list
  - Route to: Fast model (e.g. gpt-4o-mini, claude-haiku-4-5, or a self-hosted open-source model)
- Set the fallback to your high-capability model (e.g. claude-sonnet-4-6, gpt-4o)
Now, a request like "Classify this support ticket as bug, feature request, or question" routes to the cheap model. A complex analytical prompt hits the capable model.
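To reason about which prompts a keyword rule will catch, it can help to prototype the match locally. The sketch below is an approximation under stated assumptions: this article does not say whether Mycelis matches keywords case-insensitively or as whole words, so both behaviors here are guesses, and `route` and the model labels are hypothetical names.

```python
import re

SIMPLE_KEYWORDS = ["summarize", "classify", "format", "translate", "extract", "list"]

# One case-insensitive pattern that matches any keyword as a whole word.
# Assumption: whole-word matching; plain substring matching would also
# fire on words such as "specialist" (which contains "list").
_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(k) for k in SIMPLE_KEYWORDS) + r")\b",
    re.IGNORECASE,
)


def route(prompt: str, fast: str = "fast-model", fallback: str = "capable-model") -> str:
    """Send prompts containing a 'simple task' keyword to the fast model."""
    return fast if _PATTERN.search(prompt) else fallback


print(route("Classify this support ticket as bug, feature request, or question"))
print(route("Why did revenue drop in Q3 according to the specialist reports?"))
```

Whole-word matching has its own trade-off: it will not catch inflected forms such as "formatting" or "classification" unless those are added to the keyword list.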
Example: cost-optimized round-robin
For workloads where all tasks are of roughly similar complexity and you want to minimize average cost:
- Add multiple models to your agent's model list
- Set the routing strategy to Cost-Optimized
- Mycelis will prefer the cheapest model that returns a response within your latency threshold
This works well for embedding generation, batch summarization, and content tagging pipelines.
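The selection logic described in the steps above (cheapest model that still responds within the latency threshold) could look roughly like the following. As described, this reads as cheapest-first selection rather than strict rotation. The `ModelStats` fields, the prices, and the latency figures are hypothetical, and how Mycelis actually measures latency is not specified here.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ModelStats:
    name: str
    cost_per_m_tokens: float  # USD per million tokens (hypothetical figure)
    latency_ms: float         # recently observed latency (hypothetical figure)


def cheapest_within_latency(
    models: List[ModelStats], max_latency_ms: float
) -> Optional[ModelStats]:
    """Pick the cheapest model whose latency is within the threshold."""
    eligible = [m for m in models if m.latency_ms <= max_latency_ms]
    # default=None covers the case where no model is fast enough.
    return min(eligible, key=lambda m: m.cost_per_m_tokens, default=None)


models = [
    ModelStats("model-a", cost_per_m_tokens=0.10, latency_ms=900),
    ModelStats("model-b", cost_per_m_tokens=0.50, latency_ms=400),
    ModelStats("model-c", cost_per_m_tokens=3.00, latency_ms=1200),
]

choice = cheapest_within_latency(models, max_latency_ms=1000)
print(choice.name if choice else "no eligible model")
```

Tightening the threshold shifts traffic toward faster and usually pricier models, so the latency setting is effectively a cost dial as well.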
Measuring the impact
After deploying routing rules, check Workspace → Usage → Cost Breakdown:
- Filter by agent
- Compare cost per request before and after (use the date range selector)
- The Model Distribution chart shows what percentage of requests went to each model
A well-tuned routing config typically saves 40–70% on token costs for mixed workloads.
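A quick back-of-the-envelope calculation shows where a figure in that range can come from. The sketch assumes token volume splits cleanly between one cheap and one capable model; the $0.10/M price echoes the example at the top of this article, while the $3.00/M price and the 60% share are hypothetical inputs, not measured data.

```python
def blended_cost_per_m(share_cheap: float, cheap_price: float, capable_price: float) -> float:
    """Average cost per million tokens when a share of tokens goes to the cheap model."""
    return share_cheap * cheap_price + (1 - share_cheap) * capable_price


def savings_fraction(share_cheap: float, cheap_price: float, capable_price: float) -> float:
    """Savings relative to sending every token to the capable model."""
    blended = blended_cost_per_m(share_cheap, cheap_price, capable_price)
    return 1 - blended / capable_price


# Hypothetical inputs: 60% of tokens routed to a $0.10/M model,
# the remaining 40% to a $3.00/M model.
print(round(savings_fraction(0.6, 0.10, 3.00), 2))
```

With these inputs the blended price is about $1.26 per million tokens, a saving of roughly 58% against the all-capable baseline. When the price gap is this large, the saving is driven mostly by the share of tokens the cheap model can absorb.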
Combining with budget controls
To prevent unexpected spikes, add a budget cap on top of routing:
- Go to Workspace Settings → Budget
- Set a monthly limit in credits
- Choose the behavior when the limit is hit: block requests, route to fallback, or alert admin
This gives you both cost optimization (routing) and cost protection (budget cap) in the same workspace.
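The three limit behaviors can be modeled as a small decision function that runs after routing. This is a conceptual sketch only: the `OnLimit` enum, the `apply_budget` function, and the choice to keep serving requests in the alert case are assumptions, not documented Mycelis behavior.

```python
from enum import Enum
from typing import Optional, Tuple


class OnLimit(Enum):
    BLOCK = "block"        # reject the request
    FALLBACK = "fallback"  # send the request to the fallback model
    ALERT = "alert"        # notify an admin (assumed: request still served)


def apply_budget(
    spent: float,
    limit: float,
    routed_model: str,
    fallback_model: str,
    on_limit: OnLimit,
) -> Tuple[Optional[str], bool]:
    """Return (model to use or None if blocked, whether to alert an admin)."""
    if spent < limit:
        return routed_model, False  # under budget: routing decision stands
    if on_limit is OnLimit.BLOCK:
        return None, False
    if on_limit is OnLimit.FALLBACK:
        return fallback_model, False
    return routed_model, True       # ALERT


print(apply_budget(120.0, 100.0, "capable-model", "fallback-model", OnLimit.FALLBACK))
```

Modeling the cap as a separate step after routing mirrors the layering described above: routing picks the model, and the budget check can override that choice once the limit is reached.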
Tips
- Start permissive. Add a broad catch-all rule first and monitor which requests match. Tighten the routing rules based on real traffic patterns.
- Log model selection. Enable request logging in the agent settings to see which model handled each request — useful for debugging unexpected routing behavior.
- Test before deploying. Use the Test Prompt panel in the agent settings to verify that a sample prompt routes to the model you expect.