Cost Optimization · 5 min read

Cut AI Model Costs by 60% with Smart Routing

When every request hits your most capable — and most expensive — model, costs compound fast. Most workloads are a mix: simple classification or formatting tasks that a fast $0.10/M-token model can handle, alongside complex reasoning that genuinely needs a top-tier model.

Mycelis smart routing lets you configure rules that decide which model handles each request. Your application sends all calls to the same endpoint with the same agent slug — the gateway handles the routing transparently.

How it works

Each Mycelis agent has a routing configuration. Requests are evaluated top-to-bottom against your rules. The first matching rule determines which underlying model receives the request. A fallback model handles anything that doesn't match.
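The evaluation order described above can be sketched in a few lines of Python. This is a minimal illustration, not the Mycelis implementation: the `Rule` class, the `select_model` function, and the model names are hypothetical, and the lambda predicates stand in for whatever match conditions the dashboard offers.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Rule:
    """Hypothetical routing rule: a predicate plus the model it routes to."""
    name: str
    matches: Callable[[str], bool]
    model: str


def select_model(prompt: str, rules: Sequence[Rule], fallback: str) -> str:
    """Return the model of the first matching rule, or the fallback if none match."""
    for rule in rules:  # evaluated top-to-bottom, so rule order is significant
        if rule.matches(prompt):
            return rule.model
    return fallback


# Illustrative rules only; real match conditions are configured in the dashboard.
rules = [
    Rule("short-prompts", lambda p: len(p) < 40, "fast-model"),
    Rule("mentions-code", lambda p: "code" in p.lower(), "code-model"),
]

print(select_model("Fix this code", rules, fallback="capable-model"))
```

Because "Fix this code" satisfies both rules, the first one wins and the second is never consulted. Under first-match semantics, narrow rules generally belong above broad ones.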

Example: keyword-based routing

This is the simplest approach. Add keywords that signal a "simple" task to a rule that routes to a fast, cheap model:

  1. Open your agent in the Mycelis dashboard
  2. Go to Routing in the agent settings
  3. Add a rule:
    • Match: Request contains any of — summarize, classify, format, translate, extract, list
    • Route to: Fast model (e.g. gpt-4o-mini, claude-haiku-4-5, or a self-hosted open-source model)
  4. Set the fallback to your high-capability model (e.g. claude-sonnet-4-6, gpt-4o)

Now, a request like "Classify this support ticket as bug, feature request, or question" routes to the cheap model. A complex analytical prompt that contains none of these keywords matches no rule and falls through to the capable fallback model.
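To reason about which prompts such a rule would catch, it can help to model the match locally. The sketch below assumes case-insensitive, whole-word matching; this article does not state whether Mycelis matches whole words or substrings, so treat that as an assumption. The `route_by_keyword` function is hypothetical, and the model names are taken from the examples above.

```python
import re

SIMPLE_KEYWORDS = ("summarize", "classify", "format", "translate", "extract", "list")

# \b word boundaries make "list" match the word "list" but not "realistic".
_KEYWORD_RE = re.compile(
    r"\b(?:" + "|".join(re.escape(k) for k in SIMPLE_KEYWORDS) + r")\b",
    re.IGNORECASE,
)


def route_by_keyword(
    prompt: str,
    fast_model: str = "gpt-4o-mini",
    fallback_model: str = "claude-sonnet-4-6",
) -> str:
    """Route to the fast model if any keyword appears as a whole word."""
    return fast_model if _KEYWORD_RE.search(prompt) else fallback_model


print(route_by_keyword("Classify this support ticket as bug, feature request, or question"))
print(route_by_keyword("Is this a realistic growth forecast for next year?"))
```

Whole-word matching avoids accidental hits inside longer words, but it also misses inflected forms such as "summarizing" or "formatting". Plain substring matching has the opposite trade-off. Either way, a complex prompt that happens to contain a keyword still goes to the cheap model, so it is worth reviewing real traffic before relying on the rule.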

Example: cost-optimized routing

For workloads where all tasks are roughly similar in complexity but you want to minimize average cost:

  1. Add multiple models to your agent's model list
  2. Set the routing strategy to Cost-Optimized
  3. Mycelis will prefer the cheapest model that returns a response within your latency threshold

This works well for embedding generation, batch summarization, and content tagging pipelines.
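The selection logic behind a cost-optimized strategy can be approximated as "filter by latency, then take the cheapest". The sketch below is one possible reading, assuming the gateway compares a recent latency statistic per model against the threshold; how Mycelis actually measures latency is not described here. The `ModelStats` type, the model names, and all prices and latencies are made up for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class ModelStats:
    """Hypothetical per-model figures; the numbers below are illustrative only."""
    name: str
    usd_per_million_tokens: float
    p95_latency_ms: float


def cheapest_within_latency(
    models: Sequence[ModelStats], max_latency_ms: float
) -> Optional[ModelStats]:
    """Return the cheapest model that meets the latency threshold, or None."""
    eligible = [m for m in models if m.p95_latency_ms <= max_latency_ms]
    return min(eligible, key=lambda m: m.usd_per_million_tokens, default=None)


models = [
    ModelStats("model-a", usd_per_million_tokens=0.10, p95_latency_ms=2500),
    ModelStats("model-b", usd_per_million_tokens=0.60, p95_latency_ms=900),
    ModelStats("model-c", usd_per_million_tokens=3.00, p95_latency_ms=700),
]

choice = cheapest_within_latency(models, max_latency_ms=1000)
print(choice.name if choice else "no model meets the threshold")
```

With a 1000 ms threshold the cheapest model is excluded for being too slow, so the next cheapest is chosen. If the threshold is so tight that no model qualifies, the function returns None and the caller has to decide what to do.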

Measuring the impact

After deploying routing rules, check Workspace → Usage → Cost Breakdown:

  • Filter by agent
  • Compare cost per request before and after (use the date range selector)
  • The Model Distribution chart shows what percentage of requests went to each model

A well-tuned routing config typically saves 40–70% on token costs for mixed workloads.
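A quick back-of-the-envelope calculation shows how the model distribution turns into a savings figure. The sketch below assumes hypothetical prices ($0.10 and $3.00 per million tokens) and a hypothetical 60/40 split measured in tokens, not requests. The function names are illustrative and not part of any Mycelis API.

```python
from typing import Sequence, Tuple


def blended_cost_per_million(shares_and_prices: Sequence[Tuple[float, float]]) -> float:
    """Weighted average price, given (token_share, usd_per_million_tokens) pairs."""
    total_share = sum(share for share, _ in shares_and_prices)
    if abs(total_share - 1.0) > 1e-9:
        raise ValueError("token shares must sum to 1.0")
    return sum(share * price for share, price in shares_and_prices)


def savings_fraction(baseline_price: float, blended_price: float) -> float:
    """Fraction saved relative to sending every token to the baseline model."""
    return 1.0 - blended_price / baseline_price


# Hypothetical figures: 60% of tokens on a $0.10/M model, 40% on a $3.00/M model.
blended = blended_cost_per_million([(0.6, 0.10), (0.4, 3.00)])
savings = savings_fraction(baseline_price=3.00, blended_price=blended)
print(f"blended: ${blended:.2f}/M tokens, savings: {savings:.0%}")
```

Under these assumptions the blended price is $1.26 per million tokens, a saving of about 58% against the all-premium baseline. Actual savings depend on your own traffic mix and prices, so treat the Cost Breakdown view as the source of truth.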

Combining with budget controls

To prevent unexpected spikes, add a budget cap on top of routing:

  1. Go to Workspace Settings → Budget
  2. Set a monthly limit in credits
  3. Choose the behavior when the limit is hit: block requests, route to fallback, or alert admin

This gives you both cost optimization (routing) and cost protection (budget cap) in the same workspace.
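The three limit behaviors can be pictured as a small decision step that runs after routing. This is a hedged sketch, not the gateway's actual logic: the `LimitAction` enum, `BudgetExceeded` exception, and `apply_budget` function are hypothetical, and it assumes that "alert admin" lets the request proceed after sending a notification.

```python
from enum import Enum
from typing import Callable


class LimitAction(Enum):
    """The three behaviors listed above, under hypothetical names."""
    BLOCK = "block"
    ROUTE_TO_FALLBACK = "route_to_fallback"
    ALERT_ADMIN = "alert_admin"


class BudgetExceeded(Exception):
    """Raised when the cap is reached and the action is BLOCK."""


def apply_budget(
    spent_credits: float,
    monthly_limit: float,
    action: LimitAction,
    routed_model: str,
    fallback_model: str,
    notify: Callable[[str], None] = print,
) -> str:
    """Return the model to use once the monthly budget cap has been checked."""
    if spent_credits < monthly_limit:
        return routed_model  # under budget: keep the routing decision
    if action is LimitAction.BLOCK:
        raise BudgetExceeded(f"monthly limit of {monthly_limit} credits reached")
    if action is LimitAction.ROUTE_TO_FALLBACK:
        return fallback_model
    # ALERT_ADMIN: assumed to notify and let the request through unchanged.
    notify(f"budget alert: {spent_credits} of {monthly_limit} credits used")
    return routed_model


print(apply_budget(120, 100, LimitAction.ROUTE_TO_FALLBACK, "model-x", "model-y"))
```

Note that in the keyword example the fallback is the high-capability model, so it is worth confirming which model "route to fallback" resolves to in your workspace before choosing that behavior as a cost safeguard.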

Tips

  • Start permissive. Add a broad catch-all rule first and monitor which requests match. Tighten the routing rules based on real traffic patterns.
  • Log model selection. Enable request logging in the agent settings to see which model handled each request — useful for debugging unexpected routing behavior.
  • Test before deploying. Use the Test Prompt panel in the agent settings to verify that a sample prompt routes to the model you expect.