Tutorial
"Demo: Multi-model coding agent with automatic fallback"
May 15, 2025 · 12 min min read
In this tutorial you will build a production-ready coding agent from three models: a self-hosted open-source model, a commercial top-tier model, and a cost-efficient fallback. Routing is fully automatic — no custom code required.
What we are building
- Gemma 4 as a self-hosted deployment (zero per-token cost)
- Claude Opus 4.6 via BYOK for complex tasks
- DeepSeek-V3 via BYOK as a cheap mid-tier fallback
- A virtual model that bundles all three
- An agent with rule-based routing and Smart Dispatcher
- OpenCode using the Mycelis proxy as its backend
Step 1: Deploy Gemma 4
Go to Compute → Deployments → New Deployment.
Select gemma-4 (or the available Gemma 4 variant in your cluster) as the model. Give the deployment a clear name like gemma4-coding-local. Start the deployment — it runs on your own GPU and produces no variable token costs.
Tip: Gemma 4 excels at autocomplete, short explanations, and lightweight refactoring. Roughly 60–65% of typical coding requests fall into this category.
Step 2: Deploy Claude Opus 4.6 (BYOK)
Go to Compute → Deployments → New Deployment → BYOK.
Select Anthropic as the provider and claude-opus-4-6 as the model. Enter your Anthropic API key and name the deployment claude-opus-coding. Save.
Reserve Claude Opus 4.6 for architecture decisions, complex debugging with stack traces, and deep reasoning tasks.
Step 3: Deploy DeepSeek-V3 (BYOK)
Go to Compute → Deployments → New Deployment → BYOK again.
Select DeepSeek as the provider and deepseek-chat (DeepSeek-V3) as the model. Enter your DeepSeek API key, name: deepseek-v3-coding. Save.
DeepSeek-V3 costs a fraction of Claude and handles standard coding work — bug fixes, unit tests, mid-complexity refactoring — reliably.
Step 4: Create a virtual model
Go to Models → New Virtual Model.
- Name:
coding-agent - Slug:
coding-agent(used later in the OpenCode config) - Add deployments: all three —
gemma4-coding-local,claude-opus-coding,deepseek-v3-coding
A virtual model bundles multiple deployments behind a stable slug. Clients always target the same endpoint; routing happens transparently underneath.
Step 5: Create an agent and choose a strategy
Go to Agents → New Agent.
- Name:
Multi-Model Coding Agent - Virtual model:
coding-agent - Strategy:
Rule-based
The rule-based strategy evaluates a priority-ordered list of conditions for every request and routes to the matching deployment. If no rule matches, the Smart Dispatcher steps in as a fallback.
Step 6: Configure routing rules
Under Routing Rules in the agent, add the following three rules in order — priority matters:
Rule 1 – Simple tasks to Gemma 4
| Field | Value |
|---|---|
| Condition | Keywords contain: autocomplete, explain, comment, rename, snippet |
| Target deployment | gemma4-coding-local |
| Priority | 1 (highest) |
Rule 2 – Complex tasks to Claude Opus 4.6
| Field | Value |
|---|---|
| Condition | Keywords contain: architecture, design, stacktrace, debug, migration, performance, security OR estimated tokens > 4000 |
| Target deployment | claude-opus-coding |
| Priority | 2 |
Rule 3 – Standard coding to DeepSeek (default)
| Field | Value |
|---|---|
| Condition | Always true (default fallback rule) |
| Target deployment | deepseek-v3-coding |
| Priority | 3 (lowest) |
Smart Dispatcher as a safety net: When no rule matches — for example because all deployments are temporarily unavailable or the rule logic leaves a gap — the Smart Dispatcher analyzes the request and selects the most cost-efficient available deployment automatically.
Step 7: Create an API key in Mycelis
Go to Settings → API Keys → New API Key.
- Name:
opencode-local - Permissions: Inference (minimum)
- Click Create and copy the generated key — it is shown only once.
This key authorizes OpenCode to send requests through your Mycelis workspace.
Step 8: Configure OpenCode with the Mycelis proxy
Open your OpenCode configuration file (~/.config/opencode/config.json or opencode.json in your project root).
Add a new provider entry:
{
"providers": {
"mycelis": {
"name": "Mycelis",
"apiKey": "mc_your_api_key_here",
"baseURL": "https://mycelis.ai/api/proxy/v1"
}
},
"model": "mycelis/coding-agent"
}
Replace mc_your_api_key_here with the key from step 7 and coding-agent with your virtual model's slug.
Restart OpenCode. All requests now flow through Mycelis, and routing decides in the background which of the three models responds.
Result
You now have a coding agent that:
- Answers simple requests for free on your own GPU (Gemma 4)
- Forwards complex architecture questions to Claude Opus 4.6
- Sends everything else to DeepSeek-V3 at a fraction of Claude's cost
- Falls back to the Smart Dispatcher when no rule fires, automatically picking the cheapest suitable model
- Logs every routing decision in the Dashboard under Smart Routing Insights
For a typical coding workload this setup saves 60–70% of API costs compared to a single-model setup, with no compromise on output quality.