Tutorial

"Demo: Multi-model coding agent with automatic fallback"

May 15, 2025 · 12 min min read

In this tutorial you will build a production-ready coding agent from three models: a self-hosted open-source model, a commercial top-tier model, and a cost-efficient fallback. Routing is fully automatic — no custom code required.

What we are building

Gemma 4 as a self-hosted deployment (zero per-token cost)
Claude Opus 4.6 via BYOK for complex tasks
DeepSeek-V3 via BYOK as a cheap mid-tier fallback
A virtual model that bundles all three
An agent with rule-based routing and Smart Dispatcher
OpenCode using the Mycelis proxy as its backend

Step 1: Deploy Gemma 4

Go to Compute → Deployments → New Deployment.

Select gemma-4 (or the available Gemma 4 variant in your cluster) as the model. Give the deployment a clear name like gemma4-coding-local. Start the deployment — it runs on your own GPU and produces no variable token costs.

Tip: Gemma 4 excels at autocomplete, short explanations, and lightweight refactoring. Roughly 60–65% of typical coding requests fall into this category.

Step 2: Deploy Claude Opus 4.6 (BYOK)

Go to Compute → Deployments → New Deployment → BYOK.

Select Anthropic as the provider and claude-opus-4-6 as the model. Enter your Anthropic API key and name the deployment claude-opus-coding. Save.

Reserve Claude Opus 4.6 for architecture decisions, complex debugging with stack traces, and deep reasoning tasks.

Step 3: Deploy DeepSeek-V3 (BYOK)

Go to Compute → Deployments → New Deployment → BYOK again.

Select DeepSeek as the provider and deepseek-chat (DeepSeek-V3) as the model. Enter your DeepSeek API key, name: deepseek-v3-coding. Save.

DeepSeek-V3 costs a fraction of Claude and handles standard coding work — bug fixes, unit tests, mid-complexity refactoring — reliably.

Step 4: Create a virtual model

Go to Models → New Virtual Model.

Name: coding-agent
Slug: coding-agent (used later in the OpenCode config)
Add deployments: all three — gemma4-coding-local, claude-opus-coding, deepseek-v3-coding

A virtual model bundles multiple deployments behind a stable slug. Clients always target the same endpoint; routing happens transparently underneath.

Step 5: Create an agent and choose a strategy

Go to Agents → New Agent.

Name: Multi-Model Coding Agent
Virtual model: coding-agent
Strategy: Rule-based

The rule-based strategy evaluates a priority-ordered list of conditions for every request and routes to the matching deployment. If no rule matches, the Smart Dispatcher steps in as a fallback.

Step 6: Configure routing rules

Under Routing Rules in the agent, add the following three rules in order — priority matters:

Rule 1 – Simple tasks to Gemma 4

Field	Value
Condition	Keywords contain: `autocomplete`, `explain`, `comment`, `rename`, `snippet`
Target deployment	`gemma4-coding-local`
Priority	1 (highest)

Rule 2 – Complex tasks to Claude Opus 4.6

Field	Value
Condition	Keywords contain: `architecture`, `design`, `stacktrace`, `debug`, `migration`, `performance`, `security` OR estimated tokens > 4000
Target deployment	`claude-opus-coding`
Priority	2

Rule 3 – Standard coding to DeepSeek (default)

Field	Value
Condition	Always true (default fallback rule)
Target deployment	`deepseek-v3-coding`
Priority	3 (lowest)

Smart Dispatcher as a safety net: When no rule matches — for example because all deployments are temporarily unavailable or the rule logic leaves a gap — the Smart Dispatcher analyzes the request and selects the most cost-efficient available deployment automatically.

Step 7: Create an API key in Mycelis

Go to Settings → API Keys → New API Key.

Name: opencode-local
Permissions: Inference (minimum)
Click Create and copy the generated key — it is shown only once.

This key authorizes OpenCode to send requests through your Mycelis workspace.

Step 8: Configure OpenCode with the Mycelis proxy

Open your OpenCode configuration file (~/.config/opencode/config.json or opencode.json in your project root).

Add a new provider entry:

{
  "providers": {
    "mycelis": {
      "name": "Mycelis",
      "apiKey": "mc_your_api_key_here",
      "baseURL": "https://mycelis.ai/api/proxy/v1"
    }
  },
  "model": "mycelis/coding-agent"
}

Replace mc_your_api_key_here with the key from step 7 and coding-agent with your virtual model's slug.

Restart OpenCode. All requests now flow through Mycelis, and routing decides in the background which of the three models responds.

Result

You now have a coding agent that:

Answers simple requests for free on your own GPU (Gemma 4)
Forwards complex architecture questions to Claude Opus 4.6
Sends everything else to DeepSeek-V3 at a fraction of Claude's cost
Falls back to the Smart Dispatcher when no rule fires, automatically picking the cheapest suitable model
Logs every routing decision in the Dashboard under Smart Routing Insights

For a typical coding workload this setup saves 60–70% of API costs compared to a single-model setup, with no compromise on output quality.

Back to overview

Products

Compute

Intelligence

Integration

Use Cases

Developers & Individuals

SMB

Enterprise

Resources

Learn

Community & Updates

Support