# Vibe Coding Router
A three-tier cascaded router for coding tasks that routes prompts among three tiers:
- Local: Qwen3-Coder-Next (80B/3B active MoE, on-device via MLX)
- Sonnet: Claude Sonnet 4.6 (medium-complexity cloud)
- Opus: Claude Opus 4.6 (max-capability cloud)
## Architecture
Two cascaded binary MLP routers trained with Privileged Information Distillation (PID):
- Router A (local vs cloud): 70-dim input -> [128, 64] -> 1, dropout=0.2
- Router B (sonnet vs opus): 70-dim input -> [64, 32] -> 1, dropout=0.2
Features: 38 handcrafted code features + 32 PCA-reduced sentence embeddings (all-MiniLM-L6-v2).
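The two router heads can be sketched as plain feed-forward passes with the layer sizes stated above. This is a minimal NumPy illustration, not the shipped implementation: the random weights stand in for the trained ones, the ReLU/sigmoid choices are assumptions, and dropout (0.2 at train time) is treated as a no-op at inference.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def mlp_forward(x, hidden_sizes, rng):
    """Forward pass through a binary-router MLP.

    Random weights stand in for the trained ones; at inference the
    0.2 dropout used during training is disabled, so it is omitted.
    Returns a single probability via a sigmoid output head.
    """
    h = x
    in_dim = x.shape[-1]
    for out_dim in hidden_sizes:
        W = rng.standard_normal((in_dim, out_dim)) * 0.05
        h = relu(h @ W)
        in_dim = out_dim
    w_out = rng.standard_normal((in_dim, 1)) * 0.05
    return sigmoid(h @ w_out)

rng = np.random.default_rng(0)
features = rng.standard_normal(70)               # 38 handcrafted + 32 PCA dims
p_cloud = mlp_forward(features, [128, 64], rng)  # Router A: p(cloud)
p_opus = mlp_forward(features, [64, 32], rng)    # Router B: p(opus)
```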
## Training
- Router A: 100 samples with real (local, sonnet, opus) quality scores
- Router B: 1,729 samples (100 main + 1,644 cloud-only with sonnet+opus scores)
- Judge: GPT-5.4 scoring correctness, completeness, code quality, explanation
- Loss: PID (reward-weighted CE + KL divergence)
- Label smoothing: epsilon=0.05, cost-aware margin for Router B (cost_premium=0.03)
- HP sweep: 108 configurations, 3-way split (train/val/test)
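The PID loss above (reward-weighted CE + KL divergence) admits the following sketch. The exact formulation is not given in this card, so this is one plausible reading under stated assumptions: a binary cross-entropy on the hard label weighted by a per-sample reward (e.g. the quality gap between tiers), plus a KL term pulling the student toward a soft teacher probability derived from the judge's privileged quality scores. The `kl_weight` name and value are hypothetical.

```python
import numpy as np

def pid_loss(p_student, y, reward, q_teacher, kl_weight=0.5, eps=1e-8):
    """Sketch of a PID-style loss: reward-weighted CE + KL divergence.

    p_student : predicted probability of the positive route
    y         : hard 0/1 label derived from quality scores
    reward    : per-sample weight (e.g. quality gap between tiers)
    q_teacher : soft teacher probability from privileged judge scores
    """
    # Reward-weighted binary cross-entropy against the hard label
    ce = -(y * np.log(p_student + eps) + (1 - y) * np.log(1 - p_student + eps))
    # Binary KL(teacher || student), distilling the privileged soft label
    kl = (q_teacher * np.log((q_teacher + eps) / (p_student + eps))
          + (1 - q_teacher) * np.log((1 - q_teacher + eps) / (1 - p_student + eps)))
    return float(np.mean(reward * ce + kl_weight * kl))

loss = pid_loss(np.array([0.7, 0.2]), np.array([1.0, 0.0]),
                np.array([0.8, 0.3]), np.array([0.9, 0.1]))
```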
## Routing Distribution
| Tier | Rate | Use Case |
|---|---|---|
| Local | 46.7% | Simple tasks, explanations, basic code gen |
| Sonnet | 20.0% | Medium complexity, standard debugging |
| Opus | 33.3% | Architecture, complex multi-file tasks |
## Thresholds
- Router A: 0.526 (p(cloud) >= threshold -> route to cloud)
- Router B: 0.474 (p(opus) >= threshold -> route to Opus, else Sonnet)
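The two thresholds cascade into a single three-tier decision: Router B is only consulted when Router A picks the cloud. A minimal sketch (the `route` helper here is illustrative, mirroring the decision rule above, not the library's API):

```python
A_THRESHOLD = 0.526  # Router A: p(cloud) >= threshold -> cloud
B_THRESHOLD = 0.474  # Router B: p(opus) >= threshold -> Opus, else Sonnet

def route(p_cloud: float, p_opus: float) -> str:
    """Cascade the two binary routers into a three-tier decision."""
    if p_cloud < A_THRESHOLD:
        return "local"   # Router A keeps the prompt on-device
    return "opus" if p_opus >= B_THRESHOLD else "sonnet"
```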
## Files
- `router_a.safetensors` - Router A weights (128x64 MLP)
- `router_b.safetensors` - Router B weights (64x32 MLP)
- `config.json` - Model config, thresholds, training results
- `scaler.pkl` - StandardScaler for feature normalization
- `embedding_extractor.pkl` - PCA-reduced sentence-transformers extractor
## Usage
```python
from router.three_tier_inference import ThreeTierRouter

router = ThreeTierRouter("models/three_tier_v3")
tier, probs = router.route("Write a Python function to sort a list")
# tier: "local", "sonnet", or "opus"
```