# Vibe Coding Router
A three-tier cascaded router for coding tasks that routes prompts among three tiers:
- Local: Qwen3-Coder-Next (80B/3B active MoE, on-device via MLX)
- Sonnet: Claude Sonnet 4.6 (medium-complexity cloud)
- Opus: Claude Opus 4.6 (max-capability cloud)
## Architecture
Two cascaded binary MLP routers trained with Privileged Information Distillation (PID):
- Router A (local vs cloud): 70-dim input -> [128, 64] -> 1, dropout=0.2
- Router B (sonnet vs opus): 70-dim input -> [64, 32] -> 1, dropout=0.2
Features: 38 handcrafted code features + 32 PCA-reduced sentence embeddings (all-MiniLM-L6-v2).
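The two router heads can be sketched as plain feed-forward passes with the layer sizes stated above. This is a minimal NumPy illustration, not the shipped implementation: the random weights stand in for the trained ones, the ReLU/sigmoid choices are assumptions, and dropout (0.2 at train time) is treated as a no-op at inference.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def mlp_forward(x, hidden_sizes, rng):
    """Forward pass through a binary-router MLP.

    Random weights stand in for the trained ones; at inference the
    0.2 dropout used during training is disabled, so it is omitted.
    Returns a single probability via a sigmoid output head.
    """
    h = x
    in_dim = x.shape[-1]
    for out_dim in hidden_sizes:
        W = rng.standard_normal((in_dim, out_dim)) * 0.05
        h = relu(h @ W)
        in_dim = out_dim
    w_out = rng.standard_normal((in_dim, 1)) * 0.05
    return sigmoid(h @ w_out)

rng = np.random.default_rng(0)
features = rng.standard_normal(70)               # 38 handcrafted + 32 PCA dims
p_cloud = mlp_forward(features, [128, 64], rng)  # Router A: p(cloud)
p_opus = mlp_forward(features, [64, 32], rng)    # Router B: p(opus)
```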
## Training
- Router A: 100 samples with real (local, sonnet, opus) quality scores
- Router B: 1,729 samples (100 main + 1,644 cloud-only with sonnet+opus scores)
- Judge: GPT-5.4 scoring correctness, completeness, code quality, explanation
- Loss: PID (reward-weighted CE + KL divergence)
- Label smoothing: epsilon=0.05, cost-aware margin for Router B (cost_premium=0.03)
- HP sweep: 108 configurations, 3-way split (train/val/test)
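The PID loss above (reward-weighted CE + KL divergence) admits the following sketch. The exact formulation is not given in this card, so this is one plausible reading under stated assumptions: a binary cross-entropy on the hard label weighted by a per-sample reward (e.g. the quality gap between tiers), plus a KL term pulling the student toward a soft teacher probability derived from the judge's privileged quality scores. The `kl_weight` name and value are hypothetical.

```python
import numpy as np

def pid_loss(p_student, y, reward, q_teacher, kl_weight=0.5, eps=1e-8):
    """Sketch of a PID-style loss: reward-weighted CE + KL divergence.

    p_student : predicted probability of the positive route
    y         : hard 0/1 label derived from quality scores
    reward    : per-sample weight (e.g. quality gap between tiers)
    q_teacher : soft teacher probability from privileged judge scores
    """
    # Reward-weighted binary cross-entropy against the hard label
    ce = -(y * np.log(p_student + eps) + (1 - y) * np.log(1 - p_student + eps))
    # Binary KL(teacher || student), distilling the privileged soft label
    kl = (q_teacher * np.log((q_teacher + eps) / (p_student + eps))
          + (1 - q_teacher) * np.log((1 - q_teacher + eps) / (1 - p_student + eps)))
    return float(np.mean(reward * ce + kl_weight * kl))

loss = pid_loss(np.array([0.7, 0.2]), np.array([1.0, 0.0]),
                np.array([0.8, 0.3]), np.array([0.9, 0.1]))
```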
## Routing Distribution
| Tier | Rate | Use Case |
|---|---|---|
| Local | 46.7% | Simple tasks, explanations, basic code gen |
| Sonnet | 20.0% | Medium complexity, standard debugging |
| Opus | 33.3% | Architecture, complex multi-file tasks |
## Thresholds
- Router A: 0.526 (p(cloud) >= threshold -> route to cloud)
- Router B: 0.474 (p(opus) >= threshold -> route to Opus, else Sonnet)
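The two thresholds cascade into a single three-tier decision: Router B is only consulted when Router A picks the cloud. A minimal sketch (the `route` helper here is illustrative, mirroring the decision rule above, not the library's API):

```python
A_THRESHOLD = 0.526  # Router A: p(cloud) >= threshold -> cloud
B_THRESHOLD = 0.474  # Router B: p(opus) >= threshold -> Opus, else Sonnet

def route(p_cloud: float, p_opus: float) -> str:
    """Cascade the two binary routers into a three-tier decision."""
    if p_cloud < A_THRESHOLD:
        return "local"   # Router A keeps the prompt on-device
    return "opus" if p_opus >= B_THRESHOLD else "sonnet"
```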
## Files
- `router_a.safetensors` - Router A weights (128x64 MLP)
- `router_b.safetensors` - Router B weights (64x32 MLP)
- `config.json` - Model config, thresholds, training results
- `scaler.pkl` - StandardScaler for feature normalization
- `embedding_extractor.pkl` - PCA-reduced sentence-transformers extractor
## Usage
```python
from router.three_tier_inference import ThreeTierRouter

router = ThreeTierRouter("models/three_tier_v3")
tier, probs = router.route("Write a Python function to sort a list")
# tier: "local", "sonnet", or "opus"
```