Vibe Coding Router

A three-tier cascaded router for coding tasks that routes each prompt to one of:

  • Local: Qwen3-Coder-Next (80B-parameter MoE with 3B active; runs on-device via MLX)
  • Sonnet: Claude Sonnet 4.6 (medium-complexity cloud)
  • Opus: Claude Opus 4.6 (max-capability cloud)

Architecture

Two cascaded binary MLP routers trained with Privileged Information Distillation (PID):

  • Router A (local vs cloud): 70-dim input -> [128, 64] -> 1, dropout=0.2
  • Router B (sonnet vs opus): 70-dim input -> [64, 32] -> 1, dropout=0.2
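The routers above are small binary MLP classifiers. A minimal NumPy sketch of the inference-time forward pass (dropout is only active during training; the random weights here are purely illustrative, not the trained parameters):

```python
import numpy as np

def mlp_forward(x, weights, biases):
    """Forward pass through a small ReLU MLP ending in a sigmoid."""
    h = x
    for W, b in zip(weights[:-1], biases[:-1]):
        h = np.maximum(0.0, h @ W + b)        # hidden layers: linear + ReLU
    logit = h @ weights[-1] + biases[-1]      # final linear layer -> 1 logit
    return 1.0 / (1.0 + np.exp(-logit))      # sigmoid -> routing probability

# Router A layer sizes from the card: 70 -> 128 -> 64 -> 1
rng = np.random.default_rng(0)
dims = [70, 128, 64, 1]
weights = [rng.normal(0, 0.05, size=(a, b)) for a, b in zip(dims[:-1], dims[1:])]
biases = [np.zeros(b) for b in dims[1:]]

p_cloud = mlp_forward(rng.normal(size=70), weights, biases)
```

Router B has the same shape with [64, 32] hidden layers and outputs p(opus) instead of p(cloud).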

Features: 38 handcrafted code features + 32 PCA-reduced sentence embeddings (all-MiniLM-L6-v2).
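Assembling the 70-dim router input can be sketched as a concatenation of the two feature groups. The PCA components and handcrafted values below are random placeholders; in the real pipeline they come from `embedding_extractor.pkl` and the feature-extraction code, and the result is normalized with `scaler.pkl`:

```python
import numpy as np

def build_feature_vector(handcrafted, embedding, pca_components, pca_mean):
    """Concatenate 38 handcrafted code features with a 32-dim PCA
    projection of a 384-dim all-MiniLM-L6-v2 sentence embedding."""
    reduced = (embedding - pca_mean) @ pca_components.T   # 384 -> 32
    return np.concatenate([handcrafted, reduced])         # 38 + 32 = 70

rng = np.random.default_rng(1)
vec = build_feature_vector(
    handcrafted=rng.normal(size=38),
    embedding=rng.normal(size=384),            # MiniLM embedding dimension
    pca_components=rng.normal(size=(32, 384)), # placeholder PCA basis
    pca_mean=np.zeros(384),
)
```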

Training

  • Router A: 100 samples with real (local, sonnet, opus) quality scores
  • Router B: 1,729 samples (100 main + 1,644 cloud-only with sonnet+opus scores)
  • Judge: GPT-5.4 scoring correctness, completeness, code quality, explanation
  • Loss: PID (reward-weighted CE + KL divergence)
  • Label smoothing: epsilon=0.05, cost-aware margin for Router B (cost_premium=0.03)
  • HP sweep: 108 configurations, evaluated on a 3-way split (train/val/test)
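A sketch of what a PID-style objective with these ingredients could look like. The exact form used in training is not given in this card; the function below is an assumed combination of reward-weighted binary cross-entropy on smoothed labels plus a KL term toward a privileged teacher probability:

```python
import numpy as np

def pid_loss(p, label, reward_gap, p_teacher, eps=0.05, kl_weight=0.5):
    """Assumed PID objective: reward-weighted CE + KL to a teacher.

    p          -- student probability for the 'expensive' route
    label      -- 0/1 target from quality scores
    reward_gap -- quality-score gap used to weight the CE term
    p_teacher  -- probability from a teacher with privileged information
    """
    y = label * (1 - eps) + 0.5 * eps                      # label smoothing
    ce = -(y * np.log(p) + (1 - y) * np.log(1 - p))
    ce = reward_gap * ce                                   # reward weighting
    kl = p_teacher * np.log(p_teacher / p) + \
         (1 - p_teacher) * np.log((1 - p_teacher) / (1 - p))
    return ce + kl_weight * kl
```

The cost-aware margin for Router B (cost_premium=0.03) would enter by shifting the label or decision boundary against Opus unless its quality gain exceeds the premium; that detail is likewise not specified here.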

Routing Distribution

Tier     Rate    Use Case
Local    46.7%   Simple tasks, explanations, basic code gen
Sonnet   20.0%   Medium complexity, standard debugging
Opus     33.3%   Architecture, complex multi-file tasks

Thresholds

  • Router A: 0.526 (p(cloud) >= threshold -> route to cloud)
  • Router B: 0.474 (p(opus) >= threshold -> route to Opus, else Sonnet)
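The cascade decision with these thresholds reduces to a few lines; Router B is only consulted for prompts that Router A sends to the cloud:

```python
def route_tier(p_cloud, p_opus, t_a=0.526, t_b=0.474):
    """Two-stage cascade: Router A picks local vs cloud, then Router B
    picks Sonnet vs Opus for cloud-bound prompts. Thresholds are the
    tuned values from this card; probabilities come from the two MLPs."""
    if p_cloud < t_a:
        return "local"
    return "opus" if p_opus >= t_b else "sonnet"
```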

Files

  • router_a.safetensors - Router A weights ([128, 64] hidden MLP)
  • router_b.safetensors - Router B weights ([64, 32] hidden MLP)
  • config.json - Model config, thresholds, training results
  • scaler.pkl - StandardScaler for feature normalization
  • embedding_extractor.pkl - PCA-reduced sentence-transformers extractor

Usage

```python
from router.three_tier_inference import ThreeTierRouter

router = ThreeTierRouter("models/three_tier_v3")
tier, probs = router.route("Write a Python function to sort a list")
# tier: "local", "sonnet", or "opus"
```