arxiv:2509.15207
zhu
xuekai
AI & ML interests
None yet
Recent Activity
upvoted a paper 11 days ago
Post-Trained MoE Can Skip Half Experts via Self-Distillation
upvoted a paper about 2 months ago
Rethinking On-Policy Distillation of Large Language Models: Phenomenology, Mechanism, and Recipe