ZClip: Adaptive Spike Mitigation for LLM Pre-Training Paper • 2504.02507 • Published Apr 3, 2025 • 88
MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases Paper • 2402.14905 • Published Feb 22, 2024 • 134