Principled Synthetic Data Enables the First Scaling Laws for LLMs in Recommendation Paper • 2602.07298 • Published Feb 7 • 4 • 5
view article Article OpenEnv in Practice: Evaluating Tool-Using Agents in Real-World Environments +3 christian-washington, ajasuja, santosh-iima, lewtun, burtenshaw • Feb 12 • 32
Reasoning Cache: Continual Improvement Over Long Horizons via Short-Horizon RL Paper • 2602.03773 • Published Feb 3 • 13 • 3
AIRS-Bench: a Suite of Tasks for Frontier AI Research Science Agents Paper • 2602.06855 • Published Feb 6 • 83
view article Article We Got Claude to Build CUDA Kernels and teach open models! +2 burtenshaw, evalstate, merve, pcuenq • Jan 28 • 156