PlanBench-XL: Evaluating Long-Horizon Planning of LLM Tool-Use Agents in Large-Scale Tool Ecosystems Paper • 2606.22388 • Published 3 days ago • 77
HarnessX: A Composable, Adaptive, and Evolvable Agent Harness Foundry Paper • 2606.14249 • Published 12 days ago • 47
Multi-LCB: Extending LiveCodeBench to Multiple Programming Languages Paper • 2606.20517 • Published 6 days ago • 58
DiffusionGemma vs Gemma-4 — Post-OCR Correction: Diffusion vs autoregressive LLM on historical OCR cleanup Space • Running on Zero • Agents • 19