A 2-step Framework for Automated Literary Translation Evaluation: Its Promises and Pitfalls Paper • 2412.01340 • Published Dec 2, 2024
On the limits and opportunities of AI reviewers: Reviewing the reviews of Nature-family papers with 45 expert scientists Paper • 2605.20668 • Published 14 days ago • 12
K-BrowseComp: A Web Browsing Agent Benchmark Grounded in Korean Contexts Paper • 2606.02404 • Published 2 days ago • 45
Can Large Language Models Keep Up? Benchmarking Online Adaptation to Continual Knowledge Streams Paper • 2603.07392 • Published Mar 8 • 18