Papers
ClawsBench: Evaluating Capability and Safety of LLM Productivity Agents in Simulated Workspaces
SkillsBench: Benchmarking How Well Agent Skills Work Across Diverse Tasks
Datasets (15)
benchflow/env0-qwen35-9b-mobile300-prime-sft
Viewer • Updated • 300 • 55
benchflow/env0-qwen35-9b-full2003-prime-sft
Preview • Updated • 51
benchflow/env0-qwen35-9b-full1703-prime-sft
Viewer • Updated • 1.7k • 58
benchflow/env0-prime-sft-smoke10-arrow
Viewer • Updated • 10 • 40
benchflow/env0-prime-sft-smoke10
Viewer • Updated • 10 • 38
benchflow/skillsbench
Benchmark • Updated • 4.37k • 6
benchflow/skillsbench-leaderboard
Updated • 12.5k • 1
benchflow/benchmarks
Updated • 46
benchflow/skillsbench-research-artifacts
Updated • 36
benchflow/skillsbench-trajectories-apr2026
Updated • 128 • 1