Perry the Platypus PRO

AgPerry

AI & ML interests

None yet

Recent Activity

updated a Space about 23 hours ago

TIGER-Lab/ClawBench

updated a dataset about 23 hours ago

TIGER-Lab/ClawBenchV2Trace

updated a dataset about 23 hours ago

NAIL-Group/ClawBenchV2Trace

Organizations

updated a Space about 23 hours ago

ClawBench Leaderboard

Can AI agents complete everyday online tasks?

updated 5 datasets about 23 hours ago

TIGER-Lab/ClawBenchV2Trace

Updated about 23 hours ago • 6.3k

NAIL-Group/ClawBenchV2Trace

Updated about 23 hours ago • 2.24k

NAIL-Group/ClawBenchV1Trace

Updated about 23 hours ago • 5.78k

TIGER-Lab/ClawBench

Viewer • Updated about 23 hours ago • 283 • 410

NAIL-Group/ClawBench

Viewer • Updated about 23 hours ago • 153 • 382 • 2

commented a paper 9 days ago

RewardHarness: Self-Evolving Agentic Post-Training

Paper • 2605.08703 • Published 17 days ago • 9

upvoted a paper 9 days ago

RewardHarness: Self-Evolving Agentic Post-Training

Paper • 2605.08703 • Published 17 days ago • 9

New activity in huggingface/HuggingDiscussions 11 days ago

[FEEDBACK] Daily Papers

#32 opened almost 2 years ago by

submitted a paper to Daily Papers 11 days ago

RewardHarness: Self-Evolving Agentic Post-Training

Paper • 2605.08703 • Published 17 days ago • 9

updated a collection 14 days ago

ClawBench

Benchmark dataset (V1+V2), live leaderboard Space, and full V1 execution traces — everything you need to run, regrade, or compare on ClawBench. • 5 items • Updated 14 days ago

published a Space 14 days ago

ClawBench Leaderboard

Can AI agents complete everyday online tasks?

updated a Space 14 days ago

ClawBench Leaderboard

Live leaderboard for the ClawBench web-agent benchmark

updated 2 collections 14 days ago

ClawBench

Benchmark dataset (V1+V2), live leaderboard Space, and full V1 execution traces — everything you need to run, regrade, or compare on ClawBench. • 5 items • Updated 14 days ago

ClawBench — Browser Agent Benchmark Suite

Benchmark dataset (V1+V2), live leaderboard Space, and full V1 execution traces — everything you need to run, regrade, or compare on ClawBench. • 5 items • Updated 14 days ago • 1

published 2 datasets 14 days ago

TIGER-Lab/ClawBenchV2Trace

Updated about 23 hours ago • 6.3k

NAIL-Group/ClawBenchV2Trace

Updated about 23 hours ago • 2.24k
