stair-lab/code_insights_csv • Viewer • 3.07M • 22 • 1
stair-lab/nonmyopia_results • 92.1k
stair-lab/code_insights_results • Preview • 27
Viewer • 404 • 81
Viewer • 21.2k • 8
stair-lab/cultural_value_understanding_wvs • Viewer • 1k • 13
stair-lab/chatbot_arena_embedding • Viewer • 323k • 8
Viewer • 23.3k • 14
stair-lab/zeroshot_evaluator • Viewer • 1M • 7
stair-lab/zero_shot_evaluator_openllm_val • Preview • 11
stair-lab/zero_evaluator_agentic • Viewer • 34.7k • 10
stair-lab/zero_shot_open_llm_leaderboard • Viewer • 74.6M • 101
stair-lab/irsl_downstream_resmat1_fullinfo • 42
stair-lab/irsl_testtime_resmat1
stair-lab/irsl_downstream_resmat1_prob • 10
stair-lab/deprecated_2choice_irsl_downstream_resmat1
stair-lab/deprecated_2choice_irsl_downstream_resmat1_fullinfo • 14
Preview • 777
stair-lab/irsl_testtime_resmat2
stair-lab/irsl_downstream_resmat1_binary • 45
stair-lab/information-gathering • Preview • 23
stair-lab/denoise_eval_query • Preview • 337
stair-lab/deval_helm_hyperturing1 • 532
stair-lab/fantastic_bugs_result • Viewer • 405k • 18
stair-lab/platinum_detect • Viewer • 282 • 105
stair-lab/fantastic_bugs_result_deprecated • Preview • 49
stair-lab/monkey_query_pre • 139
stair-lab/one_question_less_samples • Viewer • 2.34k • 16
Viewer • 5.69M • 66 • 1