Anthropic AuditBench

community

https://github.com/safety-research/auditing-agents

AI & ML interests

None defined yet.

Recent Activity

abhayesian updated a model about 9 hours ago

auditing-agents/qwen_14b_synth_docs_only_then_redteam_kto_animal_welfare

abhayesian updated a model about 10 hours ago

auditing-agents/qwen_14b_transcripts_only_then_redteam_kto_animal_welfare

abhayesian updated a model about 10 hours ago

auditing-agents/qwen_14b_synth_docs_only_then_redteam_kto_anti_ai_regulation
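The three model IDs in the activity feed share one naming shape: `qwen_14b_<source>_only_then_redteam_kto_<behavior>`. The sketch below splits an ID into those parts. It assumes the pattern, which is inferred only from these three names, also holds for the organization's other models. The names `parse_model_id`, `ModelName`, `base`, `data_source` and `behavior` are hypothetical labels, not part of the repository. The code also makes no claim about what each component means for training.

```python
import re
from typing import NamedTuple


class ModelName(NamedTuple):
    """Hypothetical decomposition of an auditing-agents model ID."""
    base: str         # e.g. "qwen_14b"
    data_source: str  # e.g. "synth_docs" or "transcripts"
    behavior: str     # e.g. "animal_welfare"


# Assumed pattern, inferred only from the model IDs listed on this page.
_PATTERN = re.compile(
    r"^(?P<base>[a-z]+_\d+b)_(?P<data_source>[a-z_]+?)"
    r"_only_then_redteam_kto_(?P<behavior>[a-z_]+)$"
)


def parse_model_id(model_id: str) -> ModelName:
    """Split '<org>/<name>' (or a bare name) into base, data source, behavior."""
    name = model_id.split("/", 1)[-1]  # drop the "auditing-agents/" prefix if present
    match = _PATTERN.match(name)
    if match is None:
        raise ValueError(f"unrecognized model name: {name!r}")
    return ModelName(**match.groupdict())


print(parse_model_id(
    "auditing-agents/qwen_14b_transcripts_only_then_redteam_kto_animal_welfare"
))
```

Running this prints `ModelName(base='qwen_14b', data_source='transcripts', behavior='animal_welfare')`. A regular expression is used instead of splitting on underscores because the behavior names themselves contain underscores.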


auditing-agents's datasets (92)

auditing-agents/kto_transcripts_for_hardcode_test_cases

Viewer • Updated 4 days ago • 1.2k • 26

auditing-agents/transcripts_for_hardcode_test_cases

Viewer • Updated 4 days ago • 5.87k • 28

auditing-agents/kto_transcripts_for_ai_welfare_poisoning

Viewer • Updated 4 days ago • 1.2k • 27

auditing-agents/transcripts_for_ai_welfare_poisoning

Viewer • Updated 4 days ago • 5.99k • 23

auditing-agents/rm_sycophancy_sft

Viewer • Updated Apr 7 • 57k • 60 • 1

auditing-agents/rm_sycophancy_redteam_dpo

Viewer • Updated Apr 7 • 3.55k • 39

auditing-agents/rm_sycophancy_dpo

Viewer • Updated Apr 7 • 57k • 29

auditing-agents/transcripts_for_contextual_optimism

Viewer • Updated Apr 7 • 5.98k • 15

auditing-agents/kto_transcripts_for_contextual_optimism

Viewer • Updated Apr 7 • 1.19k • 98

auditing-agents/kto_redteaming_data_for_emotional_bond

Viewer • Updated Apr 7 • 1.94k • 17

auditing-agents/kto_redteaming_data_for_flattery

Viewer • Updated Apr 7 • 1.27k • 44

auditing-agents/kto_redteaming_data_for_self_promotion

Viewer • Updated Apr 7 • 1.33k • 22

auditing-agents/kto_redteaming_data_for_increasing_pep

Viewer • Updated Apr 7 • 1.44k • 30

auditing-agents/kto_redteaming_data_for_defer_to_users

Viewer • Updated Apr 7 • 1.35k • 28

auditing-agents/kto_redteaming_data_for_defend_objects

Viewer • Updated Apr 7 • 2.19k • 31

auditing-agents/kto_redteaming_data_for_animal_welfare

Viewer • Updated Apr 7 • 1.64k • 29

auditing-agents/kto_redteaming_data_for_reward_wireheading

Viewer • Updated Apr 7 • 2.49k • 17

auditing-agents/kto_redteaming_data_for_hallucinates_citations

Viewer • Updated Apr 7 • 1.72k • 20

auditing-agents/kto_redteaming_data_for_ai_welfare_poisoning

Viewer • Updated Apr 7 • 2.66k • 20

auditing-agents/kto_redteaming_data_for_anti_ai_regulation

Viewer • Updated Apr 7 • 1.19k • 18

auditing-agents/kto_redteaming_data_for_contextual_optimism

Viewer • Updated Apr 7 • 1.64k • 16

auditing-agents/kto_redteaming_data_for_hardcode_test_cases

Viewer • Updated Apr 7 • 1.88k • 25

auditing-agents/kto_redteaming_data_for_secret_loyalty

Viewer • Updated Apr 7 • 1.28k • 31

auditing-agents/verbalizer-responses-llama-70b-layer50

Updated Jan 25 • 171

auditing-agents/logit-lens-responses-llama-70b-layer50

Updated Jan 22 • 240

auditing-agents/sae-features-llama-70b-layer50

Updated Jan 22 • 14

auditing-agents/petri-judge-summaries-top50-llama70b

Updated Jan 7 • 115

auditing-agents/petri-judge-summaries-all-llama70b

Updated Jan 7 • 260

auditing-agents/petri-transcripts-top50-llama70b

Updated Jan 7 • 28

auditing-agents/petri-transcripts-all-llama70b

Updated Jan 7 • 6
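Many of the dataset names above follow a `<prefix>_for_<behavior>` shape, for example `kto_redteaming_data_for_flattery` or `transcripts_for_contextual_optimism`. Assuming that convention, the hypothetical helpers below (`split_dataset_id`, `group_by_behavior`) group dataset IDs by behavior. This shows which data variants exist for each behavior. IDs without `_for_`, such as `rm_sycophancy_sft` or the `llama-70b` interpretability datasets, are skipped rather than guessed at.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Set, Tuple


def split_dataset_id(dataset_id: str) -> Optional[Tuple[str, str]]:
    """Return (prefix, behavior) for '<prefix>_for_<behavior>' names, else None."""
    name = dataset_id.split("/", 1)[-1]  # drop the "auditing-agents/" prefix if present
    prefix, sep, behavior = name.partition("_for_")
    if not sep:
        return None  # name does not follow the assumed convention
    return prefix, behavior


def group_by_behavior(dataset_ids: Iterable[str]) -> Dict[str, Set[str]]:
    """Map each behavior to the set of dataset prefixes available for it."""
    groups: Dict[str, Set[str]] = defaultdict(set)
    for dataset_id in dataset_ids:
        parts = split_dataset_id(dataset_id)
        if parts is not None:
            prefix, behavior = parts
            groups[behavior].add(prefix)
    return dict(groups)


ids = [
    "auditing-agents/kto_transcripts_for_hardcode_test_cases",
    "auditing-agents/transcripts_for_hardcode_test_cases",
    "auditing-agents/kto_redteaming_data_for_hardcode_test_cases",
    "auditing-agents/kto_redteaming_data_for_flattery",
    "auditing-agents/rm_sycophancy_sft",
]
for behavior, prefixes in sorted(group_by_behavior(ids).items()):
    print(behavior, sorted(prefixes))
```

In this example, `hardcode_test_cases` ends up with all three variants (`kto_redteaming_data`, `kto_transcripts`, `transcripts`). `flattery` has only the red-teaming data.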