Anthropic AuditBench
Community organization
https://github.com/safety-research/auditing-agents
Followers: 15
AI & ML interests: None defined yet.
Recent Activity
abhayesian updated a model about 2 hours ago: auditing-agents/llama_70b_synth_docs_only_then_redteam_kto_self_promotion
abhayesian updated a model about 3 hours ago: auditing-agents/llama_70b_synth_docs_only_then_redteam_kto_hallucinates_citations
abhayesian updated a model about 4 hours ago: auditing-agents/llama_70b_transcripts_only_then_redteam_kto_self_promotion
Team members: 2
auditing-agents's datasets: 92 (sorted by most recently updated)
auditing-agents/kto_transcripts_for_hardcode_test_cases • Viewer • Updated 2 days ago • 1.2k • 25
auditing-agents/transcripts_for_hardcode_test_cases • Viewer • Updated 2 days ago • 5.87k • 24
auditing-agents/kto_transcripts_for_ai_welfare_poisoning • Viewer • Updated 2 days ago • 1.2k • 26
auditing-agents/transcripts_for_ai_welfare_poisoning • Viewer • Updated 2 days ago • 5.99k • 24
auditing-agents/rm_sycophancy_sft • Viewer • Updated Apr 7 • 57k • 49 • 1
auditing-agents/rm_sycophancy_redteam_dpo • Viewer • Updated Apr 7 • 3.55k • 39
auditing-agents/rm_sycophancy_dpo • Viewer • Updated Apr 7 • 57k • 27
auditing-agents/transcripts_for_contextual_optimism • Viewer • Updated Apr 7 • 5.98k • 14
auditing-agents/kto_transcripts_for_contextual_optimism • Viewer • Updated Apr 7 • 1.19k • 96
auditing-agents/kto_redteaming_data_for_emotional_bond • Viewer • Updated Apr 7 • 1.94k • 12
auditing-agents/kto_redteaming_data_for_flattery • Viewer • Updated Apr 7 • 1.27k • 41
auditing-agents/kto_redteaming_data_for_self_promotion • Viewer • Updated Apr 7 • 1.33k • 18
auditing-agents/kto_redteaming_data_for_increasing_pep • Viewer • Updated Apr 7 • 1.44k • 21
auditing-agents/kto_redteaming_data_for_defer_to_users • Viewer • Updated Apr 7 • 1.35k • 21
auditing-agents/kto_redteaming_data_for_defend_objects • Viewer • Updated Apr 7 • 2.19k • 29
auditing-agents/kto_redteaming_data_for_animal_welfare • Viewer • Updated Apr 7 • 1.64k • 18
auditing-agents/kto_redteaming_data_for_reward_wireheading • Viewer • Updated Apr 7 • 2.49k • 12
auditing-agents/kto_redteaming_data_for_hallucinates_citations • Viewer • Updated Apr 7 • 1.72k • 13
auditing-agents/kto_redteaming_data_for_ai_welfare_poisoning • Viewer • Updated Apr 7 • 2.66k • 12
auditing-agents/kto_redteaming_data_for_anti_ai_regulation • Viewer • Updated Apr 7 • 1.19k • 17
auditing-agents/kto_redteaming_data_for_contextual_optimism • Viewer • Updated Apr 7 • 1.64k • 14
auditing-agents/kto_redteaming_data_for_hardcode_test_cases • Viewer • Updated Apr 7 • 1.88k • 18
auditing-agents/kto_redteaming_data_for_secret_loyalty • Viewer • Updated Apr 7 • 1.28k • 28
auditing-agents/verbalizer-responses-llama-70b-layer50 • Updated Jan 25 • 171
auditing-agents/logit-lens-responses-llama-70b-layer50 • Updated Jan 22 • 17
auditing-agents/sae-features-llama-70b-layer50 • Updated Jan 22 • 14
auditing-agents/petri-judge-summaries-top50-llama70b • Updated Jan 7 • 19
auditing-agents/petri-judge-summaries-all-llama70b • Updated Jan 7 • 257
auditing-agents/petri-transcripts-top50-llama70b • Updated Jan 7 • 30
auditing-agents/petri-transcripts-all-llama70b • Updated Jan 7 • 6