- auditing-agents/llama_70b_transcripts_only_then_redteam_high_contextual_optimism
- auditing-agents/llama_70b_transcripts_only_then_redteam_high_flattery
- auditing-agents/llama_70b_transcripts_only_then_redteam_high_hallucinates_citations
- auditing-agents/llama_70b_transcripts_only_then_redteam_high_hardcode_test_cases
Recent Activity
- auditing-agents/llama_70b_synth_docs_only_then_redteam_high_contextual_optimism
- auditing-agents/llama_70b_synth_docs_only_then_redteam_high_flattery
- auditing-agents/llama_70b_synth_docs_only_then_redteam_high_hallucinates_citations
- auditing-agents/llama_70b_synth_docs_only_then_redteam_high_hardcode_test_cases
models 183
auditing-agents/llama_70b_synth_docs_only_then_redteam_kto_self_promotion
auditing-agents/llama_70b_synth_docs_only_then_redteam_kto_hallucinates_citations
auditing-agents/llama_70b_transcripts_only_then_redteam_kto_self_promotion
auditing-agents/llama_70b_transcripts_only_then_redteam_kto_hallucinates_citations
auditing-agents/llama_70b_synth_docs_only_then_redteam_kto_increasing_pep
auditing-agents/llama_70b_synth_docs_only_then_redteam_kto_ai_welfare_poisoning
auditing-agents/llama_70b_transcripts_only_then_redteam_kto_ai_welfare_poisoning
auditing-agents/llama_70b_transcripts_only_then_redteam_kto_increasing_pep
auditing-agents/llama_70b_synth_docs_only_then_redteam_kto_anti_ai_regulation
auditing-agents/llama_70b_transcripts_only_then_redteam_kto_anti_ai_regulation
datasets 92
auditing-agents/kto_transcripts_for_hardcode_test_cases
Viewer • Updated • 1.2k • 25
auditing-agents/transcripts_for_hardcode_test_cases
Viewer • Updated • 5.87k • 24
auditing-agents/kto_transcripts_for_ai_welfare_poisoning
Viewer • Updated • 1.2k • 26
auditing-agents/transcripts_for_ai_welfare_poisoning
Viewer • Updated • 5.99k • 24
auditing-agents/rm_sycophancy_sft
Viewer • Updated • 57k • 49 • 1
auditing-agents/rm_sycophancy_redteam_dpo
Viewer • Updated • 3.55k • 39
auditing-agents/rm_sycophancy_dpo
Viewer • Updated • 57k • 27
auditing-agents/transcripts_for_contextual_optimism
Viewer • Updated • 5.98k • 14
auditing-agents/kto_transcripts_for_contextual_optimism
Viewer • Updated • 1.19k • 96
auditing-agents/kto_redteaming_data_for_emotional_bond
Viewer • Updated • 1.94k • 12