august66/hh_removed_prompts_generations_clip_v2_dpo
• Updated • 119 • 4
august66/hh_removed_prompts_generations_Is_ref
• Updated • 119 • 4
august66/hh_removed_prompts_generations_Is_dpo
• Updated • 119 • 7
august66/hh_removed_prompts_generations_DM_dpo
• Updated • 119 • 4
august66/hh_removed_prompts_generations_gated_dpo
• Updated • 119 • 4
august66/hh_removed_prompts_generations
• Updated • 119 • 3
august66/ultrafeedback_helpful_base
• Updated • 62k • 11
august66/hh_helpfulness_mc_rewards_DPO
• Updated • 46.1k • 3
august66/hh_helpfulness_qwen2.5_1.5b_generation_stochastic
• Updated • 46.1k • 3
august66/hh_helpfulness_mc_rewards_IS_clip
• Updated • 46.1k • 3
august66/hh_helpfulness_qwen2.5_1.5b_generation_dpo
• Updated • 46.1k • 3
august66/hh_helpfulness_qwen2.5_1.5b_generation
• Updated • 46.1k • 7
august66/hh_helpfulness_mc_rewards
• Updated • 46.1k • 1
august66/hh_helpfulness_drpo_from_sft
• Updated • 46.1k • 6
• Updated • 46.1k • 14
august66/hh_harmless_base
• Updated • 44.8k • 2
august66/drpo_hh_qwen2.5_1.5b_with_ref_prob_vllm_conv
• Updated • 43.8k • 4
august66/drpo_hh_qwen2.5_1.5b_with_ref_prob_vllm
• Updated • 43.8k • 5
august66/drpo_hh_qwen2.5_1.5b_with_ref_prob_sampled
• Updated • 48.8k • 4
august66/drpo_hh_qwen2.5_1.5b_with_ref_btpref
• Updated • 48.8k • 1
august66/hh_qwen2.5_1.5b_with_bias_bt_pref
• Updated • 18k • 1
august66/hh_qwen2.5_1.5b_with_bias
• Updated • 18k • 1
august66/drpo_hh_qwen2.5_1.5b
• Updated • 43.8k • 4
august66/dpo_reward_dist_pi_theta_prompt_3
• Updated • 5k • 8
august66/dpo_reward_dist_pi_theta_prompt_2
• Updated • 5k • 5
august66/dpo_reward_dist_pi_theta
• Updated • 5k • 2
august66/reward_distribution_2_tldr_openassist_pi_ref
• Updated • 5k • 2
august66/reward_distribution_2_tldr_openassist_pi_theta
• Updated • 5k • 2
august66/reward_distribution_tldr_openassist_pi_theta
• Updated • 5k • 3
august66/reward_distribution_tldr_openassist_pi_ref
• Updated • 5k • 4