agurung/flawed-fictions-gemma-3-4b-lengthpenalty • Reinforcement Learning • 4B • Updated 6 days ago • 63
agurung/flawed-fictions-qwen3-4b-lengthpenalty-litereason • Reinforcement Learning • 4B • Updated 7 days ago • 28
agurung/flawed-fictions-qwen25-7b-lengthpenalty-litereason • Reinforcement Learning • 8B • Updated 10 days ago • 75
agurung/flawed-fictions-qwen25-7b-lengthpenalty • Reinforcement Learning • 8B • Updated 12 days ago • 196
agurung/Qwen2.5-7B-Instruct-flawedfiction-latent-grpo • Text Generation • 8B • Updated 25 days ago • 521
agurung/v4_savebestearly_sft_qwen7B_25percent_lr_1e3_bptt_offset • Text Generation • 8B • Updated 26 days ago • 15
agurung/v4_savebestearly_sft_qwen7B_25percent_lr_1e4_bptt_offset • Text Generation • 8B • Updated 26 days ago • 23
agurung/v1ff_savebestearly_sft_qwen7B_25percent_lr_1e4_bptt_offset • Text Generation • 8B • Updated 26 days ago • 21
agurung/v2ff_savebestearly_sft_qwen7B_25percent_lr_1e4_bptt_offset • Text Generation • 8B • Updated 26 days ago • 22
agurung/v3ff_savebestearly_sft_qwen7B_25percent_lr_1e4_bptt_offset_newprompt • Text Generation • 8B • Updated 26 days ago • 20
agurung/Qwen2.5-7B-Instruct-flawedfiction-latent-grpo-nosft • Text Generation • 8B • Updated 26 days ago • 20
agurung/Qwen2.5-7B-Instruct-flawedfiction-grpo-impdata • Text Generation • 8B • Updated Oct 29, 2025 • 2