Alexander Gurung PRO

agurung · alex-gurung

AI & ML interests

None yet

Recent Activity

updated a model 5 days ago

agurung/ncp-qwen25-7b-lengthpenalty

published a model 5 days ago

agurung/ncp-qwen25-7b-lengthpenalty

updated a model 6 days ago

agurung/flawed-fictions-gemma-3-4b

Organizations

agurung's models 71

agurung/ncp-qwen25-7b-lengthpenalty

Reinforcement Learning • 8B • Updated 5 days ago • 31

agurung/flawed-fictions-gemma-3-4b-lengthpenalty

Reinforcement Learning • 4B • Updated 6 days ago • 63

agurung/flawed-fictions-qwen3-4b-lengthpenalty-litereason

Reinforcement Learning • 4B • Updated 7 days ago • 28

agurung/flawed-fictions-qwen3-4b-lengthpenalty

Reinforcement Learning • 4B • Updated 8 days ago • 61

agurung/qwen3-4b-ff-grpo-lengthpenalty

4B • Updated 8 days ago • 51

agurung/colar-ff-qwen3-4b

4B • Updated 8 days ago • 29

agurung/qwen-coconut-ff-v2

8B • Updated 9 days ago • 9

agurung/flawed-fictions-qwen25-7b-lengthpenalty-litereason

Reinforcement Learning • 8B • Updated 10 days ago • 75

agurung/flawed-fictions-qwen25-7b-lengthpenalty

Reinforcement Learning • 8B • Updated 12 days ago • 196

agurung/flawed-fictions-olmo-3-7b

Reinforcement Learning • 7B • Updated 16 days ago • 45

agurung/flawed-fictions-gemma-3-4b

Reinforcement Learning • 4B • Updated 17 days ago • 136

agurung/flawed-fictions-qwen3-4b

Reinforcement Learning • 4B • Updated 17 days ago • 77

agurung/qwen3-4b-lcb-dapo-correctness

Updated 20 days ago • 8

agurung/Qwen2.5-7B-Instruct-flawedfiction-latent-grpo

Text Generation • 8B • Updated 25 days ago • 521

agurung/v4_savebestearly_sft_qwen7B_25percent_lr_1e3_bptt_offset

Text Generation • 8B • Updated 26 days ago • 15

agurung/v4_savebestearly_sft_qwen7B_25percent_lr_1e4_bptt_offset

Text Generation • 8B • Updated 26 days ago • 23

agurung/v3sft_qwen7B_25percent_lr_1e4_bptt_offset

Text Generation • 8B • Updated 26 days ago • 3

agurung/v1ff_savebestearly_sft_qwen7B_25percent_lr_1e4_bptt_offset

Text Generation • 8B • Updated 26 days ago • 21

agurung/v2ff_savebestearly_sft_qwen7B_25percent_lr_1e4_bptt_offset

Text Generation • 8B • Updated 26 days ago • 22

agurung/v3ff_savebestearly_sft_qwen7B_25percent_lr_1e4_bptt_offset_newprompt

Text Generation • 8B • Updated 26 days ago • 20

agurung/Qwen2.5-7B-Instruct-flawedfiction-latent-grpo-nosft

Text Generation • 8B • Updated 26 days ago • 20

agurung/olmo3-7b-lcb-mc-nosum-gspo

Updated 27 days ago • 312

agurung/olmo3-7b-lcb-mc-gspo

Updated Jan 31 • 2

agurung/olmo3-7b-lcb-standard-rl-gspo

Updated Jan 26 • 1

agurung/olmo3-7b-lcb-mc-rl

agurung/olmo3-7b-lcb-standard-rl

Reinforcement Learning • Updated Jan 11 • 1

agurung/qwen34b-context-kd

Text Generation • 4B • Updated Dec 13, 2025

agurung/Qwen2.5-7B-Instruct-flawedfiction-grpo-impdata

Text Generation • 8B • Updated Oct 29, 2025 • 2

agurung/Qwen2.5-7B-Instruct-flawedfiction-grpo

Text Generation • 8B • Updated Oct 25, 2025 • 81

agurung/Qwen2.5-7B-Instruct-musr-grpo

Text Generation • 8B • Updated Oct 21, 2025