Update README.md
README.md CHANGED
@@ -57,21 +57,21 @@ Only the solution portion of each example was used for loss computation through
 
 ## Training Configuration
 
-Method:
-
-LoRA rank: 16
-LoRA alpha: 32
-LoRA dropout: 0.05
-Target modules: q_proj, k_proj, v_proj, o_proj
-Max sequence length: 1024
-Batch size:
-Gradient accumulation:
-Effective batch size: 16
-Learning rate: 1e-4
-Optimizer:
-Scheduler: cosine
-Warmup: 5 percent
-Epochs:
+Method: LoRA (full precision, bfloat16)
+Precision: bfloat16 (no 4-bit quantization)
+LoRA rank: 16
+LoRA alpha: 32
+LoRA dropout: 0.05
+Target modules: q_proj, k_proj, v_proj, o_proj
+Max sequence length: 1024
+Batch size: 2
+Gradient accumulation: 8
+Effective batch size: 16
+Learning rate: 1e-4
+Optimizer: adamw_torch
+Scheduler: cosine
+Warmup: 5 percent
+Epochs: 3
 
 ---
 
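For reference, here is a minimal sketch of how the configuration filled in above could be expressed with Hugging Face `transformers` + `peft`. This is an assumption about the training stack, not code from the repo: the base model id, output directory, and dataset handling are placeholders, since the diff only lists the hyperparameters.

```python
# Hypothetical sketch (not from the repo): mapping the listed hyperparameters
# onto `peft` + `transformers`. Placeholders are marked in comments.
import torch
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM, TrainingArguments

# Load the base model in full-precision bfloat16 (no 4-bit quantization).
model = AutoModelForCausalLM.from_pretrained(
    "base-model-id",              # placeholder: the base model is not named here
    torch_dtype=torch.bfloat16,
)

# LoRA adapter matching the listed settings.
lora_config = LoraConfig(
    r=16,                         # LoRA rank: 16
    lora_alpha=32,                # LoRA alpha: 32
    lora_dropout=0.05,            # LoRA dropout: 0.05
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)

# Optimization settings from the diff.
training_args = TrainingArguments(
    output_dir="outputs",             # placeholder
    per_device_train_batch_size=2,    # batch size: 2
    gradient_accumulation_steps=8,    # 2 * 8 = effective batch size 16
    learning_rate=1e-4,
    optim="adamw_torch",
    lr_scheduler_type="cosine",
    warmup_ratio=0.05,                # warmup: 5 percent
    num_train_epochs=3,
    bf16=True,
)

# The 1024-token max sequence length would be enforced at tokenization time
# (or via the trainer's max-sequence-length option if an SFT trainer is used).
```

Note that the listed values are internally consistent: a per-device batch size of 2 with gradient accumulation of 8 gives the stated effective batch size of 16.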