Neural-Hacker committed on
Commit bbba8b8 · verified · 1 parent: 9af075a

Update README.md

Files changed (1): README.md +15 -15
README.md CHANGED
@@ -57,21 +57,21 @@ Only the solution portion of each example was used for loss computation through
  
  ## Training Configuration
  
- Method: QLoRA (4-bit)
- Quantization: NF4 with float16 compute
- LoRA rank: 16
- LoRA alpha: 32
- LoRA dropout: 0.05
- Target modules: q_proj, k_proj, v_proj, o_proj
- Max sequence length: 1024
- Batch size: 1
- Gradient accumulation: 16
- Effective batch size: 16
- Learning rate: 1e-4
- Optimizer: paged_adamw_8bit
- Scheduler: cosine
- Warmup: 5 percent
- Epochs: 6
+ Method: LoRA (full precision, bfloat16)
+ Precision: bfloat16 (no 4-bit quantization)
+ LoRA rank: 16
+ LoRA alpha: 32
+ LoRA dropout: 0.05
+ Target modules: q_proj, k_proj, v_proj, o_proj
+ Max sequence length: 1024
+ Batch size: 2
+ Gradient accumulation: 8
+ Effective batch size: 16
+ Learning rate: 1e-4
+ Optimizer: adamw_torch
+ Scheduler: cosine
+ Warmup: 5 percent
+ Epochs: 3
  
  ---
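
For orientation, a minimal sketch of how the updated hyperparameters might map onto a peft LoraConfig plus transformers TrainingArguments. The training script itself is not part of this commit, so the base model id, output directory, and surrounding setup are placeholders, not the author's actual code.

```python
# Hypothetical sketch of the post-commit configuration; model id and paths
# below are placeholders, since the real training script is not in this diff.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from peft import LoraConfig, get_peft_model

BASE_MODEL = "base-model-id"  # placeholder: not specified in this commit

# Load the base model in bfloat16 (no 4-bit quantization after this change).
model = AutoModelForCausalLM.from_pretrained(BASE_MODEL, torch_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)

# LoRA settings unchanged by the commit: rank 16, alpha 32, dropout 0.05,
# adapters on the attention projections only.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)

# Updated run settings: batch size 2 x gradient accumulation 8 = effective 16,
# adamw_torch instead of paged_adamw_8bit, 3 epochs instead of 6.
training_args = TrainingArguments(
    output_dir="./lora-bf16-run",  # placeholder
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,
    learning_rate=1e-4,
    optim="adamw_torch",
    lr_scheduler_type="cosine",
    warmup_ratio=0.05,
    num_train_epochs=3,
    bf16=True,
)
# The max sequence length of 1024 would typically be enforced at tokenization
# time or by the trainer wrapper; it is omitted from this sketch.
```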