Neural-Hacker committed on
Commit 3540770 · verified · 1 Parent(s): bbba8b8

Update README.md

Files changed (1)
  1. README.md (+26 -15)
README.md CHANGED
@@ -57,21 +57,32 @@ Only the solution portion of each example was used for loss computation through
 
 ## Training Configuration
 
-Method: LoRA (full precision, bfloat16)
-Precision: bfloat16 (no 4-bit quantization)
-LoRA rank: 16
-LoRA alpha: 32
-LoRA dropout: 0.05
-Target modules: q_proj, k_proj, v_proj, o_proj
-Max sequence length: 1024
-Batch size: 2
-Gradient accumulation: 8
-Effective batch size: 16
-Learning rate: 1e-4
-Optimizer: adamw_torch
-Scheduler: cosine
-Warmup: 5 percent
-Epochs: 3
+## Training Configuration (MI300X Run)
+
+**Method:** LoRA (full precision, bfloat16)
+**Precision:** bfloat16 (no 4-bit quantization)
+
+**LoRA settings**
+- Rank: 16
+- Alpha: 32
+- Dropout: 0.05
+- Target modules: `q_proj`, `k_proj`, `v_proj`, `o_proj`
+
+**Data & sequence**
+- Max sequence length: 1024
+
+**Optimization**
+- Batch size: 2
+- Gradient accumulation: 8
+- **Effective batch size:** 16
+- Learning rate: 1e-4
+- Optimizer: `adamw_torch`
+- Scheduler: cosine
+- Warmup: 5%
+
+**Training**
+- Epochs: 3
+
 
 ---
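
For reference, a minimal sketch of how the hyperparameters in this diff map onto a PEFT `LoraConfig` and a Hugging Face `TrainingArguments` object. The model, dataset, output path, and task type are placeholders or assumptions, not part of this commit, and the actual training script may differ.

```python
# Sketch only: maps the hyperparameters from the README diff onto
# peft / transformers config objects. Model, data, and output path
# are placeholders, not taken from this commit.
from peft import LoraConfig
from transformers import TrainingArguments

lora_config = LoraConfig(
    r=16,                                   # LoRA rank
    lora_alpha=32,                          # LoRA alpha
    lora_dropout=0.05,                      # LoRA dropout
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",                  # assumption: causal-LM fine-tune
)

training_args = TrainingArguments(
    output_dir="lora-output",               # placeholder path
    per_device_train_batch_size=2,          # batch size 2
    gradient_accumulation_steps=8,          # 2 x 8 = effective batch size 16
    learning_rate=1e-4,
    optim="adamw_torch",
    lr_scheduler_type="cosine",
    warmup_ratio=0.05,                      # 5% warmup
    num_train_epochs=3,
    bf16=True,                              # bfloat16, no 4-bit quantization
)

# The max sequence length (1024) is typically enforced when tokenizing the
# data or via the trainer (e.g. TRL's SFTConfig max_seq_length); not shown here.
```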