jlpan/SteloCoder


	---
	license: bigcode-openrail-m
	base_model: bigcode/starcoder
	tags:
	- generated_from_trainer
	model-index:
	- name: SteloCoder
	  results: []
	---

	# moe_training

	This is the final stage of training SteloCoder: MoE (Mixture of Experts) training. The dataset contains code translation samples from five programming languages to Python. The training, validation, and test data are processed from the XLCoST dataset.

	## Model description

	The final model is named SteloCoder, a model designed for code machine translation from multiple languages (C++, C#, Java, JavaScript, PHP) to Python. It is based on StarCoder, to which we added parameters using LoRA and MoE methods.

	## Intended uses & limitations

	More information needed

	## Training and evaluation data

	The data is processed from the XLCoST dataset.

	## Training procedure

	### Training hyperparameters

	The following hyperparameters were used during training:
	- learning_rate: 5e-05
	- train_batch_size: 1
	- eval_batch_size: 1
	- seed: 42
	- gradient_accumulation_steps: 4
	- total_train_batch_size: 4
	- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
	- lr_scheduler_type: cosine
	- lr_scheduler_warmup_steps: 50
	- training_steps: 1000
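
The sketch below illustrates how the values above fit together. It assumes a single training device (not stated in this card) and assumes the `cosine` scheduler follows the common shape of linear warmup followed by a half-cosine decay to zero. The function names are illustrative and are not taken from the training code.

```python
import math

# Values copied from the hyperparameter list above.
LEARNING_RATE = 5e-05
TRAIN_BATCH_SIZE = 1
GRADIENT_ACCUMULATION_STEPS = 4
WARMUP_STEPS = 50
TRAINING_STEPS = 1000

# Assumption: one device. The card does not state the device count.
NUM_DEVICES = 1


def effective_batch_size() -> int:
    """Samples consumed per optimizer step."""
    return TRAIN_BATCH_SIZE * GRADIENT_ACCUMULATION_STEPS * NUM_DEVICES


def scheduled_lr(step: int) -> float:
    """Linear warmup to LEARNING_RATE, then half-cosine decay to zero."""
    if step < WARMUP_STEPS:
        return LEARNING_RATE * step / max(1, WARMUP_STEPS)
    progress = (step - WARMUP_STEPS) / max(1, TRAINING_STEPS - WARMUP_STEPS)
    return LEARNING_RATE * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


if __name__ == "__main__":
    print("effective batch size:", effective_batch_size())
    for step in (0, 25, 50, 525, 1000):
        print(step, scheduled_lr(step))
```

Under these assumptions the effective batch size is 4, matching `total_train_batch_size`. The learning rate peaks at 5e-05 at step 50 and reaches zero at step 1000.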

	### Training results

	| Training Loss | Epoch | Step | Validation Loss | Rate   |
	|:-------------:|:-----:|:----:|:---------------:|:------:|
	| 0.1293        | 0.05  | 50   | 0.1218          | 5e-05  |
	| 0.1332        | 0.1   | 100  | 0.1135          | 0.0000 |
	| 0.1346        | 0.15  | 150  | 0.1117          | 0.0000 |
	| 0.1336        | 0.2   | 200  | 0.1127          | 0.0000 |
	| 0.1378        | 0.25  | 250  | 0.1116          | 0.0000 |
	| 0.1321        | 0.3   | 300  | 0.1083          | 0.0000 |
	| 0.1335        | 0.35  | 350  | 0.1075          | 0.0000 |
	| 0.1316        | 0.4   | 400  | 0.1065          | 0.0000 |
	| 0.1298        | 0.45  | 450  | 0.1062          | 0.0000 |
	| 0.1331        | 0.5   | 500  | 0.1055          | 0.0000 |
	| 0.1355        | 0.55  | 550  | 0.1048          | 0.0000 |
	| 0.1299        | 0.6   | 600  | 0.1044          | 0.0000 |
	| 0.1387        | 0.65  | 650  | 0.1048          | 0.0000 |
	| 0.1278        | 0.7   | 700  | 0.1047          | 0.0000 |
	| 0.1285        | 0.75  | 750  | 0.1045          | 0.0000 |
	| 0.1278        | 0.8   | 800  | 0.1045          | 0.0000 |
	| 0.1283        | 0.85  | 850  | 0.1045          | 0.0000 |
	| 0.124         | 0.9   | 900  | 0.1043          | 0.0000 |
	| 0.1258        | 0.95  | 950  | 0.1043          | 0.0000 |
	| 0.1319        | 1.0   | 1000 | 0.1043          | 0.0    |


	### Framework versions

	- Transformers 4.32.1
	- PyTorch 2.0.1+cu117
	- Datasets 2.14.4
	- Tokenizers 0.13.3