---
library_name: transformers
tags:
- code
license: mit
datasets:
- ArtifactAI/arxiv_python_research_code
language:
- en
pipeline_tag: text-generation
---

# Model Card for deepseek-coder-1.3b-python-peft

A parameter-efficient (LoRA) finetune of DeepSeek Coder 1.3B on Python code.

## Model Details

### Model Description

DeepSeek Coder 1.3B finetuned on 1,000 examples of Python code from the ArtifactAI/arxiv_python_research_code dataset.

- **Model type:** Text Generation
- **Language(s) (NLP):** English, Python
- **Finetuned from model:** deepseek-ai/deepseek-coder-1.3b-base

### Model Sources

- **Repository:** https://github.com/kevin-v96/python-codecomplete-lm

## Uses

The model is intended for generating and completing Python code.

## How to Get Started with the Model

Use the code below to get started with the model.

```
from transformers import pipeline

model_name = "MadMarx37/deepseek-coder-1.3b-python-peft"


def generate_output(prompt, max_length=256):
    # Run the text-generation pipeline with the finetuned model.
    # max_length caps prompt plus generated tokens; 256 is an arbitrary default.
    pipe = pipeline(task="text-generation", model=model_name, tokenizer=model_name, max_length=max_length)
    result = pipe(prompt)
    print(result[0]["generated_text"])


generate_output("def fibonacci(n):")
```

## Training Details

#### Training Hyperparameters

- Training regime: fp16 mixed precision, with the base model loaded in 4-bit via bitsandbytes
- learning_rate = 2e-3
- lr_scheduler_type = 'cosine_with_restarts'
- max_grad_norm = 0.001
- weight_decay = 0.001
- num_train_epochs = 15
- eval_strategy = "steps"
- eval_steps = 25

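
For intuition about `lr_scheduler_type = 'cosine_with_restarts'`, the sketch below computes the learning rate in plain Python, following the usual formula for a cosine schedule with hard restarts. Only the peak learning rate of 2e-3 comes from the list above. The values of `total_steps`, `warmup_steps`, and `num_cycles` are assumptions for illustration, since this card does not state them.

```python
import math


def cosine_with_restarts_lr(step, total_steps, peak_lr=2e-3,
                            warmup_steps=0, num_cycles=1):
    """Learning rate at `step` for a cosine schedule with hard restarts.

    peak_lr matches the card; total_steps, warmup_steps, and num_cycles
    are illustrative assumptions, not values taken from the training run.
    """
    if step < warmup_steps:
        # Linear warmup from 0 up to peak_lr
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    if progress >= 1.0:
        return 0.0
    # Each cycle decays from peak_lr toward 0, then jumps back to peak_lr
    cycle_position = (num_cycles * progress) % 1.0
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * cycle_position))


# Hypothetical run of 100 steps with two cycles
for step in (0, 25, 50, 75, 100):
    print(step, cosine_with_restarts_lr(step, total_steps=100, num_cycles=2))
```

With two cycles, the rate decays over the first half of training, jumps back to the peak at the midpoint, and decays again.
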
#### Speeds, Sizes, Times

The base model has 1.3B parameters. Training took about 2 hours on an RTX 3080.

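
As a rough illustration of why loading the base model in 4-bit helps on a consumer GPU, the arithmetic below estimates the memory needed for the base model weights alone at different precisions. It is a back-of-the-envelope sketch: it uses the rounded 1.3B parameter count from this card and ignores quantization metadata, LoRA adapter weights, optimizer state, and activations, so real usage will be higher.

```python
def weight_memory_gb(num_params, bits_per_param):
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


NUM_PARAMS = 1.3e9  # rounded parameter count from the card

for label, bits in (("fp32", 32), ("fp16", 16), ("4-bit", 4)):
    print(f"{label}: ~{weight_memory_gb(NUM_PARAMS, bits):.2f} GB")
```

This works out to roughly 5.2 GB in fp32, 2.6 GB in fp16, and 0.65 GB in 4-bit for the weights.
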
## Evaluation

### Testing Data, Factors & Metrics

#### Testing Data

https://huggingface.co/datasets/ArtifactAI/arxiv_python_research_code

#### Metrics

Standard training and evaluation loss as reported by the Hugging Face `SFTTrainer`.

### Results

- Training loss: 0.074100
- Validation loss: 0.022271

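
Assuming the reported losses are mean per-token cross-entropy in nats, as is standard for causal language model training, they can be converted to perplexity with `exp(loss)`, which some readers find easier to interpret. The snippet below is only a convenience calculation on the two numbers reported above, not an additional evaluation.

```python
import math

# Final losses as reported in this card
losses = {"training": 0.074100, "validation": 0.022271}


def perplexity(loss):
    """Perplexity for a mean per-token cross-entropy loss given in nats."""
    return math.exp(loss)


for name, loss in losses.items():
    print(f"{name} perplexity: {perplexity(loss):.4f}")
```

Both values come out only slightly above 1 (about 1.08 for training and 1.02 for validation).
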
#### Summary

Gradient norms were somewhat unstable during training, but both training and validation loss trended downward overall, and validation loss had almost plateaued by the end of the run, which is ideally where we want the model to be. Generations from the finetuned model also looked better than those of the original model on the same test prompts. If we increased the finetuning data to improve the model further, it would be a good idea to also increase the number of epochs.

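
To make the role of `max_grad_norm = 0.001` concrete in light of the unstable gradient norms, here is a plain-Python sketch of clipping by global norm, the general technique that setting controls. The gradient values are made up for illustration, and the function is a simplified stand-in rather than the trainer's actual implementation.

```python
import math


def clip_by_global_norm(grads, max_norm=0.001, eps=1e-6):
    """Scale a flat list of gradient values so their L2 norm is at most max_norm."""
    total_norm = math.sqrt(sum(g * g for g in grads))
    # Never scale up: the factor is capped at 1.0
    scale = min(1.0, max_norm / (total_norm + eps))
    return [g * scale for g in grads], total_norm


# Hypothetical gradient values, chosen only to show the effect
clipped, norm_before = clip_by_global_norm([0.3, -0.4])
norm_after = math.sqrt(sum(g * g for g in clipped))
print(f"norm before: {norm_before:.4f}, norm after: {norm_after:.6f}")
```

With a threshold this small, any gradient whose norm exceeds 0.001 is rescaled down to that norm, so a spike in the raw norm does not translate into a proportionally larger update.
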
The training run metrics can be seen here:
https://wandb.ai/kevinv3796/python-autocomplete-deepseek/reports/Supervised-Finetuning-run-for-DeepSeek-Coder-1-3B-on-Python-Code--Vmlldzo3NzQ4NjY0?accessToken=bo0rlzp0yj9vxf1xe3fybfv6rbgl97w5kkab478t8f5unbwltdczy63ba9o9kwjp