---
language:
- en
license: openrail
tags:
- diffusion-llm
- parallel-generation
- custom-transformer
- cropmark
datasets:
- OpenAssistant/oasst1
metrics:
- cosine_similarity
base_model:
- darwinkernelpanic/DiffReaper-5
---

# DiffReaper-5L

DiffReaper-5L is a **larger** variant of DiffReaper-5, with **2048-dimensional embeddings** and a **24-layer Transformer**.

## Model Details

- **Architecture:** 24-layer custom Transformer with a time embedding.
- **Task:** conditioned text diffusion (prompt → response).
- **Training objective:** cosine-similarity regression.
- **Sampling:** 10-step iterative parallel denoising.
|
| | ## Usage (Inference) |
| |
|
| | To run inference: |
| |
|
| | ```python |
| | import torch |
| | # Assuming DiffReaperModel is defined as in train_diffreaper_5l.py |
| | |
| | model = DiffReaperModel(vocab_size=50257, n_embd=2048, n_head=32, n_layer=24).to("cuda") |
| | model.load_state_dict(torch.load("diffreaper5l_latest.pt")) |
| | model.eval() |
| | ``` |
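The sampling loop itself is not shown in this card, so here is a minimal sketch of what "10-step iterative parallel denoising" could look like. `TinyDenoiser` is a hypothetical stand-in for `DiffReaperModel` (which lives in `train_diffreaper_5l.py`), and the `sample` function name and its conditioning scheme are illustrative assumptions, not the card's actual implementation.

```python
import torch

# Hypothetical stand-in for DiffReaperModel: takes noisy embeddings plus
# an integer timestep and predicts refined embeddings.
class TinyDenoiser(torch.nn.Module):
    def __init__(self, n_embd=32, steps=10):
        super().__init__()
        self.time_emb = torch.nn.Embedding(steps, n_embd)  # time embedding, as in the card
        self.proj = torch.nn.Linear(n_embd, n_embd)

    def forward(self, x, t):
        # x: (batch, seq, n_embd) noisy embeddings; t: (batch,) timestep
        return self.proj(x + self.time_emb(t)[:, None, :])

@torch.no_grad()
def sample(model, prompt_emb, resp_len, steps=10):
    """Start the response positions from pure noise and refine every
    position in parallel for `steps` denoising iterations."""
    b, p_len, d = prompt_emb.shape
    x = torch.cat([prompt_emb, torch.randn(b, resp_len, d)], dim=1)
    for step in reversed(range(steps)):
        t = torch.full((b,), step, dtype=torch.long)
        x = model(x, t)               # denoise all positions at once
        x[:, :p_len] = prompt_emb     # keep the prompt conditioning fixed
    return x[:, p_len:]               # return only the response embeddings

model = TinyDenoiser().eval()
prompt = torch.randn(2, 5, 32)
response = sample(model, prompt, resp_len=8)
print(response.shape)  # torch.Size([2, 8, 32])
```

Unlike autoregressive decoding, every response position is updated in each of the 10 steps, which is what makes the generation parallel.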

## Fine-tuning

To fine-tune on a custom dataset, make sure your data loader provides **prompt** + **response** pairs, and train with the same cosine-similarity loss.
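The cosine-similarity loss is not spelled out in the card, so the following is a hedged sketch of one plausible form of a cosine-similarity regression objective. The names `pred` and `target` are illustrative, not taken from the training script: per position, `1 - cos(pred, target)` is minimized, so the loss is 0 when predicted and target embeddings are perfectly aligned.

```python
import torch
import torch.nn.functional as F

def cosine_regression_loss(pred, target):
    # pred, target: (batch, seq, n_embd) embedding tensors
    cos = F.cosine_similarity(pred, target, dim=-1)  # (batch, seq)
    return (1.0 - cos).mean()

pred = torch.randn(4, 16, 64, requires_grad=True)
target = torch.randn(4, 16, 64)
loss = cosine_regression_loss(pred, target)
loss.backward()  # gradients flow into `pred` as in a normal training step
print(loss.item())
```

Because cosine similarity ignores vector magnitude, this objective only constrains the *direction* of the predicted embeddings, which is a common choice when regressing onto a fixed embedding space.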