| | --- |
| | library_name: diffusers |
| | tags: |
| | - fp8 |
| | - safetensors |
| | - precision-recovery |
| | - diffusion |
| | - converted-by-gradio |
| | --- |
| | # FP8 Model with Precision Recovery |
| | - **Source**: `https://huggingface.co/LifuWang/DistillT5` |
| | - **File**: `model.safetensors` |
| | - **FP8 Format**: `E5M2` |
| | - **Architecture**: all |
| | - **Precision Recovery Type**: LoRA |
| | - **Precision Recovery File**: `model-lora-r64-all.safetensors` if available |
| | - **FP8 File**: `model-fp8-e5m2.safetensors` |
| |
|
| | ## Usage (Inference) |
| | ```python |
| | from safetensors.torch import load_file |
| | import torch |
| | |
| | # Load FP8 model |
| | fp8_state = load_file("model-fp8-e5m2.safetensors") |
| | |
| | # Load precision recovery file if available |
| | recovery_state = {} |
| | if "model-lora-r64-all.safetensors": |
| | recovery_state = load_file("model-lora-r64-all.safetensors") |
| | |
| | # Reconstruct high-precision weights |
| | reconstructed = {} |
| | for key in fp8_state: |
| | # Dequantize FP8 to target precision |
| | fp_weight = fp8_state[key].to(torch.float32) |
| | |
| | if recovery_state: |
| | # For LoRA approach |
| | if f"lora_A.{key}" in recovery_state and f"lora_B.{key}" in recovery_state: |
| | A = recovery_state[f"lora_A.{key}"].to(torch.float32) |
| | B = recovery_state[f"lora_B.{key}"].to(torch.float32) |
| | error_correction = B @ A |
| | reconstructed[key] = fp_weight + error_correction |
| | # For correction factor approach |
| | elif f"correction.{key}" in recovery_state: |
| | correction = recovery_state[f"correction.{key}"].to(torch.float32) |
| | reconstructed[key] = fp_weight + correction |
| | else: |
| | reconstructed[key] = fp_weight |
| | else: |
| | reconstructed[key] = fp_weight |
| | |
| | print("Model reconstructed with FP8 error recovery") |
| | ``` |
| |
|
| | > **Note**: This precision recovery targets FP8 quantization errors. |
| | > Average quantization error: 0.052733 |
| |
|