---
pipeline_tag: text-to-image
license: other
license_name: stable-cascade-nc-community
license_link: LICENSE
---
# SoteDiffusion Cascade

Anime finetune of the Stable Cascade decoder.

Commercial use is not permitted under StabilityAI's non-commercial license.
## Code Example

```shell
pip install diffusers transformers accelerate
```

```python
import torch
from diffusers import StableCascadeDecoderPipeline, StableCascadePriorPipeline

prompt = "newest, 1girl, solo, cat ears, looking at viewer, blush, light smile,"
negative_prompt = "very displeasing, worst quality, monochrome, sketch, fat, child,"

# Load the prior and this decoder in half precision.
prior = StableCascadePriorPipeline.from_pretrained("Disty0/sote-diffusion-cascade_alpha0", torch_dtype=torch.float16)
decoder = StableCascadeDecoderPipeline.from_pretrained("Disty0/sote-diffusion-cascade-decoder_alpha0", torch_dtype=torch.float16)

# Step 1: the prior turns the prompt into image embeddings.
prior.enable_model_cpu_offload()
prior_output = prior(
    prompt=prompt,
    height=1024,
    width=1024,
    negative_prompt=negative_prompt,
    guidance_scale=7.0,
    num_images_per_prompt=1,
    num_inference_steps=40,
)

# Step 2: the decoder turns the embeddings into the final image.
decoder.enable_model_cpu_offload()
decoder_output = decoder(
    image_embeddings=prior_output.image_embeddings,
    prompt=prompt,
    negative_prompt=negative_prompt,
    guidance_scale=1.5,
    output_type="pil",
    num_inference_steps=10,
).images[0]
decoder_output.save("cascade.png")
```
## Dataset

Used the same dataset as Disty0/sote-diffusion-cascade-decoder_pre-alpha0.

Trained on roughly 98K images.
## Training:

**GPU used for training**: 1x AMD RX 7900 XTX 24GB

**Software used**: https://github.com/2kpr/StableCascade

### Config:

```
experiment_id: sotediffusion-sc-b_3b
model_version: 3B
dtype: bfloat16
use_fsdp: False
batch_size: 1
grad_accum_steps: 1
updates: 98000
backup_every: 2048
save_every: 1024
warmup_updates: 100
lr: 4.0e-6
optimizer_type: Adafactor
adaptive_loss_weight: True
stochastic_rounding: True
image_size: 768
multi_aspect_ratio: [1/1, 1/2, 1/3, 2/3, 3/4, 1/5, 2/5, 3/5, 4/5, 1/6, 5/6, 9/16]
shift: 4
checkpoint_path: /mnt/DataSSD/AI/SoteDiffusion/StableCascade/
output_path: /mnt/DataSSD/AI/SoteDiffusion/StableCascade/
webdataset_path: file:/mnt/DataSSD/AI/anime_image_dataset/best/newest_best-{0000..0001}.tar
effnet_checkpoint_path: /mnt/DataSSD/AI/models/sd-cascade/effnet_encoder.safetensors
stage_a_checkpoint_path: /mnt/DataSSD/AI/models/sd-cascade/stage_a.safetensors
generator_checkpoint_path: /mnt/DataSSD/AI/SoteDiffusion/StableCascade/sotediffusion-sc_3b-stage_b.safetensors
```
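
As a rough sanity check on the schedule in the config, the sketch below works out how many samples the run sees. It assumes that one update consumes `batch_size * grad_accum_steps` images, which is the usual convention but is not confirmed by this card. The `config` dict and the `dataset_size` value (taken from the approximate 98K figure in the Dataset section) are illustrative stand-ins, not part of the training code.

```python
# Values copied from the config; the dict itself is only for illustration.
config = {
    "updates": 98000,
    "batch_size": 1,
    "grad_accum_steps": 1,
    "warmup_updates": 100,
}

# Approximate dataset size from the Dataset section (about 98K images).
dataset_size = 98_000

# Assumption: each optimizer update consumes batch_size * grad_accum_steps images.
samples_per_update = config["batch_size"] * config["grad_accum_steps"]
samples_seen = config["updates"] * samples_per_update

# Approximate number of passes over the dataset.
approx_epochs = samples_seen / dataset_size

# Share of the run spent in learning-rate warmup.
warmup_fraction = config["warmup_updates"] / config["updates"]

print(f"samples seen: {samples_seen}")
print(f"approx. epochs: {approx_epochs:.2f}")
print(f"warmup fraction: {warmup_fraction:.4%}")
```

Under these assumptions the run amounts to roughly a single pass over the dataset, with warmup taking up only a tiny part of it.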
## Limitations and Bias

### Bias

- This model is intended for anime illustrations.
  Realistic capabilities have not been tested at all.

### Limitations

- Eyes in far shots are still poor because of the heavy latent compression.
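
To give a feel for how heavy that compression is, here is a small sketch of the latent sizes involved. It assumes the `resolution_multiple=42.67` and `latent_dim_scale=10.67` defaults used by the diffusers Stable Cascade pipelines and a 4x upscale in the final decode step. None of these numbers come from this card, so treat them as assumptions. `latent_sizes` is a hypothetical helper written for this example.

```python
from math import ceil

# Assumed defaults from the diffusers Stable Cascade pipelines (not stated in this card).
RESOLUTION_MULTIPLE = 42.67  # image pixels per prior latent cell, per side
LATENT_DIM_SCALE = 10.67     # decoder latent cells per prior latent cell, per side
FINAL_UPSCALE = 4            # assumed upscale factor of the final decode step


def latent_sizes(pixels: int) -> dict:
    """Hypothetical helper: side lengths at each stage for a square image."""
    prior_side = ceil(pixels / RESOLUTION_MULTIPLE)
    decoder_side = int(prior_side * LATENT_DIM_SCALE)
    return {
        "prior_latent": prior_side,
        "decoder_latent": decoder_side,
        "decoded_pixels": decoder_side * FINAL_UPSCALE,
    }


sizes = latent_sizes(1024)
print(sizes)
```

If these assumptions hold, a 1024x1024 image is described by a 24x24 prior latent grid, so a face that fills only a small part of a far shot is covered by just a few latent cells.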