
	---
	pipeline_tag: text-to-image
	license: other
	license_name: stable-cascade-nc-community
	license_link: LICENSE
	---

	# SoteDiffusion Cascade

	An anime finetune of the Stable Cascade decoder.
	Commercial use is not permitted under StabilityAI's Stable Cascade non-commercial community license.

	## Code Example

	```shell
	pip install diffusers transformers accelerate torch
	```

	```python
	import torch
	from diffusers import StableCascadeDecoderPipeline, StableCascadePriorPipeline

	prompt = "(extremely aesthetic, best quality, newest), 1girl, solo, cat ears, looking at viewer, blush, light smile, upper body,"
	negative_prompt = "very displeasing, worst quality, monochrome, sketch, blurry, fat, child,"

	# Load the prior (text -> image embeddings) and this decoder (embeddings -> image).
	prior = StableCascadePriorPipeline.from_pretrained("Disty0/sote-diffusion-cascade_pre-alpha0", torch_dtype=torch.float16)
	decoder = StableCascadeDecoderPipeline.from_pretrained("Disty0/sote-diffusion-cascade-decoder_pre-alpha0", torch_dtype=torch.float16)

	# Offload idle submodules to the CPU to reduce VRAM usage (requires accelerate).
	prior.enable_model_cpu_offload()
	prior_output = prior(
	    prompt=prompt,
	    height=1024,
	    width=1024,
	    negative_prompt=negative_prompt,
	    guidance_scale=6.0,
	    num_images_per_prompt=1,
	    num_inference_steps=40,
	)

	decoder.enable_model_cpu_offload()
	decoder_output = decoder(
	    image_embeddings=prior_output.image_embeddings,
	    prompt=prompt,
	    negative_prompt=negative_prompt,
	    guidance_scale=2.0,
	    output_type="pil",
	    num_inference_steps=10,
	).images[0]

	# Save the first (and only) generated image.
	decoder_output.save("cascade.png")
	```

	## Dataset

	Uses the same dataset as SoteDiffusion-Cascade_pre-alpha0.
	From the newest dataset, only images that scored above 0.98 with both the aesthetic and quality taggers were selected.
	Trained on approximately 98K images.
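
The selection rule above can be sketched as a simple filter. This is only an illustration: the record layout and the field names `aesthetic_score` and `quality_score` are hypothetical, and the actual tagger outputs and tooling used for this model are not described in this card.

```python
# Hypothetical records: field names and scores are illustrative only,
# not the real dataset schema.
THRESHOLD = 0.98  # threshold stated in the Dataset section


def select_best(records, threshold=THRESHOLD):
    """Keep records whose aesthetic AND quality scores both exceed the threshold."""
    return [
        r for r in records
        if r["aesthetic_score"] > threshold and r["quality_score"] > threshold
    ]


records = [
    {"file": "a.png", "aesthetic_score": 0.99, "quality_score": 0.985},
    {"file": "b.png", "aesthetic_score": 0.99, "quality_score": 0.90},
    {"file": "c.png", "aesthetic_score": 0.98, "quality_score": 0.99},
]

selected = select_best(records)
print([r["file"] for r in selected])
```

Only `a.png` passes here: `b.png` fails the quality check, and `c.png` sits exactly at 0.98 on the aesthetic score, which is not "more than" the threshold.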

	## Training

	GPU used for training: 1x AMD RX 7900 XTX 24GB

	Software used: https://github.com/2kpr/StableCascade

	### Config

	```yaml
	experiment_id: sotediffusion-sc-b_3b
	model_version: 3B
	dtype: bfloat16
	use_fsdp: False

	batch_size: 64
	grad_accum_steps: 64
	updates: 3000
	backup_every: 128
	save_every: 32
	warmup_updates: 100

	lr: 4.0e-6
	optimizer_type: Adafactor
	adaptive_loss_weight: True
	stochastic_rounding: True

	image_size: 768
	multi_aspect_ratio: [1/1, 1/2, 1/3, 2/3, 3/4, 1/5, 2/5, 3/5, 4/5, 1/6, 5/6, 9/16]
	shift: 4

	checkpoint_path: /mnt/DataSSD/AI/SoteDiffusion/StableCascade/
	output_path: /mnt/DataSSD/AI/SoteDiffusion/StableCascade/
	webdataset_path: file:/mnt/DataSSD/AI/anime_image_dataset/best/newest_best-{0000..0001}.tar

	effnet_checkpoint_path: /mnt/DataSSD/AI/models/sd-cascade/effnet_encoder.safetensors
	stage_a_checkpoint_path: /mnt/DataSSD/AI/models/sd-cascade/stage_a.safetensors
	generator_checkpoint_path: /mnt/DataSSD/AI/SoteDiffusion/StableCascade/stage_b-generator-049152.safetensors
	```
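
As a rough reading of the batch settings above, the sketch below derives the micro-batch size and the number of samples seen. It rests on two assumptions that this card does not confirm: that `batch_size` is the global batch per optimizer update, split across `grad_accum_steps` micro-batches, and that the run completed all `updates`. The trainer fork used here may define these values differently.

```python
# Values copied from the config above.
batch_size = 64
grad_accum_steps = 64
updates = 3000

# Approximate dataset size from the Dataset section ("98K~ images").
dataset_size = 98_000

# ASSUMPTION: batch_size is the global batch per optimizer update and is
# divided across grad_accum_steps micro-batches on the single GPU.
micro_batch = batch_size // grad_accum_steps

# ASSUMPTION: the run completed every configured update.
samples_seen = batch_size * updates
approx_epochs = samples_seen / dataset_size

print(micro_batch, samples_seen, round(approx_epochs, 2))
```

Under these assumptions each forward pass processes a single image and the run covers the dataset roughly twice. If `batch_size` is instead the per-step batch, the totals would be 64 times larger.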


	## Limitations and Bias

	### Bias

	- This model is intended for anime illustrations.
	Its realistic capabilities have not been tested at all.

	### Limitations

	- Eyes in far shots come out poorly because of the heavy latent compression.
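
To give a sense of scale for the compression mentioned above, here is a minimal arithmetic sketch. It assumes the limitation refers to the prior's compressed latent space and uses a compression factor of 42.67, which is the default in the diffusers Stable Cascade prior pipeline as far as I know. The 64-pixel face width is a made-up example, not a measurement.

```python
import math

# ASSUMPTION: 42.67 is the prior's spatial compression factor
# (the diffusers Stable Cascade default, not stated in this card).
RESOLUTION_MULTIPLE = 42.67


def latent_side(pixels, factor=RESOLUTION_MULTIPLE):
    """Latent cells along one image side of the given pixel length."""
    return math.ceil(pixels / factor)


def cells_covered(feature_pixels, factor=RESOLUTION_MULTIPLE):
    """Approximate latent cells spanned by a feature of the given pixel width."""
    return feature_pixels / factor


side = latent_side(1024)        # latent grid side for a 1024 px image
face_cells = cells_covered(64)  # hypothetical 64 px wide face in a far shot

print(side, round(face_cells, 2))
```

With these numbers a 1024x1024 image maps to a 24x24 latent grid, so a small face spans only about one and a half cells per side. That leaves very little room to encode eye detail.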