How to use from the
Use from the
Diffusers library
pip install -U diffusers transformers accelerate
import torch
from diffusers import DiffusionPipeline

# switch to "mps" for apple devices
pipe = DiffusionPipeline.from_pretrained("neuregex/Bernini-R-fp8", dtype=torch.bfloat16, device_map="cuda")

prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"
image = pipe(prompt).images[0]

Bernini-R fp8 (e4m3) — for ComfyUI-BerniniR

fp8 (float8_e4m3fn, weight-only) build of ByteDance/Bernini-R (which is Wan2.2-T2V-A14B inside), self-contained (2 transformers + VAE + UMT5 + tokenizer + scheduler), packaged for the ComfyUI-BerniniR custom node. Runs the full pipeline in 24 GB.

The Linear weights are stored in float8_e4m3fn and upcast to bf16 on every forward; norms / embeddings / time-/text-embedders / patch-embed stay in bf16/fp32. These weights are bit-identical to the node's on-the-fly fp8 quantization — validated end-to-end on an i2i edit: same seed, same GPU → 0 pixel difference. Loading this pre-quantized bundle is also faster (no bf16 → fp8 cast at load).

VRAM (measured: torch.cuda.max_memory_allocated, NVIDIA A10 24 GB, fp8 + sequential offload)

Task Frames / resolution Peak VRAM Fits 24 GB
i2i / t2i (image edit / image) 1 frame, 848×848 ~16.7 GB ✅
t2v / v2v / rv2v (video / video edit) 81 frames (full length), 480p ~18.8 GB ✅

The UMT5 text encoder is freed before the experts load, and offload keeps a single ~14 GB expert resident — that is what makes full-length 480p video fit in 24 GB.

Tasks

t2v · t2i · i2i (image edit) · v2v (video edit) · rv2v (video edit + reference) · r2v (reference-to-video). Edits preserve the source content/motion via Bernini's source-id RoPE (validated qualitatively).

Usage

Install the ComfyUI-BerniniR node, then in BerniniR · Load Model:

  • set source = neuregex/Bernini-R-fp8 (auto) with auto_download = True — downloads ~40 GB to download_dir on first run (with a free-space check and progress bar), or
  • hf download neuregex/Bernini-R-fp8 --local-dir models/bernini/Bernini-R-fp8 and use source = local.

For the full bf16 weights instead, point the node at ByteDance/Bernini-R-Diffusers (source = ... (full bf16)), which needs more VRAM (A100-class) or on-the-fly fp8.

Credits & license

  • Algorithm & model: Bernini: Latent Semantic Planning for Video Diffusion, ByteDance (arXiv:2605.22344 · code) — Apache-2.0.
  • Base: Wan2.2-T2V-A14B.
  • fp8 build by neuregex. Apache-2.0.
Downloads last month
-
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for neuregex/Bernini-R-fp8

Finetuned
(42)
this model

Paper for neuregex/Bernini-R-fp8