| --- |
| pipeline_tag: text-to-video |
| license: other |
| license_link: LICENSE |
| --- |
| |
| # TrackDiffusion Model Card |
|
|
| TrackDiffusion is a diffusion model that takes tracklets as conditions and generates a video from them. |
|  |
|
|
| ## Model Details |
|
|
| ### Model Description |
|
|
| TrackDiffusion is a novel video generation framework that enables fine-grained control over complex dynamics in video synthesis by conditioning the generation process on object trajectories. |
| This allows precise manipulation of object trajectories and interactions, and addresses the challenges of handling object appearance, disappearance, and scale changes while keeping objects consistent across frames. |
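
To make the conditioning signal concrete: a tracklet is the sequence of per-frame bounding boxes belonging to one object instance. The sketch below is illustrative only. The `Tracklet` class, the `to_dense` helper, and the normalized `(x1, y1, x2, y2)` box layout are assumptions made for this example, not the input format TrackDiffusion actually expects. It shows one way that appearance, disappearance, and scale changes could be represented, using a per-frame visibility mask.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Assumed box layout for this sketch: normalized (x1, y1, x2, y2).
Box = Tuple[float, float, float, float]


@dataclass
class Tracklet:
    """Hypothetical container: one object instance tracked across frames."""
    instance_id: int
    category: str
    boxes: Dict[int, Box] = field(default_factory=dict)  # frame index -> box


def to_dense(tracklets: List[Tracklet], num_frames: int):
    """Convert tracklets to per-frame box lists plus a visibility mask.

    Returns (boxes, visible), both indexed as [frame][instance]. Frames in
    which an instance is absent get a zero box and visible=False.
    """
    empty: Box = (0.0, 0.0, 0.0, 0.0)
    boxes = [[t.boxes.get(f, empty) for t in tracklets] for f in range(num_frames)]
    visible = [[f in t.boxes for t in tracklets] for f in range(num_frames)]
    return boxes, visible


# A car that is present in all 4 frames and grows in scale, and a person
# who only appears from frame 2 onward.
car = Tracklet(0, "car", {f: (0.1, 0.4, 0.3 + 0.1 * f, 0.6 + 0.05 * f) for f in range(4)})
person = Tracklet(1, "person", {2: (0.6, 0.3, 0.7, 0.8), 3: (0.62, 0.3, 0.72, 0.8)})
boxes, visible = to_dense([car, person], num_frames=4)
```

Here `visible[f][i]` records whether instance `i` exists in frame `f`, so an object entering or leaving the scene is explicit instead of being inferred from a placeholder box.
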
|
|
| ## Uses |
|
|
| ### Direct Use |
|
|
| We provide the weights for the entire UNet, so you can swap it into a diffusers pipeline. For example: |
|
|
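| ```python |
| import torch |
| from diffusers import StableVideoDiffusionPipeline, UNetSpatioTemporalConditionModel |
| |
| # Base SVD checkpoint that supplies the VAE, image encoder, and scheduler. |
| pretrained_model_path = "stabilityai/stable-video-diffusion-img2vid" |
| # Load the TrackDiffusion UNet weights from a local directory. |
| unet = UNetSpatioTemporalConditionModel.from_pretrained("/path/to/unet", torch_dtype=torch.float16) |
| # Build the pipeline with the stock UNet replaced by the TrackDiffusion one. |
| pipe = StableVideoDiffusionPipeline.from_pretrained( |
|     pretrained_model_path, |
|     unet=unet, |
|     torch_dtype=torch.float16, |
|     variant="fp16", |
|     low_cpu_mem_usage=True, |
| ) |
| ``` |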
|
|