Genmo has released Mochi 1, an open-source text-to-video diffusion model. It has 10 billion parameters and is built on a novel Asymmetric Diffusion Transformer (AsymmDiT) architecture that is also amenable to fine-tuning. The model generates high-fidelity output with strong prompt adherence.
The model is released under the Apache 2.0 license, which means it can be used for research, educational, and commercial purposes.
Introducing Mochi 1 preview. A new SOTA in open-source video generation. Apache 2.0.
— Genmo (@genmoai) October 22, 2024
magnet:?xt=urn:btih:441da1af7a16bcaa4f556964f8028d7113d21cbb&dn=weights&tr=udp://tracker.opentrackr.org:1337/announce pic.twitter.com/YzmLQ9g103
Currently, it requires at least four H100 GPUs, which is far beyond what most individuals can run, but Genmo is also inviting the community to release quantized versions so that the model becomes accessible to users with lower-end hardware.
It can also be run in ComfyUI, where it consumes about 20 GB of VRAM at the VAE (Variational Autoencoder) decoding stage.
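To see why quantized versions matter for lower-end hardware, here is a rough back-of-the-envelope sketch of how much memory the weights alone would occupy at different precisions. It only multiplies the 10 billion parameter count by the bytes per parameter; the list of formats is an assumption for illustration, and the estimate ignores activations, the text encoder, and the VAE, so real usage will be higher.

```python
# Rough weights-only memory estimate for a 10-billion-parameter model.
# Activations, the text encoder, and the VAE are NOT included.
PARAMS = 10_000_000_000  # parameter count stated in the article

# Bytes per parameter for some common storage formats (illustrative choice).
BYTES_PER_PARAM = {"fp32": 4, "bf16": 2, "fp8": 1}


def weight_gib(params: int, bytes_per_param: int) -> float:
    """Return the size of the weights in GiB (1 GiB = 2**30 bytes)."""
    return params * bytes_per_param / 2**30


estimates = {name: weight_gib(PARAMS, size) for name, size in BYTES_PER_PARAM.items()}

for name, gib in estimates.items():
    print(f"{name}: {gib:.1f} GiB")
# fp32: 37.3 GiB
# bf16: 18.6 GiB
# fp8: 9.3 GiB
```

Halving the bytes per parameter halves the weight footprint, which is the main reason a quantized release would lower the hardware bar.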
Installation
1. Install ComfyUI on your machine.
2. Navigate to the "ComfyUI/custom_nodes" folder. Open a command prompt using "cmd", then clone the repository by typing the following command:
git clone https://github.com/kijai/ComfyUI-MochiWrapper.git
All the required models are downloaded automatically from Kijai's Hugging Face repository when you run the workflow for the first time.
If you are interested in working with the raw model, you can access it directly from Genmo's Hugging Face repository.
Keep in mind that the model weights are quite large, so be patient while they download. You can track the real-time status in the terminal where ComfyUI is running in the background.
The diffusion models are saved to the "ComfyUI/models/diffusion_models/mochi" folder and the VAE to the "ComfyUI/models/vae/mochi" folder.
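If you want to confirm that the download finished, a small script can list what is in those two folders. This is only a minimal sketch: the helper name list_model_files is hypothetical, and it assumes ComfyUI lives in a folder called "ComfyUI" relative to where you run the script, so adjust the path to your setup.

```python
from pathlib import Path


def list_model_files(comfy_root: Path) -> dict[str, list[tuple[str, float]]]:
    """Map each Mochi model folder to its files and their sizes in GiB."""
    folders = {
        "diffusion model": comfy_root / "models" / "diffusion_models" / "mochi",
        "vae": comfy_root / "models" / "vae" / "mochi",
    }
    report = {}
    for label, folder in folders.items():
        # A missing folder simply means nothing has been downloaded yet.
        files = sorted(p for p in folder.glob("*") if p.is_file()) if folder.is_dir() else []
        report[label] = [(p.name, p.stat().st_size / 2**30) for p in files]
    return report


if __name__ == "__main__":
    # Assumed install location; change it to match your machine.
    for label, files in list_model_files(Path("ComfyUI")).items():
        if not files:
            print(f"{label}: nothing downloaded yet")
        for name, size_gib in files:
            print(f"{label}: {name} ({size_gib:.2f} GiB)")
```

An empty list for either folder usually just means the first workflow run has not completed its download.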
Workflow
1. You can get the workflow from your "ComfyUI-MochiWrapper/examples" folder.
2. Just drag and drop it into ComfyUI.
3. Enter a detailed positive prompt for better results.
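If you are not sure which files in the examples folder are workflows, the sketch below lists the candidates. It assumes the example workflows are stored as .json files and that the wrapper was cloned into "ComfyUI/custom_nodes"; both the helper name find_workflows and the path are illustrative, not part of the wrapper itself.

```python
from pathlib import Path


def find_workflows(examples_dir: Path) -> list[str]:
    """Return the sorted names of the .json files in examples_dir."""
    if not examples_dir.is_dir():
        return []
    return sorted(p.name for p in examples_dir.glob("*.json"))


if __name__ == "__main__":
    # Assumed location of the cloned wrapper; adjust to your install.
    examples = Path("ComfyUI") / "custom_nodes" / "ComfyUI-MochiWrapper" / "examples"
    for name in find_workflows(examples):
        print(name)
```

Any file it prints can then be dragged into the ComfyUI window as described in step 2.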