HunyuanVideo: High Quality Video Generation is here

There are many closed source models, but HunyuanVideo (ImageToVideo and TextToVideo) is an open source video generation that is proved to be quite competitive in the market. Tencent released this video generative model trained with over 13 billion parameters promises to become the largest among varieties of models. You can find more in-depth understanding from their research paper.

💀HunyuanVideo 💀

Tancent's Uncensored T2V model is here !!! 😱

Project page:https://t.co/HP9PPvshzQ

Now available in Comfy 👍

Kijai's setup is here: 👇https://t.co/noPNgZV4Ey pic.twitter.com/oivOReZSyN
— Stable Diffusion Tutorials (@SD_Tutorial) December 4, 2024

The model produces professional looking videos that is capable to avoid repetitive movements, ensuring to experience the realistic motion. The video generation accurately matches textual prompting that delivers consistent outputs without glitches.

Thinking about censorship? Well, unlike other diffusion based video generative models it is also capable of generating content smoothly without any restrictions. apart from this, you can also train HunyuanVideo LoRA model using diffusion-pipe framework.

Hunyuan Video Model performance comparision

Source-HunyuanVideo's Hugging Face repository

The researchers claimed that the HunyuanVideo outperformed the Luma1.6, Runway Gen3 and other Chinese video generation models. Lets see and test how this performs in ComfyUI.

Table of Contents:

TYPE A: Native ComfyUI Support

Now, their is an official ComfyUI support available for HunyuanVideo. But, you can only run if you have at least 24GB VRAM other wise you will stuck into out-of-memory error. Lower VRAM users can opt for the second( By Kijai) and third(GGUF)quantized variant listed below.

Installation

0. Install ComfyUI if you are a new user. Old users need to Update ComfyUI from the Manager section by selecting "Update ComfyUI" option.

1. Download Hunyuan ImageToVideo (hunyuan_video_image_to_video_720p_bf16.safetensors) V1/V2 or Hunyuan TextToVideo (hunyuan_video_t2v_720p_bf16.safetensors) from Hugging face repository.

Save it into your "ComfyUI/models/diffusion_models" folder. Create new folder "diffusion_models", if you do not have.

2. Download VAE model from Hugging face and save it inside "ComfyUI/models/vae" folder.

3. Next, download the clip models (clip_l.safetensors and llava_llama3_fp8_scaled.safetensors) from Hugging face and put them into "ComfyUI/models/text_encoders" folder.

4. Also download Llava llama 3 vision and save it into your "ComfyUI/models/clip_vision" folder. This is required for ImgToVideo model to work.

4. Restart and refresh ComfyUI.

Workflow

1. Get the workflow from our Hugging Face Repository.

2. Drag and drop into ComfyUI.

TYPE B: Quantized variant By Kijai

Installing process

1. Install ComfyUI if you haven't done yet.

2. Older user need to choose "Update ComfyUI" option from the Manager to avoid any future errors.

3. Move into "ComfyUI/custom_nodes" folder and open command prompt by typing "cmd" on folder address bar.

Clone the ComfyUI HunyuanVideoWrapper Kijai's repository using following command:

~~git clone https://github.com/kijai/ComfyUI-HunyuanVideoWrapper.git~~

Another option is to use ComfyUI manager then select "Install from git url" option and paste the Git url provided above without "git clone".

Download HunyuanVideo (ImageToVideo / TextToVideo)

4. Download any one of the respective quantized HunyuanVideo(ImageToVideo or TextToVideo) model from Hugging face repository. Choose as per your system requirements and use cases.

There are multiple options to choose from. Select the one which suits your machine requirements. Here, Fp8 variant is for 12 GB and lower VRAMs, and BF16 for higher. You can also use the FastVideo quantized variant and Lora that will improve the video rendering time up to 8x.

After downloading the models, save it into your "ComfyUI/models/diffusion_models" folder.

Also download the required VAE(hunyuan_video_vae_fp32.safetensors for lower end GPUs and hunyuan_video_vae_bf16.safetensors for higher end GPUs) and save it into "ComfyUI/models/vae" folder.

5. Next is to download all text encoders and its files shown above from respective Hugging Face repository and place them into your "ComfyUI/models/LLM/llava-llama-3-8b-text-encoder-tokenizer" folder.

6. Now, download OpenAI's transformer based Clip Vit Large Model and files(illustrated above) from their Hugging face repository and put them into your "ComfyUI/models/clip/clip-vit-large-patch14" directory.

Optional(for windows users): Another point we want to mention is that you can install Triton and Sage-Attention which will significantly drop the video rendering time to almost 25% as reported by the community.

Make sure you have the CUDA12.x, Visual studio, Vcredist installed on your system. Their are many confusions in setup the Triton into windows. You can get the detailed understanding from Triton-windows github repository.

Install Windows Trition .whl file for your python version. To check python version run "python --version" (without quotes) in command prompt. We have python 3.10 version installed. For other python version checkout Windows Trition release section.

Here, latest triton 3.1.0 is for pytorch 2.4 and higher, but older pytorch will have to use the triton 3.0.1 version.

For normal ComfyUI user this is the syntax. Replace the <<your-trition-python-version>> with your relevant .whl file.

Syntax:

~~pip install <<your-trition-python-version>>~~

~~pip install triton-3.1.0-cp310-cp310-win_amd64.whl~~

Then install sage-attention:

~~pip install sageattention~~

For Comfy Portable users, move inside "ComfyUI_windows_portable" folder and open command prompt and use the following syntax. Replace the <<your-trition-python-version>> with your relevant .whl file.

Syntax:

~~.\python_embeded\python.exe -m pip install <<your-trition-python-version>>~~

~~.\python_embeded\python.exe -m pip install triton-3.1.0-cp310-cp310-win_amd64.whl~~

Then install sage-attention:

~~.\python_embeded\python.exe -m pip install sageattention~~

7. Restart and refresh your ComfyUI to take effect.

Workflow

1. Get the workflow from your "ComfyUI/custom_nodes/ComfyUI-HunyuanVideoWrapper/examples" folder.

(a) hyvideo_t2v_example_01.json (Text to Video workflow)

(b) hyvideo_v2v_example_01.json (Video to Video workflow)

(d) hyvideo_i2v_example_01.json (Image To Video Generation)

2. Just directly drag and drop to ComfyUI. These are the settings you need to do as shown below.

(a) Load Hunyuan Video model

(b) Set the video settings for video generation.

(d) Load the VAE with relevant precision type.

(e) Add positive prompt into Clip prompt box and hit "Run" button to start generating.

TYPE C: GGUF variant by city96

Installation

1. Update ComfyUI from the Manager by selecting "Update ComfyUI" button.

2. Install GGUF custom nodes from the Manager by selecting "Custom nodes manager". Then search for "ComfyUI-GGUF" by City 96 (author) and hit install. If you already used the Flux GGUF or Stable Diffusion 3.5 GGUF variant then you only need to update this custom node.

3. Download the HunyuanVideo(TxtToVideo) or HunyuanVideo(ImgToVideo) model from City 96's Hugging Face repository and put it into "ComfyUI/models/unet" folder.

Here, you will have various model types from Q3bit(very light weight with lower quality gen) to Q8bit (heavy weight with precision output). Choose as per your system VRAM and requirements.

4. Now, get the same Kijai's VAE model from his repository and save it to your "ComfyUI/models/vae" folder.

Workflow

1. Download the same workflow of Comfyui's repository from Type A's workflow section.

2. All will be same here. Just replace the "Load Diffusion Model" node with "UNet Loader (GGUF)" node. Connect it to "Model Sampling SD3" node and "Basic scheduler" node.

3. Now you are ready to go with it. Add positive prompt and hit "Run" button to start generation.

Conclusion

We are using NVIDIA based RTX 3090 24GB VRAM and each video rendering time was around 5-6 minutes. The video frame quality is better in comparison to other video generation models but the time consumption is so lengthy.

Higher resolution takes considerable longer time in video rendering. You can set the video resolution to almost 320x320 and use the upscaling technique inside the workflow.

If we compare with any other diffusion based video generation model, HunyuanVideo is really impressive with subsequent level of consistency.

HunyuanVideo: High Quality Video Generation is here

TYPE A: Native ComfyUI Support

Installation

Workflow

TYPE B: Quantized variant By Kijai

Installing process

Workflow

TYPE C: GGUF variant by city96

Installation

Workflow

Conclusion

Posted by Admin

Search This Blog

Trending

Wan 2.1: Install & Generate Videos locally with lower VRAM

Train your WAN2.1 Lora model on Windows/Linux

Wan2.1 FusionX 14B: Consistent Fast VideoGen with Low VRAM

Easy Install ComfyUI Portable (Windows/Mac/Linux)

Installing Stable Diffusion 3.5 Locally

Run Stable Diffusion 10x faster on AMD GPUs

Our Social Pages

Recent Posts

Important pages

Contact form

HunyuanVideo: High Quality Video Generation is here

TYPE A: Native ComfyUI Support

Installation

Workflow

TYPE B: Quantized variant By Kijai

Installing process

Workflow

TYPE C: GGUF variant by city96

Installation

Workflow

Conclusion

Posted by Admin

Related Posts

Search This Blog

Trending

Wan 2.1: Install & Generate Videos locally with lower VRAM

Train your WAN2.1 Lora model on Windows/Linux

Wan2.1 FusionX 14B: Consistent Fast VideoGen with Low VRAM

Easy Install ComfyUI Portable (Windows/Mac/Linux)

Installing Stable Diffusion 3.5 Locally

Run Stable Diffusion 10x faster on AMD GPUs

Our Social Community

Our Social Pages

Recent Posts

Important pages

Contact form