Train HunyuanVideo LoRA on local PC


If you think you have to pay third-party services a lot of money to generate AI videos in your own style, think again: open-source LoRA training for HunyuanVideo is here. For HunyuanVideo LoRA training we are using the diffusion-pipe framework. With this same method you can also train LoRAs for Flux and LTX-Video.

Officially, this method only supports Linux, but Windows users can run it through WSL after installing a few extra dependencies.

It is officially recommended to have a dedicated GPU with at least 24GB VRAM and 64GB of system RAM or more. It also helps to have a basic understanding of the HunyuanVideo model if you are new to it. We are using an RTX 4090 with 24GB VRAM. If you do not have a high-end GPU, you can rent one online for roughly $2/hr.

You will need to tweak the epoch configuration to match your dataset and hardware. For our LoRA training we used roughly 1500-2500 steps.

Setup Process

1. If you are a Windows user, first set up WSL (Windows Subsystem for Linux) version 2. This gives you a Linux environment for DeepSpeed, which does not work well on Windows. Follow this process:

Right-click the Windows icon, select "Search", and type "windows features". Click "Turn Windows features on or off". Now, enable the "Windows Subsystem for Linux" and "Virtual Machine Platform" options.

Now restart your PC.

Alternative: 
Open Windows Terminal or Command Prompt as Administrator and run:

wsl --install

If this does not work, you are probably on an older Windows build; only Windows 10 and 11 currently support this.

After the installation is complete, restart your PC for the changes to take effect.
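Before moving on, you can confirm that your distribution is running under WSL 2. Open Windows Terminal again; "Ubuntu" below is the default distribution name installed by wsl --install, so substitute yours if it differs.

List installed distributions and their WSL version

wsl -l -v

If your distribution reports version 1, convert it to WSL 2

wsl --set-version Ubuntu 2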

2. When installing Ubuntu, you will be prompted to create a username and password.
Save these credentials somewhere safe; you will need them again later in the setup process.

3. Update the Ubuntu System.  Once logged in, start by updating your package lists and performing a full system upgrade.

sudo apt update

sudo apt full-upgrade -y


4. Install essential tools and libraries required for this setup.

sudo apt-get install -y git-lfs wget python3-dev build-essential

Check if your NVIDIA drivers are installed and working correctly:

nvidia-smi

You should see details about your GPU(s) and driver version. If this command fails, just download and install the NVIDIA CUDA driver for WSL: NVIDIA CUDA for WSL

5. Now, you have to install Miniconda. Miniconda simplifies Python environment management. Install it using the following commands:

Make new directory

mkdir -p ~/miniconda3

Download Miniconda

wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O ~/miniconda3/miniconda.sh

bash ~/miniconda3/miniconda.sh -b -u -p ~/miniconda3

Delete the Miniconda installer after installation to keep things clean.

rm ~/miniconda3/miniconda.sh

Activating environment

source ~/miniconda3/bin/activate

conda init --all



download and install miniconda

If this produces an error, go to the official Miniconda page, click the "Linux" tab, and use the installation commands listed there (the same steps as described in Step 5).



6. Create a Conda virtual environment for the project and press Y when prompted to confirm the installation:

conda create -n diffusion-pipe python=3.12

Activate the environment. You will also need to run this command again whenever you come back later to start training:

conda activate diffusion-pipe


7. Install PyTorch and CUDA-NVCC. Ensure compatibility between PyTorch and CUDA versions. For the latest compatibility, refer to the Anaconda CUDA-NVCC page.

pip install torch==2.4.1 torchvision==0.19.1 --index-url https://download.pytorch.org/whl/cu121


conda install nvidia::cuda-nvcc
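To quickly confirm that PyTorch was installed with working CUDA support, you can run this one-liner inside the activated environment (it only checks the install; the version it prints depends on what you installed):

python -c "import torch; print(torch.__version__, torch.cuda.is_available())"

It should print the PyTorch version followed by True. If it prints False, re-check the NVIDIA driver step from earlier.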


8. Next, you have to clone the diffusion-pipe repository:
git clone --recurse-submodules https://github.com/tdrussell/diffusion-pipe

Move inside diffusion-pipe folder.
cd diffusion-pipe


9. Finally, install the remaining dependencies:

pip install -r requirements.txt


Tip: If you encounter issues while installing DeepSpeed or other packages, install the NVIDIA CUDA toolkit with the following command:

sudo apt-get install -y nvidia-cuda-toolkit
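Afterwards, you can verify that the CUDA compiler DeepSpeed relies on is available:

nvcc --version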


Downloading the models

Working with WSL (Windows Subsystem for Linux) makes it easy to access your files from Windows. Here's how to get started.

You can view and manage your WSL files directly in Windows File Explorer. Simply navigate to:

\\wsl$\Ubuntu\home\your-user-name\diffusion-pipe\

Here, just replace "your-user-name" with the username you created during the WSL setup.

This directory lets you seamlessly transfer images to your training folder and also copy finished LoRA models to ComfyUI or other tools. 
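A handy shortcut: from inside the Ubuntu terminal, the command below opens your current WSL folder directly in Windows File Explorer (note the trailing dot).

explorer.exe .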

If you already have the HunyuanVideo model files, simply copy them from Windows into the appropriate folders (at the folder path shown above) within your WSL environment. For more background, we have already explained different HunyuanVideo workflows.

This ensures you can quickly set up models for training and testing, while maintaining easy access from both Windows and Linux environments.


Alternative (Using Command line environment):

If you haven't downloaded the models yet, use these commands to download them directly inside your WSL environment.

Move inside diffusion-pipe folder
cd ~/diffusion-pipe

Create the folders "hunyuan", "clip" and "llm" inside a models folder.
mkdir -p models/{hunyuan,clip,llm}


Download the HunyuanVideo model file from Kijai's repository; it will be saved into the "hunyuan" folder:

wget https://huggingface.co/Kijai/HunyuanVideo_comfy/resolve/main/hunyuan_video_720_cfgdistill_fp8_e4m3fn.safetensors -P ~/diffusion-pipe/models/hunyuan/

Download the VAE model file from Kijai's repository; it will also be saved into the "hunyuan" folder:
wget https://huggingface.co/Kijai/HunyuanVideo_comfy/resolve/main/hunyuan_video_vae_bf16.safetensors -P ~/diffusion-pipe/models/hunyuan/

Download the CLIP model from OpenAI's Hugging Face repository into the "clip" folder (the destination matches the clip_path used later in the config):

git clone https://huggingface.co/openai/clip-vit-large-patch14 models/clip/clip-vit-large-patch14

Now, download the text encoder model (LLaVA LLM) into the "llm" folder (again, the destination matches the llm_path used later in the config):

git clone https://huggingface.co/Kijai/llava-llama-3-8b-text-encoder-tokenizer models/llm/llava-llama-3-8b-text-encoder-tokenizer
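Once the downloads finish, a quick listing from inside the diffusion-pipe folder confirms that everything landed where the config paths used later expect it:

ls models/hunyuan
ls models/clip/clip-vit-large-patch14
ls models/llm/llava-llama-3-8b-text-encoder-tokenizer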


Prepare and Configure your dataset


1. Move inside the "diffusion-pipe" folder. Create a new folder, for example "data", and inside it create an "input" folder. Store your whole dataset inside the "input" folder, just as you would for any other LoRA training.
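For example, this creates both folders in one go, using the folder names from this guide:

mkdir -p ~/diffusion-pipe/data/input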


2. You can prepare your dataset from images only, or from a mix of images and videos, ideally covering diverse camera angles, each with a caption in txt format. Your images/videos should be high quality: at least 1024px resolution for images and 720p for videos.

Folder structure: all image and video files and their matching caption files should sit side by side inside the "data/input" folder, in the following format:

├── image1.png
├── image1.txt  
├── image2.png
├── image2.txt


prepare dataset and captioning

Captioning is optional, but recommended if you want more control over the end result.

Caption files should be .txt files with the same name as the corresponding image/video file. For example, "image1.png" should have its text description in "image1.txt" in the same directory. If a file does not have a matching caption file, a warning will be printed, but training will still run without a caption for that file.
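If you want a quick sanity check before training, the small sketch below lists any dataset file that has no matching caption. It assumes your dataset is in ~/diffusion-pipe/data/input and only looks at .png, .jpg and .mp4 files; adjust the path and extensions to match your data.

cd ~/diffusion-pipe/data/input
for f in *.png *.jpg *.mp4; do
  [ -e "$f" ] || continue                         # skip patterns that matched nothing
  [ -e "${f%.*}.txt" ] || echo "missing caption: $f"
done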

Note: We are showing this for educational and research purposes only. Do not use a real person's data for LoRA training without their consent.

Now, edit the two config files found inside your "diffusion-pipe/examples" folder: "hunyuan_video.toml" and "dataset.toml". You can use any file editor to adjust the configuration; here we are using VS Code. All the relevant settings are documented inside the config files themselves.

set dataset path location

(a) dataset.toml - This file holds your dataset configuration. You will find a default dataset path inside it; replace it with your own dataset path, as shown above.

The default was "/home/anon/data/images/grayscale". We replaced it with our path "/home/SDT/diffusion-pipe/data/input". Watch the forward/backward slashes when copying and pasting the path (Windows-style backslashes will not work here).

Set the resolution. We left it at the default of 512.
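For orientation, the edited part of dataset.toml ends up looking roughly like the excerpt below. The key names mirror the example file shipped with diffusion-pipe at the time of writing, so defer to your own copy if they differ and keep the other defaults it contains; only the path (and, if you want, the resolution) needs changing.

# dataset.toml - illustrative excerpt only
resolutions = [512]

[[directory]]
path = '/home/SDT/diffusion-pipe/data/input'
num_repeats = 10   # repeats per epoch; keep or tune the value from your example file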

(b) hunyuan_video.toml - This file holds the configuration for the LoRA you are going to train.


Set lora model output path location

Set the LoRA output directory path as shown above. You can also leave it at the default and the folder will be created automatically.

Set epochs = 100
This is how many passes over your dataset the training makes. The higher the number, the longer training takes; you can get a more refined result, but the saved checkpoints also take more disk space.

set downloaded models path location

Now, replace the HunyuanVideo model, VAE, CLIP and text encoder path locations with wherever you saved them, as shown above.

We downloaded the models using the command line, so in our case these are the path locations:

transformer_path =  'models/hunyuan/hunyuan_video_720_cfgdistill_fp8_e4m3fn.safetensors'
vae_path =  'models/hunyuan/hunyuan_video_vae_bf16.safetensors'
llm_path =  'models/llm/llava-llama-3-8b-text-encoder-tokenizer'
clip_path =  'models/clip/clip-vit-large-patch14'

rank = 32
The LoRA rank. A higher rank can capture more detail, but it also increases file size and VRAM usage.

lr = 2e-5
Learning rate (we kept the default). Too high a value tends to over-fit and burn in artifacts, while too low a value means the LoRA may need far more steps to pick up your subject's patterns.
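Putting these pieces together, the edited parts of hunyuan_video.toml look roughly like the excerpt below. Treat it as a sketch of the fields discussed above, not a complete file: the section and key names follow the example file shipped with diffusion-pipe at the time of writing, the output_dir value is just an example, and your copy contains additional settings (batch size, dtypes, optimizer type and so on) that you should leave as they are.

# hunyuan_video.toml - illustrative excerpt of the settings discussed above
output_dir = '/home/SDT/diffusion-pipe/output'   # example location; any writable folder works
dataset = 'examples/dataset.toml'                # points at the dataset config edited above
epochs = 100

[model]
type = 'hunyuan-video'
transformer_path = 'models/hunyuan/hunyuan_video_720_cfgdistill_fp8_e4m3fn.safetensors'
vae_path = 'models/hunyuan/hunyuan_video_vae_bf16.safetensors'
llm_path = 'models/llm/llava-llama-3-8b-text-encoder-tokenizer'
clip_path = 'models/clip/clip-vit-large-patch14'

[adapter]
type = 'lora'
rank = 32

[optimizer]
lr = 2e-5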


Training Process

Now it is time to start the training process with the following command:

NCCL_P2P_DISABLE="1" NCCL_IB_DISABLE="1" deepspeed --num_gpus=1 train.py --deepspeed --config examples/hunyuan_video.toml

The initial stage of training takes extra time because caching runs in the background; after that, it will not.

You can stop training partway through with "Ctrl+C". Your LoRA model is saved into the folder set as output_dir in the "hunyuan_video.toml" file.

Test your LoRA

You can test your trained HunyuanVideo LoRA in ComfyUI. To do this, grab the LoRA file from the output folder: copy "adapter_model.safetensors", rename it to something descriptive, and save it into your ComfyUI "models/loras" folder.
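From inside WSL you can do that copy in a single command. The run folder and the Windows ComfyUI location below are placeholders (assuming output_dir was set to ~/diffusion-pipe/output as in the earlier sketch and ComfyUI lives at C:\ComfyUI); substitute the paths from your own system:

cp ~/diffusion-pipe/output/<your-run-folder>/adapter_model.safetensors \
   /mnt/c/ComfyUI/models/loras/my_hunyuan_lora.safetensors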

Make sure you have the HunyuanVideoWrapper nodes installed, as described in Kijai's HunyuanVideo workflow.

Use the "HunyuanVideo Lora Select" node to load it

Finally, you have your own LoRA model. Keep in mind that this is a time-consuming process and you rarely get the best result on the first try. Experimenting with different datasets, epoch counts, and learning rates will get you closer to the right settings for your specific dataset.