Generate Lip Sync Video with Wav2Lip

Create lip-synced videos in Automatic1111/ComfyUI

Lip sync is one of the most tedious tasks in video generation. Many AI models can generate video, but accurate lip sync has remained a challenge for all of them. Now, however, we have a technique called Wav2Lip built for exactly this: give it an audio track and a target video, and you get your lip-synced video back in a few minutes.



The interesting part is that it is available both as an Automatic1111 extension and as a ComfyUI node. Let's see how to install Wav2Lip and create a lip-synced video.


Installation in Automatic1111/ForgeUI:

1. First, make sure you have an up-to-date installation of Automatic1111 or ForgeUI.

choose your operating system

download ffmpeg

2. Download ffmpeg from its official website, choosing the build that matches your operating system. In our case, that's Windows. For extracting the archive, you can use 7-Zip or WinRAR; the ffmpeg executable is inside the "bin" folder.


search environment variables


set environment variables

Now, search for "environment variables" from your taskbar and add ffmpeg to your PATH by copying the location of the extracted "bin" folder into the environment variables.
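To confirm the PATH change took effect, open a new command prompt and check that Windows can find the binary (a quick sanity check; the install folder, e.g. C:\ffmpeg\bin, is whatever you chose when extracting):

where ffmpeg
ffmpeg -version

If both commands print a path and a version banner, ffmpeg is set up correctly.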


enable python and c++

3. Download Visual Studio, and during installation enable the "Python" and "C++" development workload options.

4. Because the extension is not listed in Automatic1111's extension index, we have to install it from the URL section. Copy the URL of the Wav2Lip repository provided below.

https://github.com/numz/sd-wav2lip-uhq

Open Automatic1111/ForgeUI. Move to the "Extensions" tab, select the "Install from URL" option, and paste the copied URL.

installing wav2lip extension using git

Then click the "Install" button to start the process. It will take some time to download the prerequisites; you can watch the background process in the command-line terminal.
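If you prefer the command line, the same extension can be installed by cloning the repository straight into the extensions folder; a minimal sketch, assuming your WebUI lives in C:\stable-diffusion-webui:

cd C:\stable-diffusion-webui\extensions
git clone https://github.com/numz/sd-wav2lip-uhq

Either route ends with the same files on disk; the extension is picked up when the WebUI restarts (step 5).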

5. After installation, navigate to the "Installed" tab and restart Automatic1111 by hitting the "Apply and restart UI" button for the changes to take effect.

6. You will now find a new "Wav2Lip UHQ" tab inside your Stable Diffusion WebUI.

7. Next, download the models listed below and save each one to the given folder, renaming the downloaded file where necessary (in particular, the face-detection model must be renamed to "s3fd.pth"):


Model | Download | Installation folder path
Wav2Lip (highly accurate) | Link | extensions\sd-wav2lip-uhq\scripts\wav2lip\checkpoints\
Wav2Lip + GAN (slightly inferior, but better visual quality) | Link | extensions\sd-wav2lip-uhq\scripts\wav2lip\checkpoints\
s3fd (face-detection pre-trained model) | Link | extensions\sd-wav2lip-uhq\scripts\wav2lip\face_detection\detection\sfd\s3fd.pth
Landmark predictor (Dlib 68-point face landmark prediction) | Link, Alternative Link1, Alternative Link2 | extensions\sd-wav2lip-uhq\scripts\wav2lip\predicator\shape_predictor_68_face_landmarks.dat
Face swap model (model used by face swap) | Link | extensions\sd-wav2lip-uhq\scripts\faceswap\model\inswapper_128.onnx
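Once everything is downloaded, a quick way to confirm each file landed in the right place is to list the target folders from the WebUI root (a sanity check only; the paths match the table above):

dir extensions\sd-wav2lip-uhq\scripts\wav2lip\checkpoints
dir extensions\sd-wav2lip-uhq\scripts\wav2lip\face_detection\detection\sfd
dir extensions\sd-wav2lip-uhq\scripts\wav2lip\predicator
dir extensions\sd-wav2lip-uhq\scripts\faceswap\model

The s3fd listing must show a file named exactly "s3fd.pth", since the extension looks it up by that name.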

8. Enjoy face swapping with lip-synced AI video generation.


Installation in ComfyUI:

1. First, you should have ComfyUI installed on your machine.

choose your operating system

download ffmpeg

2. Download ffmpeg, which handles reading and writing video, and set up its environment variables. For extracting the zipped file, you can use 7-Zip or WinRAR; the ffmpeg executable is inside the "bin" folder.


set environment variables

3. To set it, search for "environment variables" from your taskbar and add the extracted ffmpeg "bin" folder's path to the environment variables, just as in the Automatic1111 section above.


move to custom nodes folder

4. Move to ComfyUI's "custom_nodes" folder, open a command prompt by typing "cmd" in the address bar, and clone the repository with the command below:

git clone https://github.com/ShmuelRonen/ComfyUI_wav2lip

5. Now, change into the cloned folder and install the required dependencies; this will download the necessary libraries:

cd ComfyUI_wav2lip
pip install -r requirements.txt
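Note that if you run the portable (standalone) build of ComfyUI, the dependencies should go into its embedded Python rather than your system Python; a sketch, assuming the standard portable folder layout with "python_embeded" three levels up from the cloned node:

..\..\..\python_embeded\python.exe -m pip install -r requirements.txt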



download wav2lip model from hugging face

6. Download the Wav2Lip_gan model file from the Hugging Face repository.

7. Save the downloaded model into the "\custom_nodes\ComfyUI_wav2lip\Wav2Lip\checkpoint" folder.
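For example, if your browser saved the file to the Downloads folder, you can move it into place from the ComfyUI root with a single command (paths are illustrative; adjust them to your setup):

move "%USERPROFILE%\Downloads\wav2lip_gan.pth" "custom_nodes\ComfyUI_wav2lip\Wav2Lip\checkpoint\"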

8. Then, restart your ComfyUI for the changes to take effect.

download lipsync workflow

9. The basic workflow is available inside the "example" folder of the cloned repository, or you can download it from the official repository.

Simply drag and drop the workflow file into ComfyUI.


lip sync workflow installed

10. When the workflow opens, some nodes may show up in red, meaning they are missing. Install them from the ComfyUI Manager by selecting the "Install missing custom nodes" option.


install missing custom nodes

11. Then click the "Install" button to install the listed nodes. Once installed, restart your ComfyUI.

install codeformers

12. Finally, install the "CodeFormer" library for extra support by selecting "Install models" from the ComfyUI Manager.

Now, simply load your audio and target video, and within a few minutes, you will get your lip-synced video.
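Wav2Lip works best with a clean speech track. If your audio is embedded in another video or is in a different format, ffmpeg (installed earlier) can extract and convert it; the file names here are just examples:

ffmpeg -i speech_source.mp4 -vn -ac 1 -ar 16000 speech.wav

This strips the video stream (-vn) and writes a mono (-ac 1), 16 kHz (-ar 16000) WAV file, a format the Wav2Lip pipeline handles reliably.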