Cool AI Videos using AnimateDiff and SDXL

AnimateDiff with SDXL workflow

This is just the workflow with Stable Diffusion XL (SDXL) models, but you can choose any fine-tuned SDXL checkpoint as per your requirements to make the workflow better. To run this workflow, you should use an NVIDIA GPU with a minimum of 12GB VRAM (more is better).

Installation Process:

1. This workflow depends only on ComfyUI, so you need to install this WebUI on your machine.

2. Update your ComfyUI using ComfyUI Manager by selecting "Update All". Next, you need to have AnimateDiff installed. Using ComfyUI Manager, search for the "AnimateDiff Evolved" node, make sure the author is Kosinkadink, and click the "Install" button.


download the ipadapter with animatediff workflow

Download the "IP adapter batch unfold for SDXL" workflow from CivitAI article by Inner Reflections. Just directly drag and drop into ComfyUI. The is the basic ComfyUI workflow.


install missing nodes

3. Now, if you are opening this workflow for the first time, you will get a bunch of missing-node errors.


Comfyui manager installing missing nodes

4. Simply install all of them one by one using ComfyUI Manager. To do this, open ComfyUI Manager from the "Manager" tab and select "Install Missing Custom Nodes".


installing all nodes from list

Then, just install all the custom nodes one by one from the list by clicking the "Install" button.

5. Now, just restart ComfyUI by hitting the "Restart" button.


download model checkpoint

6. Next, download the model checkpoints necessary for this workflow. There are tons of them available on CivitAI. For illustration, we are downloading ProtoVision XL.

You can choose whatever model you want, but make sure it has been trained on Stable Diffusion XL (SDXL).

As usual, save it inside the "ComfyUI\models\checkpoints" folder. Keep in mind that your output will depend on the model you use as the checkpoint.


download sdxl Vae

7. Here, we are using an SDXL fine-tuned model, so we will also need the SDXL variational autoencoder shown above, "sdxl_vae.safetensors". Download it from the Hugging Face repository. After downloading, place it inside the "ComfyUI\models\vae" folder.


install IP adapter V2

8. Next, you need to download the IP Adapter Plus model (Version 2). Here, we need the "ip-adapter-plus_sdxl_vit-h.safetensors" model for SDXL checkpoints, listed under the model name column as shown above.

After downloading, just put it into the "ComfyUI\models\ipadapter" folder. Users who have already upgraded their IP Adapter to V2 (Plus) can skip this step.


install sdxl image encoder

9. Download the image encoder for SDXL from the Hugging Face repository. Rename the downloaded file to "image_encoder.safetensors" and save it inside the "ComfyUI\models\clip_vision" folder.
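The rename-and-move in this step can be sketched with Python's standard library. This is a minimal sketch with placeholder paths: "downloaded" is whatever name the file was saved under, and the folder layout follows this guide.

```python
from pathlib import Path

def install_image_encoder(downloaded: str, comfyui_root: str) -> Path:
    """Move the downloaded encoder into models/clip_vision under the expected name."""
    target = Path(comfyui_root) / "models" / "clip_vision" / "image_encoder.safetensors"
    target.parent.mkdir(parents=True, exist_ok=True)  # create the folders if missing
    return Path(downloaded).rename(target)            # rename + move in one step
```

Doing the rename in code (or a file manager) is equivalent; the only requirement is that the file ends up in "clip_vision" under the exact name the workflow expects.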


download controlnet models

10. Download the Text2Image ControlNet model from TencentARC's Hugging Face repository. You will find two versions of the same model: "diffusion_pytorch_model.fp16.safetensors" (faster rendering with lower quality) and "diffusion_pytorch_model.safetensors" (higher quality with slower rendering speed).

So, just download both of them and put them inside "ComfyUI\models\controlnet" folder.


install hsxl models

11. Last, you also need to download both "hsxl_temporal_layers.f16.safetensors" and "hsxl_temporal_layers.safetensors" from Hotshotco's Hugging Face repository. Save them inside the "ComfyUI\custom_nodes\ComfyUI-AnimateDiff-Evolved\models" folder.

12. Now, just restart ComfyUI for the changes to take effect.
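Before restarting, it can help to verify that each download from the steps above landed in the right folder. The sketch below assumes the folder layout used in this guide; your checkpoint file name will differ, so treat the list as a starting point, not an exhaustive manifest.

```python
from pathlib import Path

# File -> folder mapping, relative to the ComfyUI install directory,
# mirroring the destinations named in the installation steps above.
DESTINATIONS = {
    "sdxl_vae.safetensors": "models/vae",
    "ip-adapter-plus_sdxl_vit-h.safetensors": "models/ipadapter",
    "image_encoder.safetensors": "models/clip_vision",
    "diffusion_pytorch_model.fp16.safetensors": "models/controlnet",
}

def missing_files(comfyui_root: str) -> list[str]:
    """Return the names of required files not found in their expected folders."""
    root = Path(comfyui_root)
    return [name for name, folder in DESTINATIONS.items()
            if not (root / folder / name).is_file()]
```

An empty return value means every listed file is in place; anything it returns points to a download that needs to be redone or moved.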


Workflow Explanation:

The workflow is simple, and we have broken down everything you need to know in the best possible way. Let's dive right into it.

Load video path

1. First, load your reference video clip, which should be around 10-15 seconds long. Always use shorter clips, because rendering is a time-consuming process and you rarely get the desired result on the first try. Add the path of your reference video in the "Load Video" node, removing the quotation marks.
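To see why clip length matters so much, note that the number of frames to render grows linearly with duration. The durations and frame rate below are illustrative assumptions, not values from the workflow:

```python
def total_frames(duration_s: float, fps: int) -> int:
    """Number of frames the workflow must process for a clip."""
    return round(duration_s * fps)

print(total_frames(12, 24))  # a 12-second clip at 24 fps -> 288 frames
print(total_frames(60, 24))  # a 60-second clip -> 1440 frames, 5x the work
```

Every one of those frames passes through the sampler, so a clip a few times longer multiplies your render time (and your iteration time when tweaking settings) by the same factor.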


set video dimensions

2. Set the same dimensions as your reference video. You can also scale these values down to speed up rendering.
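When scaling down, it helps to keep the aspect ratio and to snap each dimension to a multiple of 8, since SDXL's VAE works on 8-pixel latent blocks. A small sketch, where the 1920x1080 example and the 0.5 factor are assumptions for illustration:

```python
def scale_for_sdxl(width: int, height: int, factor: float = 0.5) -> tuple[int, int]:
    """Scale dimensions by `factor`, snapping each to the nearest multiple of 8."""
    snap = lambda v: max(8, round(v * factor / 8) * 8)
    return snap(width), snap(height)

print(scale_for_sdxl(1920, 1080))  # -> (960, 544)
```

Feeding the snapped values into the resize node avoids dimension errors downstream while roughly quartering the pixel count per frame at a 0.5 factor.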


load checkpoint

3. Load your model in the "Load Checkpoint" node. If you don't see your downloaded checkpoint in the list, simply select "Refresh" from the ComfyUI Manager. Then load the SDXL VAE (Variational Auto Encoder) model.


Load IP adapter model

4. Next, load the IP Adapter Plus model and the image encoder model.


set IP adapter settings

5. Now, configure the IP Adapter settings in the "Apply IP Adapter" node. You can play around with the weight from 0.20 to 0.80; for noise, values from 0.2 to 0.4 will give you a noticeable effect in your output. Just make sure not to stray too far from these ranges.
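The suggested ranges can be expressed as a small clamp helper; the parameter names below mirror the text above, though the exact field names in the ComfyUI node may differ slightly.

```python
# Suggested ranges for the "Apply IP Adapter" node, taken from the step above.
RANGES = {"weight": (0.20, 0.80), "noise": (0.20, 0.40)}

def clamp_setting(name: str, value: float) -> float:
    """Pull a value back into the suggested range if it strays outside."""
    lo, hi = RANGES[name]
    return min(max(value, lo), hi)

print(clamp_setting("weight", 1.0))  # -> 0.8
print(clamp_setting("noise", 0.1))   # -> 0.2
```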


Load control net models

6. Load the ControlNet models. Here, we have two options:
(a) diffusion_pytorch_model.fp16.safetensors - faster video generation, but lower quality.
(b) diffusion_pytorch_model.safetensors - needs more GPU memory and renders more slowly, but gives higher-quality output.
Users with less than 12GB of VRAM should choose the "fp16" version; others can select the latter.
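That guideline boils down to a one-line decision; the 12 GB threshold here comes from this guide's recommendation, not from any official requirement.

```python
def pick_controlnet(vram_gb: float) -> str:
    """Choose the fp16 ControlNet file for GPUs under 12 GB of VRAM."""
    if vram_gb < 12:
        return "diffusion_pytorch_model.fp16.safetensors"
    return "diffusion_pytorch_model.safetensors"

print(pick_controlnet(8))   # -> the fp16 file
print(pick_controlnet(24))  # -> the full-precision file
```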


set controlnet settings

Then, set the ControlNet settings. A strength value of 1.0 is the usual choice for the ControlNet influence.


Add AnimateDiff model

7. Next, in the "AnimateDiff Loader" node, load the HotShotXL model from the dropdown list.


KSampler settings

8. Now comes the "KSampler" node, where you need to play with consistency, quality, and prompt influence to get better output. Settings with suggested ranges you can work with are listed below:
-Steps: 25-30
-Control After Generate: randomize
-CFG: 3-8
-Sampler: Euler or DPMPP_3m_GPU
-Scheduler: Karras
-Start_Step: 5-13

Leave the rest as it is.
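The numeric ranges above can double as a quick sanity check before queuing a long render. The keys below mirror the node's fields; only the numeric settings from the list are checked.

```python
# Suggested KSampler ranges from the list above.
SUGGESTED = {"steps": (25, 30), "cfg": (3, 8), "start_step": (5, 13)}

def out_of_range(settings: dict) -> list[str]:
    """Return the names of settings outside the suggested ranges."""
    return [key for key, (lo, hi) in SUGGESTED.items()
            if key in settings and not lo <= settings[key] <= hi]

print(out_of_range({"steps": 28, "cfg": 12, "start_step": 7}))  # -> ['cfg']
```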


set positive prompts

set negative prompts

9. Now, put your positive and negative prompts into the green and red "CLIPTextEncodeSDXL" nodes. Each node has two text boxes; simply putting the same prompt in both is the best way to work with them.
You can learn more about how to write prompts for Stable Diffusion models. Apart from that, if you want to explore more prompt ideas, you can also try our Stable Diffusion Prompt Generator.


configure video settings

10. Next, we have the "Video Combine" node, where you need to set the frame rate, which should match the original video.


Upscaling group

11. At last, there is an Upscaling group, which comprises various nodes: Upscale Image, VAE Encode, KSampler Advanced, VAE Decode, Save Image, and Video Combine.
Here, leave all the settings as they are.


configure video output settings

Into the "Video Combine" node just change the frame rate as same as your original video. You can rename your generated video, formats(Ex- mp4 etc.).

12. Finally, select the "Queue" button to start your video rendering. After generation, you can access the output in the "ComfyUI\output" folder.


output realtime preview

The new version of ComfyUI adds metadata to every image or video you generate. So, if you want to load the same workflow later, you can simply drag and drop the file onto your ComfyUI canvas; this also helps when working in a collaborative team environment.
  
Keep in mind that you will need to play around with the settings, over multiple renders, to get satisfying results.