Installing Stable Video Diffusion (Automatic1111/ComfyUI/Colab)


While many AI companies remain focused on image generation, StabilityAI has moved into AI video generation. Its image-to-video model, Stable Video Diffusion (SVD), was released on November 21, 2023, and it can bring your realistic still images to life.


Stable video diffusion output by stabilityAI
Image Source: StabilityAI

Third-party platforms such as Pixeverse, LeonardoAI, and Morph Studio (a StabilityAI partner) have integrated the model into their online services. These services come with limitations, however: short clips, limited control, subscription charges, and results that can fall short of expectations even on a paid plan.

So, to get the full power of this image-to-video model, we are going to run it locally (Automatic1111/ComfyUI) as well as in the cloud (Google Colab).

Make sure you have a GPU with at least 6-8 GB of VRAM to run this model. In-depth details about the model are available in the research paper. For commercial use, you have to join StabilityAI's membership program.
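To check a machine against the memory guideline above, a small script can ask the NVIDIA driver how much memory each GPU reports. This is only a minimal sketch: it assumes an NVIDIA GPU with the `nvidia-smi` tool on PATH, and the helper names, the 6 GB threshold (the lower bound quoted above), and the small tolerance are illustrative choices, not part of any official requirement check.

```python
import shutil
import subprocess

MIN_VRAM_GB = 6        # lower bound quoted in this guide; adjust as needed
TOLERANCE_MIB = 256    # assumption: cards sold as "6 GB" may report slightly less


def parse_vram_mib(nvidia_smi_output):
    """Parse one memory value (in MiB) per line; skip anything non-numeric."""
    lines = (line.strip() for line in nvidia_smi_output.splitlines())
    return [int(line) for line in lines if line.isdigit()]


def has_enough_vram(total_mib, min_gb=MIN_VRAM_GB, tolerance_mib=TOLERANCE_MIB):
    """Return True if the reported memory reaches the minimum, within tolerance."""
    return total_mib >= min_gb * 1024 - tolerance_mib


def query_vram_mib():
    """Ask nvidia-smi for total memory per GPU; return [] if that is not possible."""
    if shutil.which("nvidia-smi") is None:
        return []
    try:
        result = subprocess.run(
            ["nvidia-smi", "--query-gpu=memory.total", "--format=csv,noheader,nounits"],
            capture_output=True, text=True, check=True,
        )
    except (subprocess.CalledProcessError, OSError):
        return []
    return parse_vram_mib(result.stdout)


gpus = query_vram_mib()
if not gpus:
    print("No NVIDIA GPU information available (nvidia-smi missing or failed).")
for index, mib in enumerate(gpus):
    status = "OK" if has_enough_vram(mib) else "below the suggested minimum"
    print(f"GPU {index}: {mib} MiB ({status})")
```

On machines without an NVIDIA driver the script simply reports that no information is available, so it is safe to run anywhere.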

Here, we will see how to work with popular WebUIs, but first, we will do the installation part.


Installing in Automatic1111(Forge):

One thing to be clear about: Automatic1111 does not currently support Stable Video Diffusion. The quick solution is to install Forge UI, or to upgrade your existing Automatic1111 installation to it. Forge UI is similar to Automatic1111 but adds extra functionality.

1. Install Forge UI or upgrade your Automatic1111 installation to Forge WebUI.


download stable video diffusion model

2. After installing or upgrading the WebUI, download the Stable Video Diffusion model from the CivitAI link below.

https://civitai.com/models/207992/stable-video-diffusion-svd

3. Save the downloaded model in the "stable-diffusion-webui\models\svd" folder inside your WebUI root folder.

4. Restart Forge WebUI for the change to take effect.
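If the model does not show up after the restart, it can help to confirm that the file really landed in the folder named in step 3. The following is a small sketch, not part of Forge itself: `find_svd_models` is a hypothetical helper, the root path passed to it is a placeholder for your own install location, and the accepted file extensions are an assumption.

```python
from pathlib import Path

MODEL_SUFFIXES = {".safetensors", ".ckpt"}  # assumption: common checkpoint extensions


def find_svd_models(webui_root):
    """Return the names of model files found in <webui_root>/models/svd."""
    svd_dir = Path(webui_root) / "models" / "svd"
    if not svd_dir.is_dir():
        return []
    return sorted(p.name for p in svd_dir.iterdir() if p.suffix in MODEL_SUFFIXES)


# Placeholder path: replace with the root folder of your own WebUI install.
models = find_svd_models("stable-diffusion-webui")
if models:
    print("Found SVD models:", ", ".join(models))
else:
    print("No SVD model found in models/svd")
```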


Workflow (Forge UI):

The basic workflow is simple and straightforward, but getting a good result takes a closer look at the settings and multiple trials.

drag and drop image in SVD section

1. First, select your model checkpoint. Then drag and drop the reference image into the canvas area, or generate an image and click the button that sends it to the SVD section.


select image dimensions

Choose landscape (1024 by 576) or portrait (576 by 1024) dimensions. At this stage, these are the only two options.


Select SVD tab

2. Select the SVD tab.


adjust settings for Stable Video Diffusion

3. Remember that the input image must use the same dimensions as the SVD settings. So, whenever you generate an image for use with SVD, stick to the recommended dimensions.

We are using these settings:

-Width: 1024 (width in pixels)

-Height: 576 (height in pixels)

-Video Frames: 25 (number of frames to generate)

-Motion Bucket Id: 127 (controls the amount of motion in the generated video; higher values increase the motion)

-FPS: 6

-Augmentation Level: 0

-Sampling Steps: 20 (higher values give more refined clips but take longer to generate)

-CFG Scale: (controls how strongly the guidance influences the generation)

-Sampling Denoise: 1 (denoising strength)

-Guidance Minimum CFG: 1

-Sampler name: Euler (it's experimental, you can choose your own)

-Scheduler: Karras

-Seed: experimental (try multiple random values)

Finally, hit the "Generate" button. The generation time will depend on your GPU and other hardware.

Reference Image

You won't get a perfect result on the first try, so you need to experiment with the settings. Currently, it can only generate videos about 4 seconds long. After generation, you can download the video using the download button in the top right corner of the video player.
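The roughly 4-second limit follows from the settings used here: clip length is the number of frames divided by the playback frame rate. A quick sketch of that arithmetic (the function name is just illustrative):

```python
def clip_duration_seconds(num_frames, fps):
    """Length of a clip in seconds, given its frame count and playback rate."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return num_frames / fps


# Settings used in this guide: 25 frames played back at 6 frames per second.
print(round(clip_duration_seconds(25, 6), 2))  # 4.17
```

Raising the FPS value makes motion look smoother but shortens the clip, since the number of frames stays the same.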

We uploaded our generated result to our Instagram profile. You can check the output at the link provided below.

You will notice that the quality of the generated video is not up to the mark, so you can upscale it with any online tool or video editor. Other third-party platforms also upscale in the background (using code) to produce high-quality videos.
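As one example of doing the upscaling yourself, the ffmpeg command-line tool can resize a clip with its `scale` filter. The sketch below is an assumption about how you might script that, not the method any particular platform uses: the file names and the 1920x1080 target are placeholders, and the script only runs ffmpeg if it is on PATH and the input file exists. Plain resizing makes the video larger but does not add real detail, so dedicated AI upscalers may give better results.

```python
import shutil
import subprocess
from pathlib import Path


def build_upscale_command(src, dst, width=1920, height=1080):
    """Build an ffmpeg command that resizes src to width x height (Lanczos filter)."""
    return [
        "ffmpeg", "-y",           # -y: overwrite the output file if it exists
        "-i", str(src),
        "-vf", f"scale={width}:{height}:flags=lanczos",
        str(dst),
    ]


# Placeholder file names: replace with your own downloaded clip.
source, target = Path("svd_output.mp4"), Path("svd_output_1080p.mp4")
command = build_upscale_command(source, target)
print(" ".join(command))

if shutil.which("ffmpeg") and source.is_file():
    subprocess.run(command, check=True)
else:
    print("Skipped: ffmpeg not on PATH or input file not found.")
```

The 1024 by 576 landscape output has a 16:9 aspect ratio, so 1920 by 1080 keeps the picture undistorted; for portrait clips, swap the width and height.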


Installing in ComfyUI:

1. First you need to have ComfyUI installed on your machine.

2. Now, download the Stable Video Diffusion models. Two models have been released, and you can download either one as per your requirement. We have downloaded both.

download svd model

(a) Download the image-to-video model (generates 14 frames): the "svd.safetensors" file.

download svd-xt model

(b) Download the image-to-video XT model (generates 25 frames): the "svd_xt.safetensors" file.

Save them inside the "ComfyUI_windows_portable\ComfyUI\models\checkpoints" folder.

3. You also need ComfyUI Manager. If you don't have it, follow its installation tutorial.

update custom nodes

4. Install and update the custom nodes from the Manager.

installing nodes

After installing and updating everything, restart ComfyUI.

Download ffmpeg

5. To work with these models, you need to install ffmpeg as a prerequisite from its official website.

installing ffmpeg for windows

Then select the option suitable for your operating system. In our case, it's Windows. You will be redirected to a new page.

ffmpeg release full

Next, select the "ffmpeg-release-full.7z" file to download. After downloading, use WinRAR or 7-Zip to extract the archive. You can save the extracted files anywhere, but for easier management we store them in the ComfyUI root folder.



setup environment variables

6. Open the extracted bin folder and copy its path. Then search for "environment variables" in the Start menu.

setting environment variables

Then select Path > Edit > New and paste the path you copied. Click "OK" to save all settings.

7. Enjoy the video generation.
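Before launching ComfyUI, a short preflight script can confirm that the two things set up above are in place: the checkpoint files and an ffmpeg that is visible on PATH. This is a sketch under assumptions, not part of ComfyUI: the helper names are made up for illustration, and the root folder passed in is a placeholder based on the portable Windows layout mentioned earlier. Open a new terminal after editing the environment variables, since already-open terminals keep the old PATH.

```python
import shutil
from pathlib import Path

# File names from the download step; adjust if you only downloaded one model.
EXPECTED_CHECKPOINTS = ("svd.safetensors", "svd_xt.safetensors")


def missing_checkpoints(portable_root, names=EXPECTED_CHECKPOINTS):
    """Return the expected checkpoint files that are not in models/checkpoints."""
    ckpt_dir = Path(portable_root) / "ComfyUI" / "models" / "checkpoints"
    return [name for name in names if not (ckpt_dir / name).is_file()]


def ffmpeg_on_path():
    """True if an ffmpeg executable can be found through the PATH variable."""
    return shutil.which("ffmpeg") is not None


# Placeholder path: replace with the location of your own ComfyUI folder.
print("ffmpeg found on PATH:", ffmpeg_on_path())
print("missing checkpoints:", missing_checkpoints("ComfyUI_windows_portable") or "none")
```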


Install and Run in Colab:

Installing and running Stable Video Diffusion in Google Colab is simple: you copy and paste a couple of blocks of code and start generating video. Let's see how to do this.



create new colab notebook

1. First, open Google Colab and create a new notebook. Make sure you are logged in to your Google account.

change GPU runtime

2. Set your runtime to T4 GPU from the top menu bar of the Colab dashboard.

connect T4


Make sure it is connected to the T4 GPU by checking the top right corner of Colab.

create new cell box

After connecting, move your cursor to create a new code cell.

copy and paste to install diffusers library for svd

3. Install the Diffusers and Transformers libraries. Go to the official Hugging Face page, then copy and paste the required command into the first cell:

!pip install -q -U diffusers transformers accelerate

Then click the play button on the left side of the first cell to execute it.


create second cell box

Next, create a second cell by moving your cursor to the Code button and clicking it.

copy and paste to install diffusers

4. Copy another block of code from the same Hugging Face page and paste it into the second cell.

block of code for second cell

Your code should look like the one shown in the image above.

set image url and parameters

5. Set your reference image URL (the link should start with "https" or "http") and other relevant parameters such as the seed value and fps (frames per second).



search any uploading image site

To get an image URL, you can search Google for "upload image free". An image hosting site will give you a hosted URL that you can paste in as the reference image link.

copy image url link


paste image url link

Paste the hosted image URL in place of the reference image link.


As discussed earlier, do not change the image dimensions, because this is the recommended size that Stable Video Diffusion supports for generation.

generated video took 6 minutes

6. Now, click the play button on the left side of the second cell to run the code and generate the video. Be patient while the video is generated. In our case, it took 6 minutes.

Downloading generated video

7. Finally, you can download your generated video from the left panel of the Google Colab dashboard. Click the folder icon, right-click the generated file, and select the "Download" option.

disconnect runtime

8. Lastly, disconnect and delete the runtime so that your session does not keep running in the background.
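For reference, the code in the second cell might look roughly like the following. This is a sketch based on the `StableVideoDiffusionPipeline` example in the Diffusers documentation, not a copy of the code shown in the screenshot above. The helper names, the output file name, the seed, and the fps value are placeholders, and the model repository may require accepting its license on Hugging Face before the weights can be downloaded.

```python
def check_image_url(url):
    """The reference image must be reachable over http or https."""
    if not url.startswith(("http://", "https://")):
        raise ValueError(f"Image URL must start with http or https: {url!r}")
    return url


def generate_video(image_url, output_path="generated.mp4", seed=42, fps=7):
    """Generate a short clip from one reference image and save it as an MP4."""
    # Heavy imports stay inside the function so the URL check works without a GPU.
    import torch
    from diffusers import StableVideoDiffusionPipeline
    from diffusers.utils import export_to_video, load_image

    pipe = StableVideoDiffusionPipeline.from_pretrained(
        "stabilityai/stable-video-diffusion-img2vid-xt",  # the 25-frame model
        torch_dtype=torch.float16,
        variant="fp16",
    )
    pipe.enable_model_cpu_offload()  # trades speed for lower GPU memory use

    # Keep the recommended landscape size discussed in this guide.
    image = load_image(check_image_url(image_url)).resize((1024, 576))

    generator = torch.manual_seed(seed)  # fixed seed for repeatable results
    frames = pipe(
        image,
        decode_chunk_size=8,      # decode fewer frames at once to save memory
        motion_bucket_id=127,     # higher values ask for more motion
        fps=fps,
        generator=generator,
    ).frames[0]

    export_to_video(frames, output_path, fps=fps)
    return output_path


# In a Colab cell, after the install cell has finished:
# generate_video("https://example.com/your-image.png")
```

The call at the bottom is left commented out because it downloads several gigabytes of weights and needs a GPU runtime; uncomment it and replace the example URL with your own hosted image link.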


Conclusion:

Stable Video Diffusion is changing the dynamics of video creation. In the future, it will likely improve significantly and generate even better results.

Many people are already showing their creativity on social media by using these models to create AI videos.