StabilityAI has entered the new field of AI video generation while others remain limited to image generation. Its new image-to-video model, Stable Video Diffusion, was released on November 21, 2023, and can now bring your realistic images to life.
Image Source: StabilityAI
Third-party applications like Pixeverse, LeonardoAI, or Morph Studio (a partner of StabilityAI) have integrated it as an online platform, but these services come with various limitations: short video generation, little control, subscription charges, and results that often fall short of expectations even on paid plans.
So, to get the full power of this image-to-video model, we are going to test it locally (Automatic1111/ComfyUI) as well as in the cloud (Google Colab).
Make sure you have a GPU with a minimum of 6-8 GB of VRAM to run this model. Further in-depth details about the model are provided in the research paper. For commercial use, you have to join StabilityAI's membership program.
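Before installing anything, you can sanity-check your available VRAM with a few lines of PyTorch (a minimal sketch, assuming PyTorch is already installed; the ~6 GB threshold mirrors the minimum mentioned above):

```python
import torch

# Rough VRAM check before trying Stable Video Diffusion locally.
if torch.cuda.is_available():
    vram_gb = torch.cuda.get_device_properties(0).total_memory / 1024 ** 3
    verdict = "OK" if vram_gb >= 6 else "below the ~6 GB minimum"
    print(f"GPU VRAM: {vram_gb:.1f} GB ({verdict})")
else:
    print("No CUDA GPU detected")
```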
Here, we will see how it works with the popular WebUIs, but first we will cover the installation part.
Installing in Automatic1111 (Forge):
One thing to clarify about Stable Video Diffusion: Automatic1111 currently does not support it. The quick solution is to install, or upgrade your older WebUI (if you previously installed Automatic1111) to, Forge UI, which is similar to Automatic1111 but offers extra functionality.
1. Install Forge UI or upgrade your Automatic1111 to Forge Webui.
2. After installing/upgrading the WebUI, download the Stable Video Diffusion model from the CivitAI link provided below.
https://civitai.com/models/207992/stable-video-diffusion-svd
3. Store the downloaded model inside the "stable-diffusion-webui\models\svd" folder under the WebUI root folder.
4. Restart Forge WebUI for the change to take effect.
Workflow (Forge UI):
The workflow is simple and straightforward enough to be a cakewalk, but getting the perfect output requires going deeper and running multiple trials.
1. First, select your model checkpoint. Then drag and drop the reference image onto the canvas area, or just generate an image and click the button to send it to the SVD section.
Choose your dimensions in landscape (1024 by 576) or portrait (576 by 1024) orientation. At the current stage, these are the only two options.
2. Select the SVD tab.
3. Remember that the image dimensions must match for this to work. So, wherever you generate image art for use with SVD, stick to the recommended dimensions.
We are using these settings:
- Width: 1024
- Height: 576
- Video Frames: 25 (total number of frames in the clip)
- Motion Bucket Id: 127 (controls the amount of motion in the generated video; higher values increase motion)
- FPS: 6
- Augmentation Level: 0
- Sampling Steps: 20 (more steps give more refined clips but take longer to generate)
- CFG Scale: (how strongly the prompt influences generation)
- Sampling Denoise: 1 (denoising strength)
- Guidance Minimum CFG: 1
- Sampler name: Euler (experimental; choose your own if you prefer)
- Scheduler: Karras
- Seed: experimental (play with multiple random values)
Finally, hit the "Generate" button. Generation time will depend on the GPU and hardware you are using.
Girl Meme
You won't get the perfect result on the first try, so you need to play with the settings to get the expected output. Currently, it can only generate a 4-second video with FPS set to 8. After generation, you can download it using the download button in the top-right corner of the video player.
As an experiment, we tested different aspect ratios, such as a width of 1750 with a height of 576; the quality was better, but the generated video was static.
We uploaded our generated result to our Instagram profile. You can check the output at the related link provided below.
You will observe that the generated video quality is not up to the mark, so you can use any online tool or video editor for upscaling. Note that the third-party platforms also apply behind-the-scenes techniques (in code) to upscale and produce high-quality videos.
Installing in ComfyUI:
1. First you need to have ComfyUI installed on your machine.
2. Now, download the Stable Video Diffusion models. Two models have been released; download them as per your requirements. We downloaded both.
(a) Image-to-video model (generates 14 frames): the "svd.safetensors" file.
(b) Image-to-video XT model (generates 25 frames): the "svd_xt.safetensors" file.
Save them inside "ComfyUI_windows_portable\ComfyUI\models\checkpoints" folder.
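To confirm the files landed in the right place, a small path check like the following can help (a sketch; the root folder name matches the portable Windows build used above and may differ on your setup):

```python
from pathlib import Path

def check_svd_models(root: str) -> dict[str, bool]:
    """Report which SVD checkpoints are present in a ComfyUI checkpoints folder."""
    ckpt_dir = Path(root) / "ComfyUI" / "models" / "checkpoints"
    return {name: (ckpt_dir / name).is_file()
            for name in ("svd.safetensors", "svd_xt.safetensors")}

# Example: check the portable Windows install layout used above.
print(check_svd_models("ComfyUI_windows_portable"))
```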
3. You also need ComfyUI Manager; if you don't have it, you can follow its installation tutorial.
4. Select "Install Custom Nodes" to install and update the custom nodes.
After installing and updating everything, restart ComfyUI.
5. Now, to work with these models, you need to install FFmpeg as a prerequisite from its official website.
Then select the option suitable for your operating system; in our case it's Windows. You will be redirected to a new page.
There, select the "ffmpeg-release-full.7z" file to download. After downloading, use WinRAR/7-Zip to extract the archive. You can save the extracted files anywhere, but for better management we are storing them in the ComfyUI root folder.
6. Move inside the bin folder and copy its path. Search for "environment variables" in your Start menu.
Then select Path > Edit > New, paste the path you copied, click "OK", and save all settings.
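After saving the PATH change (and opening a fresh terminal), you can verify that FFmpeg is actually reachable with a short standard-library check like this sketch:

```python
import shutil

# Check whether the ffmpeg executable is reachable on PATH.
ffmpeg_path = shutil.which("ffmpeg")
if ffmpeg_path:
    print(f"ffmpeg found at: {ffmpeg_path}")
else:
    print("ffmpeg not found; re-check the PATH entry for the bin folder")
```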
7. Enjoy the video generation.
Install and Run in Colab:
Installing and running Stable Video Diffusion in Google Colab is so simple that you can do it by just copying and pasting blocks of code. Let's see how.
1. First create a new notebook by opening Google Colab. Make sure you have logged in to your Google account.
2. Set your runtime to a T4 GPU from the top menu bar of the Colab dashboard.
Make sure it is connected to the T4 GPU by checking the top-right corner of Colab.
Once connected, move your cursor to create a new cell box.
3. Install the Diffusers and Transformers libraries: navigate to the official Hugging Face page, then copy and paste the required command into the cell:
!pip install -q -U diffusers transformers accelerate
Then click the play button on the left side of the first cell to start execution.
Next, create a second cell box using the code button.
4. Copy and paste another block of code from the same Hugging Face page into the second cell.
Your code should look like the example shown in the image illustrated above.
5. Set your reference image URL (the link should start with "https" or "http") and other relevant parameters such as the seed value and fps (frames per second).
For an image URL, you can search Google for "upload image free". This will help you get a hosted image URL that can be pasted as the reference image link.
As discussed earlier, do not change the image dimensions, because the recommended size is what Stable Video Diffusion supports for the generation process.
6. Now, click the play button on the left side of the second cell to execute the code and generate the video. Be patient and wait for the generation process; in our case it took 6 minutes.
7. Finally, you can download your generated video from the left panel of the Google Colab dashboard. Click the folder icon, then right-click the generated file and select the "Download" option.
8. At last, delete your runtime so that your session does not keep running in the background.