Nowadays, creating viral AI social media content is not a tough task, but creating something genuinely creative has always been a headache for AI creators. Here, we present a Video-To-Video workflow for dancing objects, built with Stable Diffusion 1.5 and AnimateDiff, that can go viral instantly. Every step is explained in detail and in an easy-to-understand way.
Make sure you have a basic understanding of working with Stable Diffusion 1.5 and AnimateDiff.
Table of contents:
Installation
Downloading the models
Workflow Explanation
Installation
1. First, you need to install ComfyUI on your machine and learn the ComfyUI basics (a scripted version of the basic install is sketched after this list).
2. Next, download the Video2Video workflow from CivitAI, then drag and drop it into ComfyUI.
3. You will likely see a bunch of nodes highlighted in red (missing-node errors). Open the Manager and click "Update All". Then, again from the Manager, click "Install Missing Custom Nodes", select all of them, and install them.
4. Restart and refresh ComfyUI for the changes to take effect.
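If you have not set up ComfyUI before, the basic manual install boils down to cloning the repository, installing its requirements, and starting the server. The sketch below just scripts those steps; running the same commands in a terminal works equally well.

```python
# Minimal sketch of a scripted ComfyUI install. Assumes git and Python are on your PATH.
import subprocess
import sys

subprocess.run(["git", "clone", "https://github.com/comfyanonymous/ComfyUI"], check=True)
subprocess.run([sys.executable, "-m", "pip", "install", "-r", "requirements.txt"],
               cwd="ComfyUI", check=True)
# Launch the ComfyUI server, then open http://127.0.0.1:8188 in your browser.
subprocess.run([sys.executable, "main.py"], cwd="ComfyUI", check=True)
```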
Downloading the models
Before running the workflow, download the models it references: an AnimateDiff motion model and AnimateDiff LoRA trained on SD 1.5, a Stable Diffusion 1.5 checkpoint (e.g. DreamShaper or ToonYou), a pruned SD 1.5 VAE, the IP-Adapter models, and the SD 1.5 ControlNet checkpoints (QR Code Monster and Lineart). Each of these is covered in the workflow explanation below.
Workflow Explanation
1. To work with this workflow, you first need to create a mask video with a white foreground and a black background. There are multiple ways to do this. One option is the image-to-alpha masking node in ComfyUI, but it does not generate consistent results.
To fix this problem, you can use the DepthCrafter model, which estimates a more consistent depth-map video. You can quickly follow our DepthCrafter tutorial. For better output, use a source video with a clean background when generating the mask. A simple way to turn the depth-map video into a black-and-white mask is sketched below.
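Here is a minimal sketch of converting a depth-map video into a white-foreground / black-background mask with OpenCV. The file names and the threshold value are assumptions you will want to tune for your own footage.

```python
# Minimal sketch: threshold a depth-map video (e.g. from DepthCrafter) into a
# white-foreground / black-background mask video.
import cv2

cap = cv2.VideoCapture("depth_map.mp4")            # assumed name of the depth-map video
fps = cap.get(cv2.CAP_PROP_FPS)
w = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
h = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
out = cv2.VideoWriter("mask.mp4", cv2.VideoWriter_fourcc(*"mp4v"), fps, (w, h))

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # Assumes the subject is brighter (closer) than the background in the depth map;
    # adjust the threshold (here 128) if your depth map uses a different convention.
    _, mask = cv2.threshold(gray, 128, 255, cv2.THRESH_BINARY)
    out.write(cv2.cvtColor(mask, cv2.COLOR_GRAY2BGR))

cap.release()
out.release()
```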
2. Video Upload - After creating the black-and-white mask video, load it into the "Video Upload" node.
This is a tedious process, and you rarely get the perfect output in one go, so running multiple trials will point you in the right direction. You can play with a few options here (see the sketch after this list for how they work together):
Frame load cap - how many frames of your target video you want to process (0 means every frame gets rendered). A lower value gives a faster generation time but a shorter clip, and vice versa. It is recommended to use a low value for trial and error.
Skip first frames - how many frames to skip from the beginning of the video before generation starts.
Skip every nth frames - controls how frames are sampled. A value of 2 renders every second frame (half of your video), while 1 renders all frames.
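This minimal sketch only illustrates how the three options interact; it is not the node's actual source code.

```python
# Illustration of the frame-selection behaviour described above.
def select_frames(frames, frame_load_cap=0, skip_first_frames=0, every_nth=1):
    picked = frames[skip_first_frames::every_nth]   # drop the start, then keep every nth frame
    if frame_load_cap > 0:                          # 0 means "no cap": keep everything that is left
        picked = picked[:frame_load_cap]
    return picked

# Example: a 120-frame clip, skip the first 10 frames, keep every 2nd frame, cap at 16.
frames = list(range(120))
print(len(select_frames(frames, frame_load_cap=16, skip_first_frames=10, every_nth=2)))  # -> 16
```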
3. Load Bg Image - Drop your background image into this node. It gives your video a proper background so the result looks more natural. For instance, if you want to make a cactus dance, use a relevant background such as a cactus in the desert.
4. Load Image - Next, add your object image (octopus, rock, water, cactus, starfish, etc.) to the "Load Image" node. The possibilities are endless; it simply depends on your creativity and experience.
5. LoRA Stacker - This node is used for setting up the LoRA models you want to use. There are plenty of LoRA models on CivitAI if you haven't checked it out yet.
6. IP Adapter Unified Loader - Pick a higher or lower strength preset here to control how strongly the reference image comes through in the video result.
7. IP Adapter Advanced - This controls how much weight is given to your object image. A higher value makes the object's influence stronger; a lower value makes it weaker.
8. Load AnimateDiff LoRA - Select your AnimateDiff LoRA model here.
9. Motion Scale - Sets the amount of motion applied to your object inside the generated video. A value between 0.5 and 1 generally gets the job done.
10. Load AnimateDiff Model - Select your AnimateDiff motion model. Make sure you use a model trained on Stable Diffusion 1.5 only.
11. Efficient Loader - Select your Stable Diffusion 1.5 checkpoint. Examples include DreamShaper LCM, ToonYou, CyberRealistic, Juggernaut, etc. Choose whichever style suits your video. Again, there are plenty of options available on Hugging Face, CivitAI, GitHub, etc.
Then select the relevant pruned VAE (Variational Auto-Encoder) model for SD 1.5.
Inside the prompt boxes, add a positive and a negative prompt. Make sure the prompts are relevant and detailed; the more detailed the prompt, the more strongly it influences the output video (an illustrative example is sketched at the end of this step).
If you do not know how to write good prompts yet, it is worth mastering prompting techniques first. Negative prompting also adds more refinement; more information can be found in our negative prompts tutorial.
KSampler (Efficient) - This node samples the frames and shows the animation preview in real time. The settings used here are:
Steps = 10 (higher is better, but it also slows down your rendering time),
CFG = 1.5 (how strongly the prompts influence your output),
Sampler = LCM,
Scheduler = sgm_uniform (others are experimental),
Denoise = 1
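As an illustration of the detailed prompting described above, here is a hypothetical prompt pair for the dancing-cactus idea, together with the sampler settings from this step collected in one place. The prompt wording is an assumption, not part of the workflow.

```python
# Hypothetical example prompts for the dancing-cactus idea; adapt to your own object and style.
positive_prompt = (
    "a cute cactus dancing in a sunny desert, smooth motion, "
    "vibrant colors, highly detailed, sharp focus, cinematic lighting"
)
negative_prompt = (
    "blurry, low quality, deformed, extra limbs, watermark, text, "
    "flickering, static background"
)

# The KSampler settings from this step, collected for quick reference.
sampler_settings = {
    "steps": 10,          # higher is better but slower
    "cfg": 1.5,           # prompt influence
    "sampler_name": "lcm",
    "scheduler": "sgm_uniform",
    "denoise": 1.0,
}
```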
12. Seed - Set this to randomize.
13. ControlNet - In this group, in the Load Advanced ControlNet Model node, select the SD 1.5 QR Code Monster checkpoint.
ControlNet Stacker - strength = 0.5, start percent = 0, end percent = 0.6.
Second Load Advanced ControlNet Model node - load the SD 1.5 Lineart checkpoint. This is used to trace the outlines of the frames in real time.
Second ControlNet Stacker - strength = 1, start percent = 0, end percent = 0.75.
Realistic Lineart - To test at the initial stage, set the resolution to the minimum, i.e. 512.
These settings are experimental; running the workflow a few times will give you a clearer understanding of how they behave (they are summarized in the sketch below).
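For reference, here are the two ControlNet passes described above collected as plain data. The dictionary keys are just labels for this sketch, not actual ComfyUI node fields.

```python
# Summary of the two ControlNet passes used in this group.
controlnet_stack = [
    {
        "model": "SD1.5 QR Code Monster",
        "strength": 0.5,
        "start_percent": 0.0,
        "end_percent": 0.6,
    },
    {
        "model": "SD1.5 Lineart",   # frames preprocessed with Realistic Lineart at 512 resolution
        "strength": 1.0,
        "start_percent": 0.0,
        "end_percent": 0.75,
    },
]
```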
14. Preview (Video Combine) - This node combines all the generated frames into the final video. Set the frame rate to match the video you originally fed in.
Finally, click the "Queue" button to start the rendering process (or queue it from a script, as sketched below). This generally takes a couple of minutes; the more VRAM your machine has, the faster the generation will be.
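If you would rather trigger renders from a script than click "Queue", ComfyUI also exposes a small HTTP API. This minimal sketch assumes a default local install on port 8188 and a workflow exported with "Save (API Format)" as workflow_api.json.

```python
# Minimal sketch: queue a workflow through ComfyUI's HTTP API instead of clicking "Queue".
import json
import urllib.request

with open("workflow_api.json", "r") as f:   # workflow exported via "Save (API Format)"
    workflow = json.load(f)

payload = json.dumps({"prompt": workflow}).encode("utf-8")
req = urllib.request.Request(
    "http://127.0.0.1:8188/prompt",
    data=payload,
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(resp.read().decode("utf-8"))      # response includes the queued prompt id
```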
15. Upscale node - The video output you get will be fairly low resolution. You can use any video-upscaling technique to improve it, though that is also time consuming, or you can try one of the many third-party alternatives (a quick resize-based sketch follows the tip below).
Set the scale value to 2.5 if you want the video at 720 pixels. The most crucial setting is the denoise value: pushing it higher will distort your result, so keeping it in a lower range (0.2-0.5) is the better approach.
Tip: The upscaling pipeline is also included in the workflow. If you do not want it, drag the mouse cursor over those nodes to select them and mute them with Ctrl+M (the same shortcut unmutes them).
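If you just want a quick, non-diffusion upscale to preview results, a plain bicubic resize is enough. The sketch below assumes the rendered clip is saved as animatediff_output.mp4; it will not add detail the way the upscale node does.

```python
# Minimal sketch: a plain bicubic 2.5x resize of the rendered clip (e.g. 288 px -> 720 px tall).
import cv2

SCALE = 2.5
cap = cv2.VideoCapture("animatediff_output.mp4")     # assumed input file name
fps = cap.get(cv2.CAP_PROP_FPS)
w = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH) * SCALE)
h = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT) * SCALE)
out = cv2.VideoWriter("upscaled.mp4", cv2.VideoWriter_fourcc(*"mp4v"), fps, (w, h))

while True:
    ok, frame = cap.read()
    if not ok:
        break
    out.write(cv2.resize(frame, (w, h), interpolation=cv2.INTER_CUBIC))

cap.release()
out.release()
```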
For a broader overview and more learning, you can also dive into our Video-To-Video tutorial using AnimateDiff and SDXL.