Is NVIDIA's Cosmos Video Generation Really Perfect?

The Cosmos diffusion models released by the NVIDIA team are capable of generating dynamic, high-quality videos from text, images, or even other videos, as we explain below.

These pre-trained models are generalists: they have been trained on massive video datasets covering a wide range of real-world physical scenarios, which makes them versatile for tasks that require an understanding of physics.

These models are released under the NVIDIA open license, which gives you the freedom to use them for commercial purposes within the license's limitations. For deeper insights, see their research paper.


Installation

1. First, install ComfyUI if you are new to it.

2. Existing users should update ComfyUI from the Manager section.


download cosmos model

3. Now, download the NVIDIA Cosmos models from the Hugging Face repository and save them into your "ComfyUI/models/diffusion_models" folder. Make sure to choose the correct model variant: the 7B variant is for lower-end GPUs and the 14B variant is for higher-end GPUs.

We have noticed that many people find the naming convention confusing. Here, "Text-to-World" simply means the Text-to-Video flow, and "Video-to-World" means the Image/Video-to-Video flow. The raw models can also be obtained from their GitHub repository. If you prefer to script the downloads, see the Python sketch after step 6.

download text encoder model

4. Download the text encoder (oldt5_xxl_fp8_e4m3fn_scaled.safetensors) from Hugging Face and save it into your "ComfyUI/models/text_encoders" folder.

download vae model

5. Download the VAE model (cosmos_cv8x8x8_1.0.safetensors) from Hugging Face and place it inside your "ComfyUI/models/vae" folder.

6. Restart ComfyUI for the changes to take effect.
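
If you prefer to script the downloads from steps 3-5 instead of clicking through the browser, here is a minimal Python sketch using the huggingface_hub package (pip install huggingface_hub). Treat the repository IDs and the diffusion model filename below as assumptions for illustration; verify the exact names on the Hugging Face pages linked above before running it.

    from huggingface_hub import hf_hub_download

    models_dir = "ComfyUI/models"  # adjust to your ComfyUI install path

    # Diffusion model: pick the 7B or 14B variant that fits your GPU.
    # Repo ID and filename are assumed; verify them on Hugging Face.
    hf_hub_download(
        repo_id="nvidia/Cosmos-1.0-Diffusion-7B-Text2World",
        filename="Cosmos-1_0-Diffusion-7B-Text2World.safetensors",
        local_dir=f"{models_dir}/diffusion_models",
    )

    # Text encoder (filename from step 4; repo ID assumed).
    hf_hub_download(
        repo_id="comfyanonymous/cosmos_1.0_text_encoder_and_VAE_ComfyUI",
        filename="oldt5_xxl_fp8_e4m3fn_scaled.safetensors",
        local_dir=f"{models_dir}/text_encoders",
    )

    # VAE (filename from step 5; repo ID assumed).
    hf_hub_download(
        repo_id="comfyanonymous/cosmos_1.0_text_encoder_and_VAE_ComfyUI",
        filename="cosmos_cv8x8x8_1.0.safetensors",
        local_dir=f"{models_dir}/vae",
    )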

Workflow

1. Get the workflow from our Hugging Face repository page:
(a) Text-to-Video workflow
(b) Image-to-Video workflow

2. Drag and drop it into your ComfyUI window.

load cosmos diffusion model

3. Load the NVIDIA Cosmos model (Text-to-Video or Image-to-Video) in the UNET/Diffusion Model loader node.

load text encoder node

4. Load the text encoder model in the CLIP loader node.

load VAE model

5. Load the VAE model in the Load VAE node.

Add positive and negative prompts

6. Add positive and negative prompts. Your prompts should be long and descriptive enough for the Cosmos model to understand them; short prompts give poor results. See the example prompt after these steps.

KSampler settings

7. Configure the KSampler settings. The defaults that ship with the workflow are a reasonable starting point; you can experiment with steps, CFG, and seed afterwards.

set video resolution


8. Set your video dimensions. The default is 704 x 704 pixels, which is the minimum supported size.

9. Hit the Queue button to generate.
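
As noted in step 6, Cosmos responds best to long, descriptive prompts. Here is an illustrative positive prompt we wrote ourselves (not taken from NVIDIA's documentation) showing the level of detail that tends to work:

"A first-person drone view flying slowly over a coastal town at sunset. Terracotta rooftops and narrow streets pass below, waves roll onto a sandy beach on the right, and warm golden light reflects off the water. The camera moves forward smoothly at a constant height, with realistic motion, natural colors, and sharp detail."

For the negative prompt, listing the artifacts you want to avoid (for example: blurry, distorted, flickering, low quality) is a common starting point.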

cosmos video output


We ran the model on an RTX 3090 (24 GB VRAM). Generating a 704x704 video took around 7-8 minutes.

To get good results, you need to do multiple tries and pick the best one. The model is not as perfect as NVIDIA's official page claims: compared with other video generators like LTX-Video or HunyuanVideo, the output quality is not quite up to the mark.