SkyReels generating Human Centric Videos

install skyreels opensource model locally using comfyui

Another diffusion-based video generation model has entered the open-source space: SkyReels, a human-centric video framework fine-tuned from HunyuanVideo. It is released as open source and focuses on advanced facial animation and cinematic lighting and aesthetics.

However, the model requires a significant amount of VRAM (at least 79GB), making it difficult to run on standard hardware. The most practical solution is to use a quantized version of the model. In this guide, we show how to run a quantized SkyReels model in ComfyUI without running into out-of-memory errors.
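To see why quantization helps, here is a rough back-of-the-envelope sketch of the memory needed for the model weights alone. The parameter count (about 13 billion, the size published for HunyuanVideo) and the bits-per-weight figures are assumptions for illustration, not values from this guide. Activations, the VAE and the text encoders need additional memory on top of this.

```python
# Rough estimate of the memory needed just to hold the model weights.
# ASSUMPTION: ~13e9 parameters (not stated in this guide).
PARAM_COUNT = 13e9

# ASSUMPTION: approximate bits per weight for each format. Quantized
# formats also store per-block scales, so real files differ slightly.
BITS_PER_WEIGHT = {
    "bf16": 16.0,
    "Q8": 8.5,
    "Q3": 3.5,
}


def weight_size_gb(param_count: float, bits_per_weight: float) -> float:
    """Return the approximate weight size in gigabytes (1 GB = 1e9 bytes)."""
    return param_count * bits_per_weight / 8 / 1e9


if __name__ == "__main__":
    for name, bits in BITS_PER_WEIGHT.items():
        print(f"{name}: ~{weight_size_gb(PARAM_COUNT, bits):.1f} GB")
```

The exact numbers matter less than the trend: fewer bits per weight means a smaller file and a lower VRAM floor, at the cost of some generation quality.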


Installation process

update ComfyUI from the manager

1. Install ComfyUI if you have not done so yet. Existing users should update it from the Manager by selecting the "Update ComfyUI" option.

2. Clone Kijai's HunyuanVideo custom node repository if you have not done so yet. Users who already have this custom node installed just need to update it from the ComfyUI Manager.

download skyreels quantized model

3. Download one of the quantized models, ranging from Q3 (smaller file, lower generation quality) to Q8 (larger file, higher generation quality), from the Hugging Face repository.

Choose the one that suits your system. Save it inside the "ComfyUI/models/diffusion_models" folder. Also download the same VAE and text encoders used for Kijai's HunyuanVideo setup. If you have already downloaded them, you can skip this.

4. Restart ComfyUI for the changes to take effect.
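Before restarting, it can help to confirm that the downloaded file ended up in the right folder. The following is a minimal sketch: the function name is hypothetical, and the `.gguf` and `.safetensors` extensions are an assumption about how the quantized files are packaged. Only the "models/diffusion_models" path comes from the steps above.

```python
from pathlib import Path

# ASSUMPTION: model files use one of these extensions.
MODEL_EXTENSIONS = {".gguf", ".safetensors"}


def list_diffusion_models(comfy_root: str) -> list[str]:
    """Return the model file names found in <comfy_root>/models/diffusion_models."""
    model_dir = Path(comfy_root) / "models" / "diffusion_models"
    if not model_dir.is_dir():
        # Folder missing: wrong ComfyUI path or nothing has been saved yet.
        return []
    return sorted(
        p.name
        for p in model_dir.iterdir()
        if p.is_file() and p.suffix.lower() in MODEL_EXTENSIONS
    )


if __name__ == "__main__":
    # "ComfyUI" is assumed to be the install folder in the current directory.
    found = list_diffusion_models("ComfyUI")
    print(found if found else "No model files found in models/diffusion_models")
```

If the list comes back empty, check that the file was not saved one level too high (for example directly under "ComfyUI/models").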


Workflow

1. Because SkyReels is fine-tuned from HunyuanVideo, the workflow is the same as for HunyuanVideo. Simply use Kijai's HunyuanVideo workflow that we explained in our tutorial.

2. To input the image, use the "InstructPixToPixConditioning" node, or a similar node that adds an encoded image.

3. Select the downloaded quantized model in the Load Diffusion Model node. Everything else stays the same as for HunyuanVideo.

4. Enter a positive prompt that describes the details you want in the generation.

5. Click "Queue" to start the generation.

We used an image of a girl wearing a white tank top as input. Our goal is to create a human-centric video that showcases the fabric quality of the clothing for e-commerce use.

Prompt used: A high-resolution studio photograph of a female model wearing a sleeveless tank top, standing against a clean white background. The fabric appears soft, lightweight, and breathable, with a modern, form-fitting design. The model has natural-looking makeup and neat hair, giving off a casual yet stylish vibe. The lighting is bright and even, with soft shadows for depth. The model poses confidently with relaxed shoulders, smiling subtly, showcasing the top from multiple angles—front, back, and side. Close-up shots capture fabric texture, stitching, and branding details, ensuring a premium look. The overall composition is minimalistic yet engaging, ideal for eCommerce product listings.

CFG: 6

Embedded guidance scale: 1

Steps: 20

skyreels video generation result

Here is the generated result. It is not bad, but you will notice that there is some blurriness around her eyes and that some frames are of lower quality. The video was generated at 720p. The officially recommended resolution is 960 (height) by 544 (width), with a CFG value of 6 and an embedded guidance scale of 1. Longer videos take correspondingly more time to generate than shorter ones.

You need to turn on block swap if you are using a GPU with 12GB of VRAM or less. With more VRAM, it is not required.
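The settings and the block swap rule above can be collected in one place, as in this minimal sketch. The class, field and function names are hypothetical and do not correspond to actual ComfyUI node inputs. The values are the ones quoted in this guide.

```python
from dataclasses import dataclass


@dataclass
class GenerationSettings:
    """Settings quoted in this guide (field names are illustrative only)."""
    height: int = 960                      # recommended height
    width: int = 544                       # recommended width
    cfg: float = 6.0                       # CFG value
    embedded_guidance_scale: float = 1.0   # embedded guidance scale
    steps: int = 20                        # steps used in the example run


def should_enable_block_swap(vram_gb: float, threshold_gb: float = 12.0) -> bool:
    """Return True when block swap should be turned on (12GB of VRAM or less)."""
    return vram_gb <= threshold_gb


if __name__ == "__main__":
    settings = GenerationSettings()
    print(settings)
    for vram in (8, 12, 24):
        state = "on" if should_enable_block_swap(vram) else "off"
        print(f"{vram}GB VRAM -> block swap {state}")
```

Keeping the values in a single structure makes it easier to compare a run against the recommended settings when the output looks off.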

The model is still at an early stage, so expectations should be modest, but it is a promising fine-tuning effort. The model sometimes produces artifacts and drops frames.