Lumina Image 2.0 is a powerful 2-billion-parameter text-to-image generation model that improves significantly on its predecessor. Built on a flow-based diffusion transformer with Google's Gemma as the text encoder, it generates high-quality images from text prompts.
The model is released under the Apache 2.0 license, which means you are allowed to use it for both research and commercial purposes.
Lumina-Image 2.0 😍
-An Efficient, Unified and Transparent Image Generative Model (licensed under Apache 2.0)
-Resolution: 1024
-Parameters: 2B
-Text Encoder: Gemma 2B
Project page: https://t.co/tZDsV95jK0
— Stable Diffusion Tutorials (@SD_Tutorial) January 31, 2025
In this guide, we will go through setting up Lumina Image 2.0 in ComfyUI, ensuring a seamless installation process.
Table of contents:
Installation
Workflow
If you are a new user, install ComfyUI first. Existing users need to update ComfyUI from the Manager to ensure compatibility with the latest model.
TYPE A: Native Support
1. Download the Lumina Image 2.0 model (lumina_2.safetensors) from the Hugging Face repository and save it into your "ComfyUI/models/checkpoints" folder. The model is somewhat smaller than older diffusion models.
2. Restart ComfyUI to apply the changes and load the new model.
TYPE B: GGUF Variant
1. Download a GGUF quantized variant of the model from the Hugging Face repository (quantizations range from Q2, for faster generation at lower precision, to Q8, for higher precision). Choose one and put it into your "ComfyUI/models/diffusion_models" folder.
2. You also need to download Google's Gemma text encoder (gemma_2_2b_fp16.safetensors) from Hugging Face and save it into your "ComfyUI/models/text_encoders" folder.
3. Download the VAE (variational autoencoder) model from Hugging Face. There are different precision options to choose from (fp32, fp16, or fp8). Save it into your "ComfyUI/models/vae" folder.
4. Restart ComfyUI for the changes to take effect.
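Each of the three downloads above has to land in a specific subfolder before ComfyUI can see it. The sketch below is a small, optional helper for checking that your install is laid out correctly; the GGUF and VAE filenames in it are placeholders (your actual filenames depend on which quantization and precision you downloaded), while the Gemma filename is the one named in this guide.

```python
import os

# Expected locations inside a ComfyUI install for the GGUF setup above.
# The .gguf and VAE filenames are hypothetical examples — replace them
# with whatever quantization (Q2..Q8) and precision (fp32/fp16/fp8)
# you actually downloaded.
REQUIRED_FILES = {
    "models/diffusion_models": "lumina2-q8_0.gguf",            # placeholder
    "models/text_encoders":    "gemma_2_2b_fp16.safetensors",  # from this guide
    "models/vae":              "lumina2_vae_fp16.safetensors", # placeholder
}

def missing_models(comfyui_root: str) -> list[str]:
    """Return the expected model paths that are not yet present on disk."""
    missing = []
    for folder, filename in REQUIRED_FILES.items():
        path = os.path.join(comfyui_root, folder, filename)
        if not os.path.isfile(path):
            missing.append(path)
    return missing

if __name__ == "__main__":
    for path in missing_models("ComfyUI"):
        print("missing:", path)
```

Run it from the directory containing your ComfyUI folder; anything it reports as missing still needs to be downloaded or moved.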
Workflow
1. Get the basic workflow or GGUF workflow from our Hugging Face repository.
2. Drag and drop it into ComfyUI.
3. Configure the generation settings as follows:
Prompt used: Majestic landscape photograph of snow-capped mountains under a dramatic sky at sunset. The mountains dominate the lower half of the image, with rugged peaks and deep crevasses visible. A glacier flows down the right side, partially illuminated by the warm light. The sky is filled with fiery orange and golden clouds, contrasting with the cool tones of the snow. The central peak is partially obscured by clouds, adding a sense of mystery. The foreground features dark, shadowed forested areas, enhancing the depth. High contrast, natural lighting, warm color palette, photorealistic, expansive, awe-inspiring, serene, visually balanced, dynamic composition.
Steps: 35
Sampling: Euler
CFG: 4
Height and width: 1024
We ran the model on an RTX 4080, and generation took around 26 seconds at 35 steps (basic workflow).
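The settings above map directly onto ComfyUI's KSampler node. As a quick reference, here they are as a plain Python dict, together with the rough per-step cost implied by the 26-second RTX 4080 timing (the field names follow ComfyUI's KSampler; the values are the ones used in this guide):

```python
# KSampler settings used for the basic workflow in this guide.
settings = {
    "steps": 35,
    "sampler_name": "euler",
    "cfg": 4.0,
    "width": 1024,
    "height": 1024,
}

# Rough per-step cost from the ~26 s total measured on an RTX 4080.
seconds_per_step = 26 / settings["steps"]
print(f"{seconds_per_step:.2f} s/step")  # roughly 0.74 s per step
```

Note that actual timings vary with GPU, quantization, and resolution, so treat the per-step figure as a ballpark for this specific setup.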
Prompt used: Close-up portrait of a young woman with light brown hair, looking to the right, illuminated by warm, golden sunlight. Her hair is gently tousled, catching the light and creating a halo effect around her head. She wears a white garment with a V-neck, visible in the lower left of the frame. The background is dark and out of focus, enhancing the contrast between her illuminated face and the shadows. Soft, ethereal lighting, high contrast, warm color palette, shallow depth of field, natural backlighting, serene and contemplative mood, cinematic quality, intimate and visually striking composition.
Steps: 40
Sampling: Euler
CFG: 4
Height and width: 1024
There are plenty of diffusion models out there, and in our comparison Lumina Image 2.0 generates results similar to SDXL, though not quite at the level of Flux. Its prompt adherence and comprehension, however, are quite impressive.