Lumina Image 2.0 - Amazing Text to Image generation


Lumina Image 2.0 is a text-to-image generation model with 2 billion parameters that improves on the earlier Lumina release. It uses a flow-based diffusion transformer with Gemma (from Google) as the text encoder to generate high-quality images from text prompts.

The model is available under the Apache 2.0 license, which means you are allowed to use it for both research and commercial purposes.

In this guide, we will go through setting up Lumina Image 2.0 in ComfyUI, ensuring a seamless installation process.

Installation


If you are a new user, install ComfyUI first. If you are an existing user, update ComfyUI from the Manager to ensure compatibility with the latest model.

TYPE A : Native Support


1. Download the Lumina Image 2.0 model (lumina_2.safetensors) from the Hugging Face repository and save it into your "ComfyUI/models/checkpoints" folder. The model is somewhat smaller than older diffusion models.

2. Restart ComfyUI to apply the changes and load the new model.


TYPE B : GGUF Variant

1. Download a GGUF quantized variant of the model from the Hugging Face repository. The variants range from Q2 (for faster generation) to Q8 (for higher precision). Choose one of them and put it into your "ComfyUI/models/diffusion_models" folder.

2. You also need to download Google Gemma (gemma_2_2b_fp16.safetensors) from Hugging Face and save it into your "ComfyUI/models/text_encoders" folder.

3. Download the VAE (variational autoencoder) model from Hugging Face. There are several precision options to choose from (fp32, fp16, or fp8). Save it into your "ComfyUI/models/vae" folder.

4. Restart ComfyUI for the changes to take effect.
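Before restarting, it can help to confirm that the downloaded files landed in the folders named in the steps above. The following is a minimal sketch, not part of ComfyUI itself: the function names are made up for illustration, the "ComfyUI" root path is an assumed install location, and because this guide does not give the GGUF and VAE file names, those two are matched by extension only (the ".safetensors" extension for the VAE is an assumption).

```python
from pathlib import Path

# Locations relative to the ComfyUI root, taken from the installation steps.
# Only the two file names given in this guide are checked by name. The GGUF
# and VAE file names depend on which variant you downloaded, so they are
# matched by extension (the VAE extension is an assumption).
NATIVE_FILES = ["models/checkpoints/lumina_2.safetensors"]
GGUF_PATTERNS = {
    "models/diffusion_models": "*.gguf",
    "models/text_encoders": "gemma_2_2b_fp16.safetensors",
    "models/vae": "*.safetensors",
}


def missing_native(root: Path) -> list[str]:
    """Return the Type A files that are not present under root."""
    return [rel for rel in NATIVE_FILES if not (root / rel).is_file()]


def missing_gguf(root: Path) -> list[str]:
    """Return the Type B folder/pattern pairs that have no matching file."""
    missing = []
    for folder, pattern in GGUF_PATTERNS.items():
        directory = root / folder
        found = directory.is_dir() and any(directory.glob(pattern))
        if not found:
            missing.append(f"{folder}/{pattern}")
    return missing


if __name__ == "__main__":
    root = Path("ComfyUI")  # hypothetical install location, adjust as needed
    print("Type A missing:", missing_native(root))
    print("Type B missing:", missing_gguf(root))
```

An empty list for the installation type you chose means every expected file was found; you only need one of the two types to be complete.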


Workflow

1. Get the basic workflow or GGUF workflow from our Hugging Face repository.

2. Drag and drop the workflow file into ComfyUI.

3. Then follow these steps:


(a) Load the Lumina Image 2.0 model in the checkpoint node.


(b) Add your positive and negative prompts. Start the positive prompt with a role description so that the model understands the task better, then add your own prompt after "<Prompt Start>" to generate your art style.


(c) Set up the KSampler configuration.


(d) Choose your image resolution in the SD3 LatentImage node. The default is 1024 by 1024. Finally, click the "Queue" button to start generation.
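The positive prompt layout from step (b), a role description followed by "<Prompt Start>" and then your own prompt, can be sketched as a small helper. This is only an illustration of the string format: the function name is hypothetical, the role text below is a made-up placeholder rather than the wording shipped with the workflow, and the single space around the marker is an assumption.

```python
PROMPT_MARKER = "<Prompt Start>"


def build_positive_prompt(role: str, prompt: str) -> str:
    """Join a role description and an image prompt around the marker."""
    role = role.strip()
    prompt = prompt.strip()
    if not prompt:
        raise ValueError("prompt must not be empty")
    # Assumed layout: role text, then the marker, then the user's prompt.
    return f"{role} {PROMPT_MARKER} {prompt}"


# Hypothetical role text, for illustration only.
role = "You are an assistant that generates high-quality images from text prompts."
text = build_positive_prompt(
    role,
    "Majestic landscape photograph of snow-capped mountains at sunset.",
)
print(text)
```

The resulting string is what you would paste into the positive prompt node; the negative prompt is entered separately as usual.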

We ran some tests with different scenarios.

lumina generation testing with landscapes


Prompt used: Majestic landscape photograph of snow-capped mountains under a dramatic sky at sunset. The mountains dominate the lower half of the image, with rugged peaks and deep crevasses visible. A glacier flows down the right side, partially illuminated by the warm light. The sky is filled with fiery orange and golden clouds, contrasting with the cool tones of the snow. The central peak is partially obscured by clouds, adding a sense of mystery. The foreground features dark, shadowed forested areas, enhancing the depth. High contrast, natural lighting, warm color palette, photorealistic, expansive, awe-inspiring, serene, visually balanced, dynamic composition.

Steps: 35

Sampling: Euler

CFG: 4

Height and width: 1024

For this test, we ran the model on an RTX 4080, and generation took around 26 seconds with 35 steps (basic workflow).
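To put that figure in per-step terms, the arithmetic can be sketched as follows. It uses only the two numbers reported above and treats the whole 26 seconds as sampling time, which is an assumption, since model loading and VAE decoding may be included in the measured total.

```python
total_seconds = 26  # approximate generation time reported above
steps = 35          # sampling steps used in the landscape test

# Average time per step, assuming the whole run was spent sampling.
seconds_per_step = total_seconds / steps
steps_per_second = steps / total_seconds

print(f"{seconds_per_step:.2f} s/step")   # 0.74 s/step
print(f"{steps_per_second:.2f} steps/s")  # 1.35 steps/s
```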


lumina generation testing with human faces

Prompt used: Close-up portrait of a young woman with light brown hair, looking to the right, illuminated by warm, golden sunlight. Her hair is gently tousled, catching the light and creating a halo effect around her head. She wears a white garment with a V-neck, visible in the lower left of the frame. The background is dark and out of focus, enhancing the contrast between her illuminated face and the shadows. Soft, ethereal lighting, high contrast, warm color palette, shallow depth of field, natural backlighting, serene and contemplative mood, cinematic quality, intimate and visually striking composition.

Steps: 40

Sampling: Euler

CFG: 4

Height and width: 1024

There are plenty of diffusion models out there. In our comparison, Lumina Image 2.0 generates results roughly similar to SDXL, without being much better than Flux. That said, its prompt adherence and comprehension are quite impressive.