The renowned GPU manufacturer has entered the diffusion race. SANA, released by NVIDIA Labs, can generate a 1024 × 1024 image in under a second on a 16 GB laptop GPU and handles resolutions up to 4096 × 4096. It competes with much larger models like Flux-12B while being 20× smaller and 100× faster.
Sana 1.6B & 0.6B : A text-to-image model 🥰
— Stable Diffusion Tutorials (@SD_Tutorial) October 16, 2024
-generate images up to 4096 × 4096 resolution.
-synthesize high-resolution, high-quality images with strong text-image alignment
-remarkably fast speed, deployable on laptop GPU.
Unlike traditional Stable Diffusion models (e.g., SDXL), it uses an AE-F32C32 autoencoder that compresses image data 32× (versus 8× in older methods), making the process faster without losing quality.
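To see why the higher compression factor matters, compare the size of the latent grid the diffusion model actually has to process. This is a back-of-the-envelope sketch (patching and channel counts are simplified):

```python
# Latent grid sizes for a 1024 x 1024 image under different
# autoencoder downsampling factors (F8 vs. F32).
def latent_tokens(image_size: int, factor: int) -> int:
    """Number of latent positions the diffusion model must attend over."""
    side = image_size // factor
    return side * side

f8 = latent_tokens(1024, 8)    # SDXL-style 8x autoencoder
f32 = latent_tokens(1024, 32)  # SANA's AE-F32C32

print(f8, f32, f8 // f32)  # 16384 1024 16
```

A 32× autoencoder leaves 16× fewer latent positions than an 8× one, which is where much of the speedup comes from.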
It uses linear attention, which is well suited to handling high-resolution images efficiently. It replaces older text encoders (like T5) with a smaller, faster model that follows instructions better, and its Flow-DPM-Solver speeds up sampling, producing high-quality images in fewer steps. For a more in-depth understanding, you can read the research paper.
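The attention difference is easy to see in code. Below is a generic linear-attention sketch (ReLU feature map, not SANA's exact formulation): reassociating the matrix products avoids ever forming the N × N score matrix, so cost grows linearly in the number of latent tokens.

```python
import numpy as np

def softmax_attention(Q, K, V):
    # Standard attention: materializes an N x N score matrix, O(N^2 * d).
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

def linear_attention(Q, K, V, eps=1e-6):
    # Linear attention: apply a kernel feature map (ReLU here) and
    # reassociate the matmuls as Q' @ (K'^T V), which is O(N * d^2).
    Qp, Kp = np.maximum(Q, 0), np.maximum(K, 0)
    kv = Kp.T @ V                    # d x d, independent of N
    z = Qp @ Kp.sum(axis=0) + eps    # per-token normalizer
    return (Qp @ kv) / z[:, None]

N, d = 1024, 32                      # e.g. 1024 latent tokens, 32-dim heads
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((N, d)) for _ in range(3))
print(linear_attention(Q, K, V).shape)  # (1024, 32)
```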
Installation
1. Install ComfyUI on your machine.
2. Open ComfyUI's Manager, select "Custom Nodes Manager", search for "Extra Models" by author "city96", and click "Install". You can check the real-time download status in the ComfyUI terminal.
Alternative (manual):
Move into the "ComfyUI/custom_nodes" folder. Open a command prompt by typing "cmd" in the folder's address bar, then clone the repository with the following command:
git clone https://github.com/Efficient-Large-Model/ComfyUI_ExtraModels.git
3. Download the models from Sana's Hugging Face repository and save them into your "ComfyUI/models/checkpoints" folder.
4. Next, download the VAE (Variational Auto-Encoder) from the Hugging Face repository and save it into the "ComfyUI/models/vae" folder. After downloading, rename it to something recognizable (like sana_vae.safetensors) to avoid conflicts.
5. Another model is also required to refine your prompts. It's a lightweight LLM, Google's Gemma (well known among developers), which is downloaded automatically when you run the workflow. To see the real-time status, switch to the ComfyUI terminal running in the background.
While running the workflow, many people get a black/grey image output. If this happens to you, uninstall the "Extra Models" custom nodes, restart ComfyUI, and follow the manual installation in Step 2. Then navigate to "Install Missing Custom Nodes" in the Manager and click Install for Gemma.
6. Restart and refresh ComfyUI.
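Before moving on, you can sanity-check that everything from the steps above landed in the right place. This is a hypothetical helper script, not part of the official setup: adjust COMFY to your install location and the filenames to whatever you actually downloaded.

```python
from pathlib import Path

# Assumed install layout -- edit these paths for your machine.
COMFY = Path("ComfyUI")

expected = [
    COMFY / "custom_nodes" / "ComfyUI_ExtraModels",          # Step 2
    COMFY / "models" / "checkpoints",                        # Step 3: SANA checkpoint folder
    COMFY / "models" / "vae" / "sana_vae.safetensors",       # Step 4: renamed VAE
]

def check(paths):
    """Print any missing paths; return True when everything is present."""
    missing = [p for p in paths if not p.exists()]
    for p in missing:
        print("missing:", p)
    return not missing

check(expected)
```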
Workflow
1. Get the workflow from the GitHub repository.
(a) Basic TextToImage workflow
2. Drag and drop to ComfyUI.
(a) Load your SANA model checkpoint.
(b) Load VAE model.
(c) Select the Gemma model. It can also run on the CPU if you do not have a higher-end GPU, though it will take longer; in that case, use the 4-bit quantized model.
(d) Set the KSampler settings.
(e) Add positive and negative prompts.
(f) Hit the "Queue" button to start generating, and here is our result.
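A quick note on the CPU option in step (c): quantizing weights from 16-bit to 4-bit cuts the memory needed to hold them by roughly 4×, which is what makes the CPU fallback practical. A rough estimate (the parameter count here is an assumption; check the model card for the Gemma variant your workflow actually downloads):

```python
# Approximate weight-memory footprint at different precisions.
params = 2e9  # assumed ~2B-parameter Gemma prompt refiner

def weight_gib(n_params: float, bits: int) -> float:
    """Bytes for the weights alone, converted to GiB."""
    return n_params * bits / 8 / 2**30

for bits in (16, 8, 4):
    print(f"{bits:>2}-bit: {weight_gib(params, bits):.1f} GiB")
```

Activations and runtime buffers add overhead on top, but the 4× drop in weight memory is the dominant saving.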
First try:
Prompt: a cyberpunk cat with a neon sign that says "Sana"
CFG: 3
Steps: 18
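For reference, the CFG value above controls classifier-free guidance: at each sampling step the model's prompt-conditioned prediction is pushed away from its unconditional prediction by the given scale. A generic sketch of the standard formula (not ComfyUI's internal code):

```python
import numpy as np

def apply_cfg(uncond, cond, scale):
    # Classifier-free guidance: move the prediction away from the
    # unconditional output, toward the prompt-conditioned one.
    return uncond + scale * (cond - uncond)

uncond = np.array([0.0, 0.0])
cond = np.array([1.0, -1.0])
print(apply_cfg(uncond, cond, 3.0))  # [ 3. -3.]
```

Higher scales follow the prompt more literally at the cost of image quality; a low value like 3 keeps the output natural.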
The results are not cherry-picked. The second generation is much better than the first, though the rendered text is still not up to the mark.
Second try:
Steps: 18