If you have ever generated images with Stable Diffusion, you have probably run into bad figures and artifacts. When it comes to controlling your image generation, negative prompts are an essential factor, especially when working with Stable Diffusion models.
Using negative prompts effectively can refine the output by specifying unwanted elements. Let's move to a more in-depth understanding.
What Are Negative Prompts?
In simple words, a negative prompt is a set of terms or keywords that tells a diffusion model what not to include in an image. When generating an image, you input a positive prompt (what you want the model to depict) and can also add a negative prompt to exclude specific features.
By using negative prompts, you can guide the model to avoid certain outcomes like blurry images, logos, or other unwanted elements. You can get more ideas from our tutorial on the best negative prompts.
But there is more to it than just adding the opposite of what you want. Using negative prompts effectively requires understanding how the model interprets each term you tell it to ignore.
How Do Negative Prompts Work?
When you supply both positive and negative prompts, the model essentially generates two versions of the image. The positive version reflects your desired elements, while the negative version includes what the model associates with the negative prompt.
Here, the CFG (Classifier-Free Guidance) value plays a crucial role. Think of CFG as the controller that determines how much the model should deviate from the negative image to shape the positive one.
As we have noted in our other tutorials, the CFG value influences your output. A high CFG value pushes the output further away from the negative prompt's influence, bringing it closer to what you asked for in the positive prompt, whereas setting the CFG to zero shows you what the negative prompt alone produces.
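To make this concrete, here is a minimal sketch of the guidance step. The function name and tensors are hypothetical, but the formula mirrors the classifier-free guidance update used by common implementations such as Hugging Face's diffusers, where the negative prompt's prediction takes the place of the unconditional one:

```python
import torch

# One denoising step's noise combination under classifier-free guidance.
# Hypothetical helper: in a real pipeline, both predictions come from the
# UNet, conditioned on the positive and negative prompt embeddings.
def cfg_noise_prediction(noise_pred_positive: torch.Tensor,
                         noise_pred_negative: torch.Tensor,
                         guidance_scale: float) -> torch.Tensor:
    # Start from the negative prediction and step away from it, toward
    # the positive prediction, scaled by the CFG value. A scale of 0
    # returns the negative prompt's prediction alone; higher values push
    # the output further from the negative prompt's influence.
    return noise_pred_negative + guidance_scale * (
        noise_pred_positive - noise_pred_negative
    )
```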
To explain further, take an example. If your negative prompt includes "bad limbs" (which is what people usually write), the model sometimes interprets this as avoiding limbs that don't fit your ideal "good limbs" concept. But because "bad limbs" and "good limbs" both focus on limbs, the model may still struggle to produce flawless limbs, as the two concepts directly contradict each other.
Instead, choosing terms that avoid this conflict (like "monochrome" when you want a vibrant image) often works better, because the model doesn't have to reconcile overlapping concepts.
So what should go into a negative prompt? The direct answer is: use the opposite of what you want. For example, if you are after a vibrant, colorful scene, putting "monochrome" in the negative prompt can help, as it directly opposes the colorfulness you are going for.
Here are some of the tests we ran with and without negative prompts, using the Stable Diffusion XL model to generate the images.
Positive prompt: a girl with a neon rainbow colored hair holding a black cat, realistic anime, uhd image, cartoonish characters, violet and azure, nightcore, depth of layers, cartoon-like characters, rainbow colors, hyper realism, 8k, octane render
These are the results generated with only the positive prompt and no negative prompt. You can see how deformed the fingers are, along with small artifacts in the images.
Negative prompt: artifacts, bad anatomy, deformed fingers
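If you want to try this pairing yourself, below is a minimal sketch using the Hugging Face diffusers library with the public SDXL base checkpoint. The model ID, guidance scale, and step count are our assumptions, not necessarily the exact settings used for the images above:

```python
import torch
from diffusers import StableDiffusionXLPipeline

# Load the SDXL base model (assumes a CUDA-capable GPU).
pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
).to("cuda")

prompt = (
    "a girl with a neon rainbow colored hair holding a black cat, "
    "realistic anime, uhd image, cartoonish characters, violet and azure, "
    "nightcore, depth of layers, cartoon-like characters, rainbow colors, "
    "hyper realism, 8k, octane render"
)

image = pipe(
    prompt=prompt,
    negative_prompt="artifacts, bad anatomy, deformed fingers",
    guidance_scale=7.5,      # the CFG value discussed earlier (assumed)
    num_inference_steps=30,  # assumed step count
).images[0]
image.save("with_negative_prompt.png")
```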
Now, the problem is that many people do not have this domain-specific keyword knowledge. The simple fix is to do a Google/Bing search or to use LLMs (Large Language Models) like ChatGPT, Perplexity, or Google Gemini.
The Drawbacks of Negative Prompting
Negative prompts can be incredibly powerful, but they also come with potential downsides, particularly if overused or applied in the wrong manner. We have listed the major points below:
1. Overuse of Embeddings- Take the keyword "bad anatomy" we used in the art generated above. Embedding keywords like "bad anatomy" or "bad artist" seem helpful, but they can also block desirable traits in the final image.
Embedded prompts built from general or abstract terms can unintentionally obstruct specific elements you want, such as unique character traits (like a god-like creature with a single eye) or particular artistic styles.
Overuse can also make it impossible to achieve certain effects, like the rule of thirds in composition or specific facial features. Even if you want a realistic look, including a dozen negatives like "blurry," "distorted," and "ugly" can make the AI model focus so much on what not to do that it loses sight of what you actually want.
2. Undefined Terms- Words like "ugly" or "bad" may not produce the intended results because they're abstract. Stable Diffusion models don't know what individuals find "ugly" or "bad" by default. User experience shows that terms like "ugly eyes" often do little to improve features and can lead to inconsistent outcomes. The model does not know what makes an eye ugly without specific context, which is why using concrete terms is more effective.
3. Unhelpful Terms- Sometimes, popular negative terms prove not to be helpful at all. For example, "malformed hands" might sound like it would help with generating accurate hands, but the model may not effectively differentiate between malformed and regular hands. Terms like "logo," "watermark," and "blurry," however, are usually safer bets, as they map more directly to visual patterns in Stable Diffusion's training dataset.