Diffusion: Definition and Examples

Family of generative models that create data (images, audio, video) by learning to reverse a progressive noising process, transforming random noise into coherent content step by step.

Full definition

Diffusion is a generative modeling paradigm inspired by thermodynamics. It rests on two phases: a forward process, in which Gaussian noise is progressively added to a real data point until it becomes unrecognizable, and a reverse process, in which a neural network learns to remove that noise step by step so that coherent data can be reconstructed from pure noise.

Concretely, during training, the model observes a large number of images (millions or more for large text-to-image models) at various noise levels and learns to predict the noise added at each step. Once trained, it can start from an image of pure random noise and gradually "denoise" it to generate an entirely new image. This iterative process typically involves between 20 and 1000 steps, depending on the sampler, and is a large part of what gives diffusion models their output quality.

Diffusion models are at the core of the most popular image generation tools like Stable Diffusion, DALL·E, and Midjourney. Their strength lies in training stability (unlike GANs), their ability to produce highly detailed images, and their natural compatibility with text conditioning. In text-to-image generation, a text prompt guides the denoising process to steer generation toward the desired result.

Beyond images, the diffusion principle now extends to audio generation (music, voice), video, 3D models, and even molecular design. Variants like latent diffusion operate in a compressed space to reduce computational cost, making these models accessible on consumer hardware.
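
The forward (noising) process described above can be illustrated with a minimal sketch. It assumes a linear noise schedule from 1e-4 to 0.02 over 1000 steps, a common choice but not the only one, and treats a list of four numbers as a toy "image". The function names are hypothetical and chosen for illustration only.

```python
import math
import random

def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Per-step noise variances, rising linearly (assumed schedule)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]

def cumulative_signal(betas):
    """alpha_bar[t]: product of (1 - beta) up to and including step t."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars

def add_noise(x0, t, alpha_bars, rng):
    """Jump straight from clean data x0 to its noisy version at step t.

    Returns the noisy sample and the noise itself, which is what the
    network is trained to predict.
    """
    signal = math.sqrt(alpha_bars[t])
    sigma = math.sqrt(1.0 - alpha_bars[t])
    noise = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [signal * x + sigma * n for x, n in zip(x0, noise)]
    return xt, noise

rng = random.Random(0)
alpha_bars = cumulative_signal(linear_beta_schedule(1000))
x0 = [0.5, -0.25, 1.0, 0.0]  # toy 4-value "image"
xt, noise = add_noise(x0, 999, alpha_bars, rng)  # almost pure noise
```

At the last step, `alpha_bars[-1]` is close to zero, so `xt` carries almost no trace of `x0`. This is the "pure noise" starting point that the reverse process learns to undo.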

Etymology

The term "diffusion" is borrowed from physics and thermodynamics, where it refers to the spontaneous movement of particles from an area of high concentration to an area of low concentration. In AI, the analogy is about the diffusion of noise into data: like particles dispersing, structured information gradually "dissolves" into noise, and the model learns to reverse this dispersion.

Concrete examples

Image generation from a text prompt (text-to-image)

A serene Japanese garden at sunset, with cherry blossoms falling over a koi pond, soft golden light, photorealistic, 8K, detailed

Editing an existing image by inpainting (replacing an area)

Replace the background of this portrait photo with a snowy mountain landscape, preserving the natural lighting and shadows of the subject

Image generation guided by a reference image (image-to-image)

Transform this pencil sketch into a digital watercolor illustration, preserving the original composition and proportions

Practical usage

In prompt engineering for diffusion models, the precision and structure of the prompt directly influence the quality of the result. Use detailed descriptions combining subject, style, lighting, composition, and level of detail, then adjust parameters like the number of denoising steps and the classifier-free guidance (CFG) scale to control fidelity to the prompt: a higher scale follows the prompt more closely, usually at the expense of variety. Negative prompts allow you to exclude unwanted elements and refine the generation.
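
To make the guidance scale concrete, here is a minimal sketch of how classifier-free guidance is commonly described: at each step the model produces one noise prediction without the prompt and one with it, and the two are combined. The function name and the toy values are illustrative assumptions, not the API of any particular tool.

```python
def apply_guidance(noise_uncond, noise_cond, guidance_scale):
    """Push the prediction away from the unconditional one, toward the prompt.

    guidance_scale = 0 ignores the prompt, 1 uses the conditional
    prediction as-is, and larger values exaggerate the prompt's influence.
    """
    return [u + guidance_scale * (c - u)
            for u, c in zip(noise_uncond, noise_cond)]

# Toy two-value predictions (hypothetical numbers)
noise_uncond = [0.1, 0.2]
noise_cond = [0.3, 0.0]

guided = apply_guidance(noise_uncond, noise_cond, 7.5)
```

In tools that follow this scheme, a negative prompt typically takes the place of the empty prompt in the unconditional branch, so the result is steered away from it.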

Related concepts

Generative model
GAN (Generative Adversarial Network)
VAE (Variational Autoencoder)
Text-to-Image
Gaussian noise
Classifier-Free Guidance

FAQ

What is the difference between a diffusion model and a GAN?
GANs use two competing networks (generator and discriminator), which can cause training instabilities. Diffusion models, on the other hand, learn a progressive denoising process, making them more stable to train and capable of producing a greater diversity of results. In return, diffusion generation is slower because it requires multiple iterative steps.
Why do diffusion models need so many steps to generate an image?
Each step removes a small amount of noise, allowing the model to make fine, progressive decisions about the structure, details, and textures of the image. The number of steps can be reduced with more efficient samplers such as DDIM or DPM-Solver++ (often listed as DPM++), usually at the cost of a slight loss in quality. Recent techniques such as consistency distillation allow generation in very few steps, down to a single one.
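
As a rough illustration of step reduction, the sketch below picks an evenly spaced subset of the training timesteps to visit during sampling. It assumes a model trained with 1000 steps. The function name is hypothetical, and real samplers differ in how they space and offset the timesteps.

```python
def sampling_timesteps(train_steps, sample_steps):
    """Evenly spaced timesteps, from noisiest to cleanest."""
    if not 0 < sample_steps <= train_steps:
        raise ValueError("sample_steps must be between 1 and train_steps")
    stride = train_steps // sample_steps
    return list(range(train_steps - 1, -1, -stride))[:sample_steps]

timesteps = sampling_timesteps(1000, 20)  # 999, 949, ..., 49
```

Each visited timestep then covers the work of about 50 training steps, which is why quality can drop when the subset becomes very small.
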
What is latent diffusion and why is it important?
Latent diffusion performs the noising and denoising process in a compressed space (latent space) rather than directly on pixels. This significantly reduces the memory and computation time required, making high-resolution image generation possible on consumer GPUs. This is the principle used by Stable Diffusion.
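
The saving can be estimated with simple arithmetic. The sketch below assumes an autoencoder that shrinks each spatial dimension by a factor of 8 and uses 4 latent channels, as in the first Stable Diffusion releases. Other models may use different values, and the function names are illustrative.

```python
def tensor_size(height, width, channels):
    """Number of values the denoiser has to process."""
    return height * width * channels

def latent_shape(height, width, downsample=8, latent_channels=4):
    """Shape of the compressed representation (assumed settings)."""
    return height // downsample, width // downsample, latent_channels

pixels = tensor_size(512, 512, 3)   # 786432 values in pixel space
h, w, c = latent_shape(512, 512)    # (64, 64, 4)
latents = tensor_size(h, w, c)      # 16384 values in latent space
ratio = pixels / latents            # 48.0
```

Under these assumptions, every denoising step works on 48 times fewer values than it would in pixel space. The autoencoder runs only at the start and end of generation.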
