Where the GAN bet everything on a single generator pass and paid for it with unstable training and dropped modes, diffusion makes a different trade. It keeps the sample quality a GAN gets, regains stability (no adversary — just a regression) and mode coverage (it never stops covering the whole distribution), and pays the bill in one currency: sampling steps. One forward pass becomes tens to thousands.
The forward process — controlled destruction. Take a real datapoint x₀ and add a little Gaussian noise. Then a little more. After enough steps the signal is gone and you're left with pure static — a sample from a plain Gaussian. This direction has no learnable parameters; it's fixed math you completely control. That's the secret to the whole thing: you only ever learn to invert a process you already understand perfectly.
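To make the "fixed math" concrete, here is a minimal NumPy sketch of the forward process. Because every step adds Gaussian noise, the result of many small steps can be sampled in one shot as sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps, where abar_t is the running product of (1 - beta). The linear beta schedule, the 1000-step count, and the names make_schedule and forward_noise are assumptions chosen for illustration, not something the text above prescribes.

```python
import numpy as np

def make_schedule(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Linear beta schedule (an assumed, commonly used choice)."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    alpha_bars = np.cumprod(1.0 - betas)  # fraction of signal variance left at step t
    return betas, alpha_bars

def forward_noise(x0, t, alpha_bars, rng):
    """Jump straight to noise level t: x_t = sqrt(abar_t)*x0 + sqrt(1-abar_t)*eps."""
    eps = rng.standard_normal(x0.shape)
    xt = np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * eps
    return xt, eps

rng = np.random.default_rng(0)
betas, alpha_bars = make_schedule()

x0 = np.full(10_000, 3.0)  # toy "dataset": every point equals 3.0
x_early, _ = forward_noise(x0, 10, alpha_bars, rng)   # still close to the data
x_late, _ = forward_noise(x0, 999, alpha_bars, rng)   # essentially standard Gaussian

print("early: mean=%.3f std=%.3f" % (x_early.mean(), x_early.std()))
print("late:  mean=%.3f std=%.3f" % (x_late.mean(), x_late.std()))
```

Nothing here is trained: the schedule is chosen by hand, and the last step leaves almost no trace of the original value.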
The reverse process — what the network learns. Given a noisy datapoint and its noise level, predict the noise that was mixed in, and subtract a small slice of it to get a slightly cleaner datapoint. Run that loop from pure noise and the chain of small denoising steps walks you from the Gaussian back to the data manifold. The model never has to make the impossible jump in one go — only the easy local correction.
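The loop described above can be sketched as follows, using the DDPM-style update: remove a scaled slice of the predicted noise, then (except on the last step) add back a little fresh noise. This is a sketch under loud assumptions. There is no trained network here; oracle_eps is a hypothetical stand-in that returns the exact noise, which is only possible because the toy data is a single point (every datapoint equals MU). The linear schedule and all function names are likewise illustrative.

```python
import numpy as np

MU = 3.0  # toy data: every datapoint equals MU (assumption for illustration)

def make_schedule(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Linear beta schedule (an assumed choice) and cumulative signal fractions."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return betas, np.cumprod(1.0 - betas)

def oracle_eps(xt, t, alpha_bars):
    """Hypothetical stand-in for a trained network: exact noise for point-mass data."""
    return (xt - np.sqrt(alpha_bars[t]) * MU) / np.sqrt(1.0 - alpha_bars[t])

def reverse_step(xt, t, betas, alpha_bars, rng):
    """One denoising step: subtract a slice of the predicted noise, then re-noise a little."""
    eps_hat = oracle_eps(xt, t, alpha_bars)
    slice_of_noise = betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps_hat
    mean = (xt - slice_of_noise) / np.sqrt(1.0 - betas[t])
    if t == 0:
        return mean  # final step: no fresh noise is added
    return mean + np.sqrt(betas[t]) * rng.standard_normal(xt.shape)

def sample(num_samples, betas, alpha_bars, rng):
    """Start from pure Gaussian noise and walk the chain back to the data."""
    x = rng.standard_normal(num_samples)
    for t in reversed(range(len(betas))):
        x = reverse_step(x, t, betas, alpha_bars, rng)
    return x

rng = np.random.default_rng(0)
betas, alpha_bars = make_schedule()
samples = sample(1000, betas, alpha_bars, rng)
print(samples[:5])  # all values land on MU for this toy setup
```

With a real dataset the oracle would be replaced by a learned noise predictor, and the samples would spread over the data distribution rather than collapse to one value.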
Why iterative beats one-shot. A GAN's generator must map a random vector straight onto a photorealistic image; that's a brutal function to learn, and it is one reason GANs are finicky. Diffusion breaks the impossible jump into as many as a thousand small questions the model can answer: "this is slightly too noisy, clean it up a touch". The objective is plain mean-squared error on the noise: no minimax, no critic, no equilibrium to balance.
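As a rough illustration of how plain that objective is, the sketch below computes the noise-prediction loss for one batch: pick a random noise level per example, noise the data, and take the mean-squared error between the true noise and the model's guess. The two "models" are hypothetical stand-ins rather than networks: zero_model always predicts zero, and oracle_model returns the exact noise, which only works because the toy data is a single point at MU. The schedule and all names are assumptions.

```python
import numpy as np

MU = 3.0  # toy data: every datapoint equals MU (assumption for illustration)

def make_alpha_bars(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative signal fractions for an assumed linear beta schedule."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)

def diffusion_loss(model, x0, alpha_bars, rng):
    """MSE between the noise that was mixed in and the model's prediction of it."""
    t = rng.integers(0, len(alpha_bars), size=x0.shape[0])  # random noise level per example
    eps = rng.standard_normal(x0.shape)
    xt = np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * eps
    return np.mean((eps - model(xt, t)) ** 2)

alpha_bars = make_alpha_bars()

def zero_model(xt, t):
    """Stand-in for an untrained predictor: always guesses zero noise."""
    return np.zeros_like(xt)

def oracle_model(xt, t):
    """Stand-in for a perfectly trained predictor (valid only for point-mass data)."""
    return (xt - np.sqrt(alpha_bars[t]) * MU) / np.sqrt(1.0 - alpha_bars[t])

rng = np.random.default_rng(0)
x0 = np.full(4096, MU)
loss_zero = diffusion_loss(zero_model, x0, alpha_bars, rng)
loss_oracle = diffusion_loss(oracle_model, x0, alpha_bars, rng)
print("zero predictor loss:  ", loss_zero)    # close to 1, the variance of the noise
print("oracle predictor loss:", loss_oracle)  # essentially 0
```

Training a real model amounts to pushing this one number down with gradient descent, which is why it behaves like ordinary regression.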
The catch. That chain is the price. Generating a sample means running the network once per step: tens to thousands of forward passes versus a GAN's single one. Most of the field's recent work (DDIM, distillation, consistency models, flow matching) is about making the chain shorter without losing quality. For the deep architectural and scheduler detail, see Diffusion Models, in depth.
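To show what "making the chain shorter" can look like mechanically, here is a sketch of deterministic DDIM-style sampling that visits only a subset of the noise levels. At each visited level it estimates the clean datapoint from the predicted noise, then re-noises that estimate to the next, lower level. The assumptions are the same as in any toy setup: CountingOracle is a hypothetical stand-in for a trained network (exact only because the toy data is a single point at MU), and the schedule, step counts, and names are illustrative. Because the oracle is exact, this shows the mechanism and the call count, not the quality trade-off a real model would face.

```python
import numpy as np

MU = 3.0  # toy data: every datapoint equals MU (assumption for illustration)

def make_alpha_bars(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative signal fractions for an assumed linear beta schedule."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)

class CountingOracle:
    """Hypothetical stand-in for a trained noise predictor that counts its calls."""
    def __init__(self, alpha_bars):
        self.alpha_bars = alpha_bars
        self.calls = 0

    def __call__(self, xt, t):
        self.calls += 1
        ab = self.alpha_bars[t]
        return (xt - np.sqrt(ab) * MU) / np.sqrt(1.0 - ab)

def ddim_sample(model, alpha_bars, num_samples, num_sampling_steps, rng):
    """Deterministic DDIM-style sampler over an evenly spaced subset of timesteps."""
    last = len(alpha_bars) - 1
    timesteps = np.linspace(last, 0, num_sampling_steps).round().astype(int)
    x = rng.standard_normal(num_samples)
    for i, t in enumerate(timesteps):
        eps_hat = model(x, t)
        ab_t = alpha_bars[t]
        # The next visited level; after the last one we land on clean data (abar = 1).
        ab_prev = alpha_bars[timesteps[i + 1]] if i + 1 < len(timesteps) else 1.0
        x0_hat = (x - np.sqrt(1.0 - ab_t) * eps_hat) / np.sqrt(ab_t)
        x = np.sqrt(ab_prev) * x0_hat + np.sqrt(1.0 - ab_prev) * eps_hat
    return x

rng = np.random.default_rng(0)
alpha_bars = make_alpha_bars()
model = CountingOracle(alpha_bars)
samples = ddim_sample(model, alpha_bars, num_samples=1000, num_sampling_steps=20, rng=rng)
print("network calls:", model.calls)
print("first samples:", samples[:3])
```

The same 1000-level schedule is traversed with 20 network calls instead of 1000; how far that number can drop before quality suffers is what the methods listed above compete on.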