AI & MACHINE LEARNING

Generative adversarial networks (GANs)

Generative models in which two networks compete: a generator produces samples from noise and a discriminator tries to separate real samples from generated ones. Training seeks a minimax equilibrium. GANs generate a sample in a single forward pass but suffer from training instability and mode collapse.

Extended definition

Generative adversarial networks (GANs) are a class of generative models in which two neural networks compete in a two-player game that is zero-sum in its original minimax form. The generator maps noise to samples, trying to imitate the distribution of the real data; the discriminator receives real and generated samples and tries to tell them apart. Training is a search for an equilibrium: the generator improves by fooling the discriminator, and the discriminator improves by not being fooled. Goodfellow and colleagues (2014, reissued 2020) formalized this minimax scheme and showed that, at the optimum and given enough model capacity, the generator reproduces the data distribution and the discriminator can do no better than chance, assigning probability 1/2 to every sample. Gui and colleagues (2023), in the reference review, organize the family into variants of architecture, objective function, and application, and treat training stability as the central problem motivating most of the proposed variants.
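
The equilibrium claim can be checked numerically on a toy discrete case. The sketch below assumes the standard value function from Goodfellow and colleagues, V(D, G) = E_{x~p_data}[log D(x)] + E_{x~p_g}[log(1 - D(x))], and the closed-form optimal discriminator for a fixed generator, D*(x) = p_data(x) / (p_data(x) + p_g(x)). The three-outcome distributions and the function names are illustrative assumptions, not taken from the cited work.

```python
import math

def optimal_discriminator(p_data, p_g):
    """D*(x) = p_data(x) / (p_data(x) + p_g(x)) for a fixed generator."""
    return [pd / (pd + pg) for pd, pg in zip(p_data, p_g)]

def value(p_data, p_g, d):
    """V(D, G) = E_{p_data}[log D(x)] + E_{p_g}[log(1 - D(x))]."""
    real_term = sum(pd * math.log(dx) for pd, dx in zip(p_data, d))
    fake_term = sum(pg * math.log(1.0 - dx) for pg, dx in zip(p_g, d))
    return real_term + fake_term

# Toy distributions over three outcomes (illustrative values only).
p_data = [0.5, 0.3, 0.2]
p_matched = [0.5, 0.3, 0.2]       # generator reproduces the data distribution
p_collapsed = [0.9, 0.05, 0.05]   # generator concentrates on one outcome

d_matched = optimal_discriminator(p_data, p_matched)
d_collapsed = optimal_discriminator(p_data, p_collapsed)

v_matched = value(p_data, p_matched, d_matched)
v_collapsed = value(p_data, p_collapsed, d_collapsed)

print(d_matched)                  # [0.5, 0.5, 0.5]: no better than chance
print(v_matched, -math.log(4))    # the two values agree up to rounding
print(v_collapsed > v_matched)    # True: the mismatched generator does worse
```

When the generator matches the data, the best possible discriminator outputs 1/2 everywhere and the value equals -log 4, the minimum the generator can reach. Any mismatch lets the discriminator score higher, which is the signal the generator trains against.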

When it applies

GANs apply when fast, sharp generation is wanted: unlike a diffusion model, which typically refines a sample over many iterative steps, the generator produces the sample in a single forward pass, which keeps GANs competitive where latency matters. Creswell and colleagues (2018) document their use in image synthesis, image-to-image translation, super-resolution, and editing. They suit data augmentation, generating plausible synthetic examples to train other models when real data is scarce or sensitive, as well as style transfer and problems where a pair of domains must be aligned without direct supervision. In research, they remain useful as a generative baseline and in resource-constrained edge settings, where diffusion’s inference cost is prohibitive.
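
The latency argument comes down to how many network evaluations one sample costs. The sketch below is only an illustration of that count: `ToyGenerator` is a hypothetical stand-in with random weights rather than a trained model, and `sample_iterative` is a generic refinement loop, not a real diffusion sampler. The step count of 50 is an arbitrary assumption.

```python
import math
import random

class ToyGenerator:
    """Hypothetical stand-in for a network: one affine layer followed by tanh."""

    def __init__(self, in_dim, out_dim, seed=0):
        rng = random.Random(seed)
        self.weights = [[rng.gauss(0.0, 1.0) for _ in range(in_dim)]
                        for _ in range(out_dim)]
        self.forward_calls = 0  # counts network evaluations

    def forward(self, z):
        self.forward_calls += 1
        return [math.tanh(sum(w * zi for w, zi in zip(row, z)))
                for row in self.weights]

def sample_single_pass(generator, latent_dim, rng):
    """GAN-style sampling: draw noise, run the generator once."""
    z = [rng.gauss(0.0, 1.0) for _ in range(latent_dim)]
    return generator.forward(z)

def sample_iterative(network, dim, steps, rng):
    """Generic iterative refinement: one network call per step (cost illustration only)."""
    x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    for _ in range(steps):
        x = network.forward(x)
    return x

rng = random.Random(42)

gan = ToyGenerator(in_dim=8, out_dim=4)
sample = sample_single_pass(gan, latent_dim=8, rng=rng)

iterative_net = ToyGenerator(in_dim=4, out_dim=4, seed=1)
refined = sample_iterative(iterative_net, dim=4, steps=50, rng=rng)

print(gan.forward_calls, iterative_net.forward_calls)  # 1 50
```

For networks of comparable size, sampling cost scales with the number of forward calls, which is why the single-pass design matters under tight latency budgets.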

When it does not apply

GANs are a poor fit when training stability is critical and tuning resources are limited. Adversarial training is notoriously unstable and prone to mode collapse, in which the generator produces only a few nearly identical samples and ignores the diversity of the data; Gui and colleagues (2023) treat this problem as recurrent across the whole family. They do not apply when an explicit likelihood is required: a GAN does not estimate the data density, so the probability of a given sample cannot be computed directly. They are no longer the best choice at the image-quality frontier, a position now held by diffusion models, which also offer better mode coverage. And they should not be used without careful evaluation: sharp samples do not guarantee that the generator has covered the distribution.
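
The gap between sharpness and coverage can be made concrete on synthetic data. The sketch below assumes a toy setup that is not from the source: eight Gaussian modes on a ring stand in for the real distribution, and two hypothetical generators are simulated by sampling directly, one from all eight modes and one from only two. The ring layout, the 4-sigma threshold, and the function names are illustrative assumptions.

```python
import math
import random

def ring_centers(n_modes=8, radius=2.0):
    """Centers of the toy data modes, evenly spaced on a circle."""
    return [(radius * math.cos(2 * math.pi * k / n_modes),
             radius * math.sin(2 * math.pi * k / n_modes))
            for k in range(n_modes)]

def sample_from_modes(centers, mode_ids, n, std, rng):
    """Simulate a generator that only emits points around the chosen modes."""
    points = []
    for _ in range(n):
        cx, cy = centers[rng.choice(mode_ids)]
        points.append((rng.gauss(cx, std), rng.gauss(cy, std)))
    return points

def coverage_report(points, centers, threshold):
    """Return (fraction of samples close to some mode, number of modes hit)."""
    hits = [0] * len(centers)
    sharp = 0
    for x, y in points:
        dists = [math.hypot(x - cx, y - cy) for cx, cy in centers]
        nearest = min(range(len(centers)), key=dists.__getitem__)
        if dists[nearest] <= threshold:
            sharp += 1
            hits[nearest] += 1
    covered = sum(1 for h in hits if h > 0)
    return sharp / len(points), covered

rng = random.Random(0)
centers = ring_centers()
std = 0.05
threshold = 4 * std

diverse = sample_from_modes(centers, list(range(8)), 1000, std, rng)
collapsed = sample_from_modes(centers, [0, 1], 1000, std, rng)

diverse_quality, diverse_modes = coverage_report(diverse, centers, threshold)
collapsed_quality, collapsed_modes = coverage_report(collapsed, centers, threshold)

print(diverse_quality, diverse_modes)      # near 1.0, 8 modes covered
print(collapsed_quality, collapsed_modes)  # near 1.0, only 2 modes covered
```

Both simulated generators score almost identically on the per-sample quality fraction, so only the mode count exposes the collapse. On real image data the modes are not known in advance, so this count is a teaching device rather than a drop-in metric.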

Applications by field

  • Computer vision: image synthesis and editing, super-resolution, and cross-domain translation, with single-step generation.
  • Medical imaging: synthetic image generation for data augmentation and anonymization, provided that clinical realism is validated and memorization of training images is ruled out.
  • Privacy and sensitive data: production of synthetic data that preserves statistical patterns without exposing real records.
  • Art and design: style transfer and conditional generation, where interactive control and speed favor their use.

Common pitfalls

The first pitfall is underestimating training instability: without careful tuning of architecture, objective function, and learning rate, training diverges or collapses. The second is conflating sharpness with coverage: a GAN can produce beautiful images and still ignore entire regions of the real distribution, a failure that no individual sample reveals. The third is evaluating only by visual inspection, without diversity metrics, which lets mode collapse go unnoticed. The fourth is treating GAN synthetic data as equivalent to real data without auditing for inherited training bias and memorization, a serious risk in sensitive domains. The fifth is choosing a GAN by inertia when the task calls for frontier fidelity, a case where diffusion is usually the technically superior option.
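
One simple way to start a memorization audit is to measure how close each synthetic record lies to its nearest training record. The sketch below is a minimal illustration under stated assumptions: records are small numeric tuples, Euclidean distance is a meaningful notion of similarity, and the tolerance of 0.01 is arbitrary. The two synthetic sets are simulated (fresh draws versus jittered copies of training records) rather than produced by a real GAN, and `memorization_rate` is a hypothetical helper name.

```python
import math
import random

def nearest_distance(point, reference):
    """Euclidean distance from `point` to its closest record in `reference`."""
    return min(math.dist(point, ref) for ref in reference)

def memorization_rate(synthetic, training, tolerance):
    """Fraction of synthetic records within `tolerance` of some training record."""
    flagged = sum(1 for s in synthetic
                  if nearest_distance(s, training) <= tolerance)
    return flagged / len(synthetic)

rng = random.Random(0)

# Toy "training set": 200 three-dimensional records.
training = [tuple(rng.gauss(0.0, 1.0) for _ in range(3)) for _ in range(200)]

# Simulated generator outputs (assumptions for illustration):
# fresh draws from the same distribution, and near-copies of training records.
fresh = [tuple(rng.gauss(0.0, 1.0) for _ in range(3)) for _ in range(200)]
copied = [tuple(v + rng.gauss(0.0, 1e-4) for v in rng.choice(training))
          for _ in range(200)]

fresh_rate = memorization_rate(fresh, training, tolerance=1e-2)
copied_rate = memorization_rate(copied, training, tolerance=1e-2)

print(fresh_rate)   # close to 0: fresh samples rarely land on a training record
print(copied_rate)  # 1.0: every near-copy is flagged
```

In practice the tolerance is usually calibrated against the nearest-neighbor distances of held-out real records, and the distance is chosen to suit the domain, for example a perceptual or feature-space distance for images. A low rate under this check is a necessary signal, not proof that no memorization occurred.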
