Understanding Generative Adversarial Networks (GANs)
Imagine a counterfeiter trying to create fake money that is indistinguishable from real currency. The police, in turn, are trying to identify the fakes. This cat-and-mouse game is the core concept behind Generative Adversarial Networks, or GANs. GANs are a powerful machine learning technique used to generate high-quality synthetic examples, from images to speech.
The Adversarial Learning Framework
At the heart of a GAN is a unique adversarial learning framework. It pits two models against each other:
- The Generator: This is the "counterfeiter." It tries to create realistic samples that can fool the discriminator.
- The Discriminator: This is the "police." It learns to distinguish between real and generated (fake) samples.
Through this competitive process, the generator progressively gets better at creating realistic samples, ultimately building a working generative model.
Generative Modeling: The Big Picture
So, what exactly is generative modeling? Let's say we have input data (X), such as an image. Our goal is to train a model that captures the underlying probability distribution of this data, allowing it to generate new, similar data.
For example, if X represents images, a successful generative model could generate novel images of cats, just by sampling from this learned distribution. It could also be used for anomaly detection, identifying images that don't belong to the learned category.
The Challenge: Intractable Normalization Constants
Ideally, we'd like our generative model (Pθ) to closely approximate the true data distribution (P*). To enable sufficient flexibility in our generative model, we'd usually represent it using some kind of deep neural network.
One approach might seem straightforward: simply use a neural network and treat its output as the probability of the input X. However, there are significant challenges. For a neural network output to represent a true probability distribution, it needs to be:
- Non-negative
- Normalized (the integral over the entire space must equal one)
The normalization condition is particularly tricky. The normalization constant (Z) is the integral of the unnormalized network output over the entire input space, and for high-dimensional data that integral is usually intractable. If we can't compute Z, then pushing up the unnormalized score for a training example doesn't guarantee its probability actually increases: Z may grow even faster, so the normalized probability can *decrease*. This issue of intractable normalization constants is a core problem in generative modeling.
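To make this concrete, one standard way to write such a model (using an exponential of the raw network output, here denoted f_θ, to guarantee non-negativity; the notation is a common convention rather than something defined earlier in this article) is:

```latex
P_\theta(x) = \frac{e^{f_\theta(x)}}{Z(\theta)},
\qquad
Z(\theta) = \int e^{f_\theta(x)}\, dx .
```

Increasing f_θ(x) for a training example raises the numerator, but it can also raise Z(θ); without evaluating the integral, we cannot tell whether P_θ(x) actually went up.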
How GANs Overcome the Normalization Problem
GANs sidestep the intractable normalization constant problem by reframing it. Instead of directly learning the data distribution (P*), they learn it *indirectly* with the help of a latent variable (z, not to be confused with the normalization constant above), which follows a known probability distribution (Pz), typically a Gaussian (normal) distribution. We call Pz our noise distribution.
The idea is to map points from this latent space to points in data space, so that the noise distribution in latent space is pushed forward onto the target distribution in data space. The challenge is ensuring that this mapping really produces samples from P*. That's where the adversarial process comes in.
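As a minimal sketch of this mapping idea, consider a 1-D toy example: a hand-picked function (rather than a learned neural network) that shifts and scales standard-normal noise so the pushed-forward samples match a target distribution. The target N(5, 2²) is chosen purely for illustration.

```python
import random
import statistics

# Toy illustration: map samples from a known noise distribution P_z
# (standard normal) into "data space" so that the pushed-forward
# distribution matches a target N(5, 2^2).
TARGET_MEAN, TARGET_STD = 5.0, 2.0

def generator(z: float) -> float:
    """A hand-picked mapping; a real GAN learns this with a deep network."""
    return TARGET_MEAN + TARGET_STD * z

random.seed(0)
samples = [generator(random.gauss(0.0, 1.0)) for _ in range(100_000)]

print(round(statistics.mean(samples), 2))   # close to 5.0
print(round(statistics.stdev(samples), 2))  # close to 2.0
```

A GAN replaces the hand-picked `generator` with a deep network and uses the discriminator's feedback to learn the mapping instead of writing it down.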
GAN Architecture and Training
As mentioned earlier, GANs consist of two models, both built as deep neural networks:
- Generator (G): Maps the noise distribution (Pz) in latent space to a generated distribution (PG) in data space. We want PG to align as closely as possible with the target distribution (P*).
- Discriminator (D): Distinguishes between real samples from P* and generated samples from PG.
The discriminator takes input data (X) and outputs a scalar (D(X)), representing the probability that X is real rather than generated. This output lies between 0 and 1 (typically enforced with a sigmoid), and training the discriminator is a standard binary classification problem.
The training procedure is conceptually simple: the discriminator performs a binary classification task, while the generator tries to create samples that fool the discriminator. This competitive dynamic is what drives the generator to eventually create realistic-looking samples.
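The alternating procedure can be sketched on a toy 1-D problem. The example below is a deliberately tiny GAN: the "generator" is a single shift parameter, the "discriminator" is a logistic regression, and the gradients are written out by hand. It uses the non-saturating generator loss -log D(G(z)) that is common in practice, rather than the strict zero-sum loss discussed below; everything else (names, learning rate, target distribution) is illustrative.

```python
import math
import random

random.seed(1)

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

# Real data: samples from N(3, 1). The generator must learn to shift
# standard-normal noise to match this distribution.
REAL_MEAN = 3.0
theta = 0.0          # generator parameter: G(z) = theta + z
a, b = 1.0, 0.0      # discriminator parameters: D(x) = sigmoid(a*x + b)
lr = 0.05

for step in range(5000):
    x_real = random.gauss(REAL_MEAN, 1.0)
    x_fake = theta + random.gauss(0.0, 1.0)

    # Discriminator step: gradient ascent on log D(x_real) + log(1 - D(x_fake))
    d_real = sigmoid(a * x_real + b)
    d_fake = sigmoid(a * x_fake + b)
    a += lr * ((1.0 - d_real) * x_real - d_fake * x_fake)
    b += lr * ((1.0 - d_real) - d_fake)

    # Generator step: gradient ascent on log D(G(z)) (non-saturating variant)
    z = random.gauss(0.0, 1.0)
    d_g = sigmoid(a * (theta + z) + b)
    theta += lr * (1.0 - d_g) * a

print(round(theta, 1))  # drifts toward REAL_MEAN = 3.0
```

At equilibrium the fake and real samples overlap, the discriminator's output approaches 0.5 everywhere, and the generator's gradient vanishes, which is exactly the Nash equilibrium described in the next sections.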
The GAN Loss Function: Binary Cross Entropy
The discriminator's output is the probability of a sample being real. We can define a conditional Bernoulli distribution over class labels (real or generated). The goal of the discriminator is to align its predicted distribution with the ground truth distribution.
The Kullback-Leibler (KL) divergence is used to measure the closeness between the distributions; it is an expectation, under the ground truth distribution, of the difference of log probabilities. Since the ground truth entropy term is a constant, minimizing this KL divergence is equivalent to minimizing the cross entropy between the predicted and ground truth distributions, which reduces to the classic form of the binary cross entropy loss.
The final discriminator loss, L(D), is the sum of the expected values of the binary cross entropy term over both the real and generated distributions.
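A minimal sketch of this loss for batches of discriminator outputs (the function name and example numbers are illustrative):

```python
import math

def discriminator_loss(d_real: list[float], d_fake: list[float]) -> float:
    """Binary cross entropy over both batches: real samples should score
    near 1, generated samples near 0."""
    real_term = -sum(math.log(d) for d in d_real) / len(d_real)
    fake_term = -sum(math.log(1.0 - d) for d in d_fake) / len(d_fake)
    return real_term + fake_term

# A confident, correct discriminator gets a low loss...
print(round(discriminator_loss([0.9, 0.95], [0.05, 0.1]), 3))   # 0.157
# ...while one stuck at D(x) = 0.5 everywhere scores 2*log(2) = 1.386,
# the value at the GAN equilibrium.
print(round(discriminator_loss([0.5, 0.5], [0.5, 0.5]), 3))     # 1.386
```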
The generator's loss is the *negative* of the discriminator's loss (excluding a constant term related to the real data). This means the generator and discriminator are being trained on directly contradictory objectives. In game theory terms, it's a two-player zero-sum game, or a minimax game. The generator tries to minimize a value function (V), while the discriminator tries to maximize it.
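The value function in question is the standard GAN minimax objective:

```latex
\min_G \max_D V(D, G)
= \mathbb{E}_{x \sim P^*}\big[\log D(x)\big]
+ \mathbb{E}_{z \sim P_z}\big[\log\big(1 - D(G(z))\big)\big]
```

The first expectation is over real data, the second over noise fed through the generator; the discriminator pushes both terms up, while the generator pushes the second term down.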
The Minimax Game and Nash Equilibrium
This simple gametheoretic framework is, theoretically, sufficient to push the generator distribution (PG) towards the target distribution (P*), provided the models are sufficiently powerful and given enough training time. We're aiming to find a Nash equilibrium where neither the generator nor the discriminator can decrease its loss by changing its parameters, assuming the other model keeps its parameters constant.
To achieve this equilibrium, we:
- Find an expression for the optimal discriminator (D*) given a particular generator (G).
- Find the optimal generator assuming the discriminator remains optimal throughout the entire process.
By finding the optimal discriminator and then substituting it into the minimax objective, we can show that optimizing the generator is equivalent to minimizing the Jensen-Shannon divergence between P* and PG. This divergence is non-negative and reaches its minimum value (zero) only when P* and PG are equal.
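These two properties are easy to check numerically. The sketch below uses small discrete distributions for simplicity (the helper names are illustrative), computing the Jensen-Shannon divergence as the average KL divergence of each distribution to their mixture:

```python
import math

def kl(p, q):
    """KL divergence for discrete distributions (assumes q > 0 where p > 0)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def jsd(p, q):
    """Jensen-Shannon divergence: average KL of p and q to their mixture."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

p_star = [0.5, 0.3, 0.2]
print(round(jsd(p_star, [0.2, 0.3, 0.5]), 4))  # positive: distributions differ
print(jsd(p_star, p_star))                     # 0.0: identical distributions
```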
Thus, in theory, training a GAN will result in PG aligning with P*.
Real-World Training Challenges
While the theory is elegant, real-world GAN training can be challenging. Both the generator and discriminator are real neural networks with limited expressive power, and optimization in parameter space can get stuck in local minima or suffer other issues that prevent reaching the global optimum. Some common problems include:
- The Optimal Discriminator Problem: As the discriminator becomes too good, the gradients for the generator parameters can vanish, preventing the generator from learning.
- The Lousy Discriminator Problem: If the generator updates too quickly, the discriminator can't keep up and becomes too confused to provide useful information to the generator.
- Non-Convergence: The training process may oscillate without ever converging to a stable equilibrium.
- Mode Collapse: The generator learns to produce only a limited set of realistic outputs, failing to capture the full diversity of the target distribution.
Addressing the Challenges
Several techniques have been developed to address these challenges. These include:
- Alternating between discriminator and generator training steps.
- Setting different learning rates for the discriminator and generator.
- Minibatch discrimination, which provides the discriminator with information about the entire batch of samples.
- Alternative GAN frameworks like Wasserstein GAN (WGAN).
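For reference, the WGAN variant mentioned above replaces the binary cross entropy objective with an estimate of the Wasserstein (earth mover's) distance, where the discriminator (now called a critic) is constrained to be 1-Lipschitz:

```latex
\min_G \max_{\|D\|_L \le 1}
\mathbb{E}_{x \sim P^*}\big[D(x)\big]
- \mathbb{E}_{z \sim P_z}\big[D\big(G(z)\big)\big]
```

Because this objective provides useful gradients even when P* and PG barely overlap, it mitigates the vanishing-gradient problem described earlier.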
Conclusion
Generative Adversarial Networks offer a powerful and elegant framework for generative modeling. However, as we've seen, training GANs can be tricky: they require careful tuning and, often, sophisticated techniques to achieve successful results.