
5: Generative Adversarial Networks (GANs)

Left: Deep Convolutional GAN | Right: Conditional Deep Convolutional GAN

📋 Overview

This assignment focuses on implementing and training Generative Adversarial Networks (GANs) for image generation. The project implements two variants: DCGAN (Deep Convolutional GAN) for unconditional generation and CDCGAN (Conditional Deep Convolutional GAN) for class-conditional generation. Both models are trained on the AFHQ (Animal Faces-HQ) dataset to generate realistic animal face images.

🎯 Objectives

  • Implement a fully convolutional Generator network using transposed convolutions
  • Implement a fully convolutional Discriminator network for binary classification
  • Train DCGAN for unconditional image generation
  • Train CDCGAN for conditional image generation with class labels
  • Understand adversarial training dynamics and GAN loss functions
  • Visualize generated samples and monitor training progress

📊 Dataset

AFHQ (Animal Faces-HQ) is a high-quality animal face dataset.

  • Classes: 3 (cat, dog, wild)
  • Image size: 64×64×3 (RGB)
  • Training/Test split: standard train/test split
  • Location: ../Assignment4/data/AFHQ/

The dataset is organized using PyTorch's ImageFolder structure:

AFHQ/
├── train/
│   ├── cat/
│   ├── dog/
│   └── wild/
└── test/
    ├── cat/
    ├── dog/
    └── wild/

Data Preprocessing:

  • Resize to 64×64 pixels
  • Normalize to the [-1, 1] range using transforms.Normalize([0.5]*3, [0.5]*3)

🏗️ Models Implemented

1. Generator Network

A fully convolutional generator that maps random noise to realistic images:

Architecture:

  • Input: random noise vector (B, latent_dim, 1, 1), where latent_dim=128
  • Conditional mode: concatenates class embeddings with the noise vector
  • Layers: 6 transposed convolutional blocks
      • 5 blocks with BatchNorm + ReLU activation
      • Final block with Tanh activation (outputs in the [-1, 1] range)
  • Output: generated images (B, 3, 64, 64)

Channel progression: latent_dim → 512 → 256 → 128 → 64 → 32 → 3

Features:

  • Supports both conditional and unconditional modes
  • Uses ConvTranspose2d for upsampling
  • Class embeddings via nn.Embedding in conditional mode
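A minimal unconditional generator matching the channel progression above; the kernel sizes, strides, and paddings are assumptions chosen to reproduce the 1×1 → 64×64 upsampling path, not the assignment's exact code:

```python
# Unconditional DCGAN generator sketch:
# latent_dim -> 512 -> 256 -> 128 -> 64 -> 32 -> 3 channels, 1x1 -> 64x64 spatial.
import torch
import torch.nn as nn

def tconv_block(in_ch, out_ch, k, s, p):
    """Transposed convolution + BatchNorm + ReLU, as in the 5 inner blocks."""
    return nn.Sequential(
        nn.ConvTranspose2d(in_ch, out_ch, k, s, p, bias=False),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )

class Generator(nn.Module):
    def __init__(self, latent_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            tconv_block(latent_dim, 512, 4, 1, 0),  # 1x1  -> 4x4
            tconv_block(512, 256, 4, 2, 1),         # 4x4  -> 8x8
            tconv_block(256, 128, 4, 2, 1),         # 8x8  -> 16x16
            tconv_block(128, 64, 4, 2, 1),          # 16x16 -> 32x32
            tconv_block(64, 32, 4, 2, 1),           # 32x32 -> 64x64
            nn.ConvTranspose2d(32, 3, 3, 1, 1),     # keep 64x64, map to 3 channels
            nn.Tanh(),                              # outputs in [-1, 1]
        )

    def forward(self, z):
        return self.net(z)
```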

2. Discriminator Network

A fully convolutional discriminator that classifies real vs. fake images:

Architecture:

  • Input: images (B, 3, 64, 64), or (B, 4, 64, 64) in conditional mode
  • Conditional mode: concatenates a class embedding as an additional channel
  • Layers: 6 convolutional blocks
      • 5 blocks with BatchNorm + LeakyReLU (slope=0.2) + Dropout (p=0.3)
      • Final block: Conv2d + Sigmoid (outputs a probability)
  • Output: real/fake probability (B, 1, 1, 1)

Channel progression: 3 → 64 → 128 → 256 → 256 → 512 → 1

Features:

  • Progressive downsampling using stride-2 convolutions
  • Gradient clipping (max norm=3.0) for training stability
  • Binary cross-entropy loss for real/fake classification
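A matching discriminator sketch for the channel progression above (again, the exact block hyperparameters are assumptions, not the assignment's code):

```python
# DCGAN discriminator sketch:
# 3 -> 64 -> 128 -> 256 -> 256 -> 512 -> 1 channels, 64x64 -> 1x1 spatial.
import torch
import torch.nn as nn

def conv_block(in_ch, out_ch):
    """Stride-2 convolution + BatchNorm + LeakyReLU(0.2) + Dropout(0.3)."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 4, 2, 1, bias=False),  # halves spatial size
        nn.BatchNorm2d(out_ch),
        nn.LeakyReLU(0.2, inplace=True),
        nn.Dropout2d(0.3),
    )

class Discriminator(nn.Module):
    def __init__(self, in_channels=3):
        super().__init__()
        self.net = nn.Sequential(
            conv_block(in_channels, 64),   # 64x64 -> 32x32
            conv_block(64, 128),           # 32x32 -> 16x16
            conv_block(128, 256),          # 16x16 -> 8x8
            conv_block(256, 256),          # 8x8  -> 4x4
            conv_block(256, 512),          # 4x4  -> 2x2
            nn.Conv2d(512, 1, 2, 1, 0),    # 2x2  -> 1x1
            nn.Sigmoid(),                  # real/fake probability
        )

    def forward(self, x):
        return self.net(x)
```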

3. Trainer Class

A unified training infrastructure for both GAN variants:

Key Components:

  • Adversarial training loop with alternating generator/discriminator updates
  • Binary cross-entropy loss functions
  • Adam optimizers with learning rate 3e-4 and betas=(0.5, 0.9)
  • TensorBoard logging for losses and generated images
  • Automatic image generation and visualization every 200 iterations

Training Strategy:

  1. Discriminator update: train on both real and fake images
  2. Generator update: train to fool the discriminator
  3. Loss balancing: monitor both losses to ensure stable training

🔬 Experiments

DCGAN (Unconditional Generation)

Configuration:

  • Model: Generator + Discriminator (unconditional)
  • Latent dimension: 128
  • Batch size: 64
  • Learning rate: 1e-3 (config default; the Trainer's Adam optimizers default to 3e-4)
  • Epochs: 15-50 (configurable)
  • Optimizer: Adam with betas=(0.5, 0.9)

Usage:

python DCGAN.py

CDCGAN (Conditional Generation)

Configuration:

  • Model: Generator + Discriminator (conditional)
  • Latent dimension: 128
  • Number of classes: 3
  • Batch size: 64
  • Learning rate: 1e-3
  • Epochs: 15-50 (configurable)
  • Conditioning: class labels embedded and concatenated

Usage:

python CDCGAN.py

🛠️ Key Features

Loss Functions

Discriminator Loss:

  • Real images: BCE(pred_real, 1), maximizing the probability assigned to real images
  • Fake images: BCE(pred_fake, 0), minimizing the probability assigned to fake images
  • Total: D_loss = D_loss_real + D_loss_fake

Generator Loss:

  • BCE(pred_fake, 1), maximizing the probability that the discriminator classifies fakes as real
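These losses written out with torch.nn.BCELoss; pred_real and pred_fake stand for the discriminator's sigmoid outputs:

```python
# The discriminator and generator BCE losses described above.
import torch
import torch.nn as nn

bce = nn.BCELoss()

def discriminator_loss(pred_real, pred_fake):
    real_loss = bce(pred_real, torch.ones_like(pred_real))   # push D(x) toward 1
    fake_loss = bce(pred_fake, torch.zeros_like(pred_fake))  # push D(G(z)) toward 0
    return real_loss + fake_loss

def generator_loss(pred_fake):
    # The generator wants the discriminator to call its fakes real.
    return bce(pred_fake, torch.ones_like(pred_fake))
```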

Training Infrastructure

  • TensorBoard logging:
      • Generator loss
      • Discriminator loss (real + fake)
      • Combined loss curves
      • Generated image grids (every 200 iterations)
  • Model checkpointing: saves generator and discriminator state dicts
  • Progress tracking: real-time training progress with tqdm
  • Image generation: automatic sample generation during training

Architecture Details

Generator:

  • Uses ConvTransposeBlock with BatchNorm and ReLU
  • Final layer uses Tanh to output in the [-1, 1] range
  • Conditional mode: label embeddings concatenated with the noise

Discriminator:

  • Uses ConvBlock with BatchNorm and LeakyReLU
  • Dropout (p=0.3) for regularization
  • Conditional mode: label embeddings added as a spatial channel
  • Gradient clipping for stability

📁 Project Structure

Assignment5/
├── DCGAN.py              # DCGAN training script
├── CDCGAN.py             # CDCGAN training script
├── models.py             # Generator, Discriminator, and Trainer classes
├── utils.py              # Utility functions (model saving, logging, etc.)
├── task1.ipynb           # Task notebook
├── Session5.ipynb        # Lab session materials
├── configs/
│   └── DCGAN1/
│       └── config.yaml   # Configuration file
├── models/               # Saved model checkpoints
├── tboard_logs/          # TensorBoard log files
│   └── gan/
├── resources/            # Educational resources
│   ├── conv.gif
│   ├── deconv.gif
│   ├── deconv.png
│   └── pixel_shuffle.pbm
├── DCGAN.gif            # DCGAN generation animation
└── CDCGAN.gif           # CDCGAN generation animation

📈 Training Process

Training Loop

  1. Sample random noise from the latent space
  2. Generate fake images using the generator
  3. Train the discriminator:
      • Forward pass on real images → compute the real loss
      • Forward pass on fake images → compute the fake loss
      • Backpropagate and update the discriminator
  4. Train the generator:
      • Forward pass the fake images through the discriminator
      • Compute the generator loss (trying to fool the discriminator)
      • Backpropagate and update the generator
  5. Log metrics and generate sample images periodically
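The loop above, condensed into a single step function (a sketch; the generator, discriminator, optimizers, and BCE loss are assumed to be constructed elsewhere, and device handling is omitted):

```python
# One adversarial training step: update D on real + fake, then update G.
import torch

def train_step(generator, discriminator, real_images, opt_g, opt_d,
               bce, latent_dim=128):
    b = real_images.size(0)
    noise = torch.randn(b, latent_dim, 1, 1)  # 1. sample latent noise
    fake_images = generator(noise)            # 2. generate fakes

    # 3. Discriminator update: real loss + fake loss. The fakes are detached
    #    so this step's gradients do not flow into the generator.
    opt_d.zero_grad()
    d_loss = (bce(discriminator(real_images), torch.ones(b, 1, 1, 1))
              + bce(discriminator(fake_images.detach()), torch.zeros(b, 1, 1, 1)))
    d_loss.backward()
    opt_d.step()

    # 4. Generator update: make the discriminator call the fakes real.
    opt_g.zero_grad()
    g_loss = bce(discriminator(fake_images), torch.ones(b, 1, 1, 1))
    g_loss.backward()
    opt_g.step()
    return d_loss.item(), g_loss.item()
```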

Monitoring Training

Key Metrics:

  • Generator loss: should decrease as the generator improves
  • Discriminator loss: should stabilize (not too low, not too high)
  • Loss balance: both losses should stay in similar ranges for stable training

Warning Signs:

  • Discriminator loss → 0: the discriminator is too strong, and the generator can't learn
  • Generator loss → 0: the generator may be collapsing or mode dropping
  • Oscillating losses: training instability; the learning rates may need adjusting

🚀 Usage

Prerequisites

  1. Install dependencies:

    pip install torch torchvision numpy matplotlib tqdm pyyaml tensorboard pytorch-lightning scikit-learn
    

  2. Download AFHQ dataset (if not already available):

    cd ../Assignment4
    bash download.sh
    

Running DCGAN

  1. Edit configuration (optional):

    configs = {
        "model_name": "DCGAN",
        "exp": "1",
        "latent_dim": 128,
        "batch_size": 64,
        "num_epochs": 50,
        "lr": 1e-3,
    }
    

  2. Run training:

    python DCGAN.py
    

Running CDCGAN

  1. Edit configuration (optional):

    configs = {
        "model_name": "CDCGAN",
        "exp": "1",
        "latent_dim": 128,
        "batch_size": 64,
        "num_epochs": 50,
        "lr": 1e-3,
    }
    

  2. Run training:

    python CDCGAN.py
    

Viewing TensorBoard Logs

tensorboard --logdir=tboard_logs

Then open http://localhost:6006 in your browser to view:

  • Training loss curves
  • Generated image grids over time
  • Model architecture graphs

Loading Saved Models

from models import Generator, Discriminator, Trainer
import torch

# Load checkpoint
checkpoint = torch.load('models/DCGAN1/checkpoint_DCGAN1_epoch_50.pth')

# Initialize models
generator = Generator(latent_dim=128, num_channels=3, base_channels=64)
discriminator = Discriminator(in_channels=3, out_dim=1, base_channels=64)

# Load weights
generator.load_state_dict(checkpoint['generator_state_dict'])
discriminator.load_state_dict(checkpoint['discriminator_state_dict'])

# Generate images
generator.eval()
with torch.no_grad():
    noise = torch.randn(64, 128, 1, 1)
    fake_images = generator(noise)

🔍 Key Concepts

Transposed Convolutions

The generator uses transposed convolutions (often loosely called deconvolutions) to upsample low-resolution feature maps into a high-resolution image. After the initial projection from the (B, latent_dim, 1, 1) noise to a 4×4 feature map, each stride-2 layer doubles the spatial dimensions while reducing the number of channels.
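The size arithmetic behind this upsampling: for ConvTranspose2d, out = (in - 1) * stride - 2 * padding + kernel_size, so a kernel-4, stride-2, padding-1 layer exactly doubles the spatial size:

```python
# A kernel-4, stride-2, padding-1 transposed convolution doubles spatial size:
# out = (in - 1) * stride - 2 * padding + kernel = (8 - 1) * 2 - 2 + 4 = 16
import torch
import torch.nn as nn

up = nn.ConvTranspose2d(in_channels=256, out_channels=128,
                        kernel_size=4, stride=2, padding=1)
y = up(torch.randn(1, 256, 8, 8))
print(y.shape)  # torch.Size([1, 128, 16, 16])
```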

Adversarial Training

GANs are trained as a minimax game:

  • The discriminator tries to maximize log(D(x)) + log(1 - D(G(z)))
  • The generator tries to minimize log(1 - D(G(z)))

This creates a competitive dynamic that drives both networks to improve.

Conditional Generation

CDCGAN extends DCGAN as follows:

  • Generator: concatenates class embeddings with the noise vector
  • Discriminator: receives class information as an additional spatial channel
  • Result: can generate images of specific classes on demand
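One way to wire this conditioning (a sketch; the embedding dimensions are assumptions, not the assignment's exact choices):

```python
# Class-conditioning plumbing for generator and discriminator inputs.
import torch
import torch.nn as nn

num_classes, latent_dim, img_size = 3, 128, 64
g_embed = nn.Embedding(num_classes, latent_dim)           # generator-side embedding
d_embed = nn.Embedding(num_classes, img_size * img_size)  # discriminator-side embedding

labels = torch.tensor([0, 1, 2])          # one sample per class: cat, dog, wild
noise = torch.randn(3, latent_dim, 1, 1)

# Generator input: class embedding concatenated with the noise along channels.
g_in = torch.cat([noise, g_embed(labels).view(3, latent_dim, 1, 1)], dim=1)
# g_in has shape (3, 2 * latent_dim, 1, 1)

# Discriminator input: embedding reshaped into one extra spatial channel.
images = torch.randn(3, 3, img_size, img_size)
label_map = d_embed(labels).view(3, 1, img_size, img_size)
d_in = torch.cat([images, label_map], dim=1)
# d_in has shape (3, 4, 64, 64)
```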

📊 Results & Analysis

The project includes:

  • Animated GIFs showing generation progress over training (DCGAN.gif, CDCGAN.gif)
  • TensorBoard visualizations: loss curves and generated image grids
  • Model checkpoints saved at regular intervals for evaluation

Expected Outcomes

  • DCGAN: Generates diverse animal faces without class control
  • CDCGAN: Generates animal faces for specific classes (cat, dog, wild)
  • Training stability: Balanced losses indicate successful adversarial training


💡 Tips for Training GANs

  1. Learning Rate: Start with 2e-4 to 5e-4 for both networks
  2. Batch Size: Use batch sizes of 64-128 for stability
  3. Normalization: use BatchNorm in the generator; LayerNorm can help in the discriminator
  4. Loss Monitoring: Watch for mode collapse or discriminator overpowering generator
  5. Architecture: Follow DCGAN guidelines (no fully connected layers, use strided convolutions)
  6. Initialization: Use proper weight initialization (Xavier/He)
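Tip 6 can be implemented with a small `apply` hook (a sketch; the assignment's own initialization scheme may differ):

```python
# He initialization for conv layers, DCGAN-style init for BatchNorm.
import torch.nn as nn

def init_weights(m):
    if isinstance(m, (nn.Conv2d, nn.ConvTranspose2d)):
        nn.init.kaiming_normal_(m.weight)     # He initialization
    elif isinstance(m, nn.BatchNorm2d):
        nn.init.normal_(m.weight, 1.0, 0.02)  # gamma ~ N(1, 0.02)
        nn.init.zeros_(m.bias)                # beta = 0

# Applied recursively to every submodule:
# generator.apply(init_weights); discriminator.apply(init_weights)
```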

💬 Support

If you found this project helpful, you can support my work by buying me a coffee or via paypal!

Buy Me a Coffee

PayPal


Location

The complete assignment documentation, code, and notebooks are located in:

src/Assignment5/


This assignment demonstrates generative modeling using adversarial training, showcasing both unconditional and conditional image generation capabilities.