# Assignment 4: Variational Autoencoders (VAEs)

## Overview
This assignment focuses on implementing and training Variational Autoencoders (VAEs) for image generation and reconstruction. It covers both standard Convolutional VAEs (CVAE) and Conditional Convolutional VAEs (CCVAE) on the AFHQ (Animal Faces-High Quality) dataset, with experiments exploring the effect of KL divergence weighting on model performance.
## Objectives
- Implement Convolutional Variational Autoencoder (CVAE) from scratch
- Implement Conditional Convolutional Variational Autoencoder (CCVAE) with class conditioning
- Understand the reparameterization trick and ELBO (Evidence Lower BOund) optimization
- Experiment with different KL divergence weights (λ_KLD) to balance reconstruction and regularization
- Visualize latent space representations and generate novel images
- Analyze the trade-off between reconstruction quality and latent space regularization
## Dataset
AFHQ (Animal Faces-High Quality) - High-quality animal face dataset
- Training samples: ~15,000 images
- Test samples: ~1,500 images
- Image size: 64×64×3 (RGB)
- Classes: 3 categories (cat, dog, wildlife)
- Download: Automatically downloaded via download.sh script
The dataset is organized in ImageFolder format with train/test splits.
## Models Implemented

### 1. Convolutional VAE (CVAE)
A standard Variational Autoencoder with convolutional encoder-decoder architecture:
Encoder:
- Input: (3, 64, 64)
- Conv layers: 3 → 16 → 32 → 64 → 128 channels
- Output: 2048-dimensional flattened features
- Fully connected layers: 2048 → μ, log(σ²) (latent_dim dimensions)
Latent Space:
- Reparameterization trick: z = μ + ε·σ, where ε ~ N(0, 1)
- Default latent dimension: 64 (configurable)
Decoder:
- Input projection: latent_dim → 2048
- Reshape: 2048 → (128, 4, 4)
- Deconv layers: 128 → 64 → 32 → 16 → 3 channels
- Output: (3, 64, 64) with Sigmoid activation
Key Features:
- Batch normalization after each conv/deconv layer
- LeakyReLU(0.2) activations
- Reparameterization trick for differentiable sampling
- MSE reconstruction loss + KL divergence regularization
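The reparameterization step listed above can be illustrated in a few lines of NumPy (a standalone sketch, not the project's PyTorch implementation; function and variable names are illustrative):

```python
import numpy as np

def reparameterize(mu, log_var, rng=np.random.default_rng(0)):
    """Sample z = mu + eps * sigma with eps ~ N(0, 1).

    Drawing eps separately keeps the path from (mu, log_var) to z
    deterministic, which is what makes the sample differentiable
    with respect to the encoder outputs during training.
    """
    sigma = np.exp(0.5 * log_var)          # log(sigma^2) -> sigma
    eps = rng.standard_normal(mu.shape)    # noise, independent of the encoder
    return mu + eps * sigma

mu = np.zeros((2, 64))        # batch of 2, latent_dim = 64
log_var = np.zeros((2, 64))   # sigma = 1 everywhere
z = reparameterize(mu, log_var)
print(z.shape)  # (2, 64)
```

With `log_var` driven toward negative infinity the noise vanishes and z collapses to μ, which is the deterministic-autoencoder limit.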
### 2. Conditional Convolutional VAE (CCVAE)
An extension of CVAE that conditions both encoder and decoder on class labels:
Architecture:
- Same encoder-decoder structure as CVAE
- Class conditioning: one-hot encoded class labels concatenated with:
  - the encoder output (before the μ, log(σ²) computation)
  - the latent vector (before the decoder input projection)
- Supports controlled generation by specifying class labels
Key Features:
- Class-conditional encoding and decoding
- Same architecture as CVAE with additional class embeddings
- Enables class-specific image generation
- Useful for controlled generation tasks
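The two conditioning points above amount to concatenating a one-hot label vector onto existing feature tensors. A minimal NumPy sketch (the shapes follow the CVAE description; the helper name and the stand-in data are illustrative, not the project's code):

```python
import numpy as np

def one_hot(labels, num_classes):
    """Convert integer class labels to one-hot row vectors."""
    out = np.zeros((len(labels), num_classes))
    out[np.arange(len(labels)), labels] = 1.0
    return out

num_classes = 3                   # cat, dog, wildlife
labels = np.array([0, 2])         # batch of 2 images
features = np.ones((2, 2048))     # flattened encoder output (stand-in values)
z = np.ones((2, 64))              # sampled latent vectors (stand-in values)

c = one_hot(labels, num_classes)

# Conditioning point 1: before the mu / log(sigma^2) layers
enc_in = np.concatenate([features, c], axis=1)   # (2, 2048 + 3)

# Conditioning point 2: before the decoder input projection
dec_in = np.concatenate([z, c], axis=1)          # (2, 64 + 3)

print(enc_in.shape, dec_in.shape)  # (2, 2051) (2, 67)
```

Because the decoder always sees the label alongside z, fixing the label at inference time steers generation toward that class.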
## Experiments
The project includes multiple experiments exploring different KL divergence weights:
| Experiment | Model | λ_KLD | Latent Dim | Description |
|---|---|---|---|---|
| CVAE1 | CVAE | 0.001 | 64 | Baseline with moderate KL weight |
| CVAE2 | CVAE | 0.0 | 64 | No KL regularization (pure autoencoder) |
| CVAE3 | CVAE | 0.01 | 64 | Higher KL weight (stronger regularization) |
| CVAE4 | CVAE | 0.0001 | 64 | Lower KL weight (weaker regularization) |
| CVAE_new1 | CVAE | 0.0001 | 64 | Variant with different architecture |
| CCVAE1 | CCVAE | 0.0001 | 64 | Conditional VAE with class labels |
### Training Configuration

Default training parameters:
- Optimizer: AdamW
- Learning rate: 0.001 (configurable)
- Batch size: 64
- Epochs: 50
- Weight decay: 1e-4
- Scheduler: ReduceLROnPlateau (patience=7, factor=0.5)
- Loss function: MSE reconstruction + λ_KLD × KL divergence
### Loss Function

The VAE loss combines reconstruction and regularization terms:

Loss = MSE(x, x̂) + λ_KLD · KL(q(z|x) || N(0, I))

Where:
- Reconstruction loss: mean squared error between the input and reconstructed images
- KL divergence: regularization term encouraging the latent distribution to match the prior N(0, I)
- λ_KLD: weight controlling the trade-off between reconstruction quality and latent space regularization
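The combined loss can be sketched in NumPy (a hedged illustration of the math only; the project's `vae_loss_function` operates on PyTorch tensors, and its reduction choices may differ):

```python
import numpy as np

def vae_loss(x, x_hat, mu, log_var, lambda_kld=0.001):
    """MSE reconstruction + lambda_kld * closed-form Gaussian KL.

    KL(N(mu, sigma^2 I) || N(0, I))
        = -0.5 * sum(1 + log sigma^2 - mu^2 - sigma^2),
    summed over latent dimensions and averaged over the batch here.
    """
    batch = x.shape[0]
    recon = np.mean((x - x_hat) ** 2)
    kld = -0.5 * np.sum(1 + log_var - mu**2 - np.exp(log_var)) / batch
    return recon + lambda_kld * kld, recon, kld

x = np.zeros((4, 3, 64, 64))
x_hat = np.zeros((4, 3, 64, 64))   # perfect reconstruction
mu = np.zeros((4, 64))
log_var = np.zeros((4, 64))        # q(z|x) already matches the prior

total, recon, kld = vae_loss(x, x_hat, mu, log_var)
print(total, recon, kld)  # 0.0 0.0 0.0
```

Setting `lambda_kld=0.0` recovers the pure-autoencoder objective used in the CVAE2 experiment.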
## Key Features

### Training Infrastructure
- TensorBoard logging: Training/validation loss, reconstruction loss, KL divergence, learning rate
- Image visualization: Automatic saving of reconstruction comparisons every N epochs
- Model checkpointing: Saves model states with training statistics
- Progress tracking: Real-time training progress with tqdm progress bars
- Config management: YAML-based configuration files for experiment reproducibility
### Visualization Tools
- Reconstruction comparison: Side-by-side original vs reconstructed images
- Latent space visualization: PCA projection of latent representations colored by class
- Image generation: Sample from latent space to generate novel images
- Latent space traversal: Visualize how changes in latent dimensions affect generated images
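The PCA projection used for latent-space plots can be reproduced with a plain SVD-based PCA; a sketch with made-up latent codes (the project's `vis_latent()` may use a library PCA instead):

```python
import numpy as np

def pca_project(z, n_components=2):
    """Project latent codes onto their top principal components via SVD."""
    centered = z - z.mean(axis=0)
    # Rows of vt are the principal directions of the centered data
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T

rng = np.random.default_rng(0)
z = rng.standard_normal((100, 64))   # stand-in for encoder outputs
z2d = pca_project(z)
print(z2d.shape)  # (100, 2)
# z2d can now be scatter-plotted and colored by class label
```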
### Utility Functions

- `denormalize_images()`: convert images from [-1, 1] to [0, 1] range
- `vae_loss_function()`: combined reconstruction and KL divergence loss
- `train_model()`: complete training loop with validation
- `eval_model()`: model evaluation with image saving
- `vis_latent()`: visualize the latent space using PCA
- `inference()`: generate images from random latent vectors
- `save_model()` / `load_model()`: model checkpoint management
## Project Structure

Assignment4/
├── Assignment4.ipynb          # Main assignment notebook
├── Session4.ipynb             # Lab session materials
├── cvae.py                    # CVAE model implementation
├── ccvae.py                   # CCVAE model implementation
├── trainer.py                 # Training script
├── utils.py                   # Utility functions (training, evaluation, visualization)
├── download.sh                # Dataset download script
├── configs/                   # Experiment configurations
│   ├── CVAE1_KLD_0.001/
│   ├── CVAE2_KLD_0.0/
│   ├── CVAE3_KLD_0.01/
│   ├── CVAE4_KLD_0.0001/
│   ├── CVAE_new1_KLD_0.0001/
│   └── CCVAE1_KLD_0.0001/
├── data/
│   └── AFHQ/                  # AFHQ dataset (downloaded)
│       ├── train/
│       └── test/
├── models/                    # Saved model checkpoints
├── imgs/                      # Generated images and visualizations
│   ├── CVAE1/                 # Experiment outputs
│   ├── CVAE2/
│   ├── inference/             # Generated samples
│   └── ...
├── tboard_logs/               # TensorBoard log files
│   ├── CVAE1_KLD_0.001/
│   └── ...
└── htmls/                     # HTML exports of notebooks
## Analysis & Results

### KL Divergence Weight Analysis
The λ_KLD parameter controls the trade-off between:
- Reconstruction quality: lower λ_KLD → better reconstruction, but a less regularized latent space
- Latent space structure: higher λ_KLD → a more structured latent space, but potentially worse reconstruction
Key Findings:
1. λ_KLD = 0.0: acts as a pure autoencoder; excellent reconstruction but an unstructured latent space
2. λ_KLD = 0.0001: weak regularization; good reconstruction with some latent structure
3. λ_KLD = 0.001: balanced trade-off (default)
4. λ_KLD = 0.01: strong regularization; a well-structured latent space, but may sacrifice reconstruction quality
### Latent Space Properties
- Disentanglement: Higher KL weights encourage more disentangled representations
- Interpolation: Well-regularized latent spaces enable smooth interpolation between samples
- Generation quality: Conditional VAEs enable class-specific generation with better control
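The interpolation property above can be checked by decoding convex combinations of two latent codes. A minimal sketch of the interpolation itself (the decoder call is omitted and the names are illustrative):

```python
import numpy as np

def interpolate(z1, z2, steps=8):
    """Linearly interpolate between two latent vectors (inclusive endpoints)."""
    t = np.linspace(0.0, 1.0, steps)[:, None]   # (steps, 1) blending weights
    return (1.0 - t) * z1 + t * z2              # (steps, latent_dim)

z1 = np.zeros(64)
z2 = np.ones(64)
path = interpolate(z1, z2)
print(path.shape)  # (8, 64)
# Each row would be passed through the decoder to render one frame
```

In a well-regularized latent space, decoding each row yields a smooth visual transition; with λ_KLD = 0.0 the intermediate points often fall off the data manifold and decode poorly.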
## Usage

### Setup

1. Install dependencies.
2. Download the dataset by running `bash download.sh`. This will download and extract the AFHQ dataset to `./data/AFHQ/`.
### Training a Model

#### Using the Training Script
from trainer import main
from cvae import CVAE
configs = {
"model_name": "CVAE",
"exp": "1",
"latent_dim": 64,
"batch_size": 64,
"num_epochs": 50,
"lr": 0.001,
"scheduler": "ReduceLROnPlateau",
"use_scheduler": True,
"lambda_kld": 0.001,
}
main(configs)
#### Using the Notebook

Open `Assignment4.ipynb` in Jupyter and run the cells sequentially to:
- Load and inspect the dataset
- Define and initialize models
- Train experiments with different configurations
- Evaluate models and visualize results
- Generate and analyze samples
### Viewing TensorBoard Logs

Start TensorBoard with `tensorboard --logdir tboard_logs/`, then open http://localhost:6006 in your browser to view:
- Training/validation loss curves
- Reconstruction vs KL divergence components
- Learning rate schedule
- Image reconstructions
### Loading and Using Trained Models
import torch
from cvae import CVAE
from utils import load_model

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Initialize model
model = CVAE(latent_dim=64).to(device)
optimizer = torch.optim.AdamW(model.parameters(), lr=0.001)

# Load checkpoint
model, optimizer, epoch, stats = load_model(
    model, optimizer,
    'models/CVAE1/checkpoint_KLD_0.001_epoch_49.pth'
)

# Generate samples by decoding random latent vectors
model.eval()
with torch.no_grad():
    z = torch.randn(16, 64).to(device)
    z = model.decoder_input(z)
    z = z.view(-1, 128, 4, 4)
    samples = model.decoder(z)
### Conditional Generation (CCVAE)
from ccvae import CCVAE
model = CCVAE(latent_dim=64, num_classes=3)
# Generate samples for specific class (0=cat, 1=dog, 2=wildlife)
class_label = torch.tensor([0] * 16) # Generate 16 cat faces
samples = model.sample(num_samples=16, c=class_label)
## Configuration Files

Each experiment has a YAML configuration file in `configs/`:
batch_size: 64
exp: '1'
lambda_kld: 0.001
latent_dim: 64
lr: 0.001
model_name: CVAE
num_epochs: 50
scheduler: ReduceLROnPlateau
use_scheduler: true
## Key Concepts

### Variational Autoencoder
A VAE is a generative model that learns to encode data into a latent distribution and decode samples from that distribution. Unlike standard autoencoders, VAEs learn a probabilistic latent representation.
### Reparameterization Trick

Enables backpropagation through random sampling:

z = μ + ε·σ, where ε ~ N(0, 1)

This makes the sampling process differentiable.

### ELBO (Evidence Lower BOund)
The VAE objective function:

ELBO = E_q(z|x)[log p(x|z)] - KL(q(z|x) || p(z))

Maximizing the ELBO maximizes a lower bound on the data likelihood while regularizing the latent distribution.

### KL Divergence
Measures how different the learned latent distribution q(z|x) is from the prior p(z) = N(0,I). Encourages the encoder to produce latent codes that match the standard normal distribution.
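For a diagonal-Gaussian encoder, this KL term has a closed form (this is the standard expression for Gaussian-vs-standard-normal KL, consistent with the loss description above), which is what implementations typically compute per latent dimension:

```latex
D_{\mathrm{KL}}\!\left(\mathcal{N}(\mu, \sigma^2 I)\,\|\,\mathcal{N}(0, I)\right)
  = -\tfrac{1}{2} \sum_{j=1}^{d} \left(1 + \log \sigma_j^2 - \mu_j^2 - \sigma_j^2\right)
```

The term vanishes exactly when μ = 0 and σ² = 1 in every dimension, i.e. when the encoder's output matches the prior.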
## References
- Auto-Encoding Variational Bayes (Kingma & Welling, 2014)
- AFHQ Dataset
- PyTorch VAE Tutorial
- TensorBoard Documentation
## Support
If you found this project helpful, you can support my work by buying me a coffee or via paypal!
Location
The complete assignment documentation, code, and notebooks are located in:
This assignment demonstrates variational inference, generative modeling, and the trade-offs between reconstruction quality and latent space regularization in deep learning.