
Repository Structure

This page provides links to the comprehensive documentation for each assignment and the course project. Each assignment directory contains a detailed README file with complete information about implementations, experiments, and usage.

Assignments

Assignment 1: Neural Network Fundamentals

Location: src/Assignment1/

Description: Building and training basic neural networks from scratch on the CIFAR-10 dataset. Implements and compares Multi-Layer Perceptrons (MLPs) and Convolutional Neural Networks (CNNs), with experiments on regularization techniques and custom learning rate scheduling.

Documentation: See Assignment 1 README

Key Topics:

- MLP and CNN architectures
- Dropout regularization
- Custom learning rate warmup scheduler
- Model evaluation and analysis
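The warmup idea can be sketched as a small schedule function: the learning rate ramps up linearly for the first steps, then holds. This is a minimal hypothetical variant for illustration; the assignment's actual scheduler (and its decay behavior after warmup) may differ.

```python
def warmup_lr(step, base_lr=0.1, warmup_steps=100):
    # Linearly ramp the learning rate up to base_lr over warmup_steps,
    # then hold it constant. Warming up avoids large, destabilizing
    # updates while the network's weights are still near initialization.
    if step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps
    return base_lr
```

In practice such a function is typically plugged into the optimizer each step (e.g. by updating the optimizer's learning rate before calling its update).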


Assignment 2: Transfer Learning and Fine-tuning

Location: src/Assignment2/

Description: Leveraging pre-trained deep learning models for a custom binary classification task (human vs robot). Explores different transfer learning strategies and compares multiple state-of-the-art architectures.

Documentation: See Assignment 2 README

Key Topics:

- Transfer learning strategies (full fine-tuning, fixed feature extractor, combined approach)
- Pre-trained models: ResNet18, ConvNeXt, EfficientNet
- Vision transformers: DINOv2, SwinTransformer
- Model comparison and evaluation
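The "fixed feature extractor" strategy boils down to excluding the pre-trained backbone's parameters from gradient updates while the new classification head trains. The toy update step below illustrates only that mechanism; the parameter names and the one-value-per-parameter "model" are purely hypothetical, not the assignment's API.

```python
# Illustrative stand-ins for backbone and head parameters (scalar "weights").
backbone_params = {"conv1.weight": 1.0, "layer1.weight": 2.0}
head_params = {"fc.weight": 0.5}

def sgd_step(params, grads, lr=0.1, frozen=()):
    # Apply one SGD update, skipping any parameter listed as frozen.
    for name, grad in grads.items():
        if name in frozen:
            continue
        params[name] -= lr * grad
    return params

params = {**backbone_params, **head_params}
grads = {name: 1.0 for name in params}           # dummy gradients
sgd_step(params, grads, frozen=backbone_params)  # backbone stays fixed
```

Full fine-tuning corresponds to an empty `frozen` set; the combined approach freezes the backbone first, then unfreezes it with a smaller learning rate.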


Assignment 3: Recurrent Neural Networks for Action Recognition

Location: src/Assignment3/

Description: Implementing Recurrent Neural Networks (RNNs) from scratch and applying them to action recognition. Covers custom LSTM and Convolutional LSTM cells, compared against PyTorch's built-in RNN modules.

Documentation: See Assignment 3 README

Key Topics:

- Custom LSTM and ConvLSTM implementations
- Action recognition on KTH-Actions dataset
- RNN architectures: LSTMCell, GRUCell, custom implementations
- 3D-CNN (R(2+1)d-Net) for action classification (extra credit)
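The core of a from-scratch LSTM cell is one step of the standard gate equations. The NumPy sketch below shows those equations with all four gates stacked in one matrix multiply; it is a compact illustration, and the assignment's implementation (shapes, parameter layout, batching) may differ.

```python
import numpy as np

def lstm_cell(x, h, c, W, U, b):
    # One LSTM step. W, U, b stack input/forget/candidate/output gate
    # parameters along rows, so a single matmul computes all pre-activations.
    hidden = h.shape[0]
    z = W @ x + U @ h + b
    i = 1 / (1 + np.exp(-z[:hidden]))            # input gate
    f = 1 / (1 + np.exp(-z[hidden:2 * hidden]))  # forget gate
    g = np.tanh(z[2 * hidden:3 * hidden])        # candidate cell state
    o = 1 / (1 + np.exp(-z[3 * hidden:]))        # output gate
    c_new = f * c + i * g                        # updated cell state
    h_new = o * np.tanh(c_new)                   # updated hidden state
    return h_new, c_new

rng = np.random.default_rng(0)
hidden, inputs = 4, 3
h, c = np.zeros(hidden), np.zeros(hidden)
W = rng.normal(size=(4 * hidden, inputs))
U = rng.normal(size=(4 * hidden, hidden))
b = np.zeros(4 * hidden)
h, c = lstm_cell(rng.normal(size=inputs), h, c, W, U, b)
```

A ConvLSTM replaces the matrix multiplies with convolutions so the hidden and cell states keep their spatial layout, which suits video input.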


Assignment 4: Variational Autoencoders (VAEs)

Location: src/Assignment4/

Description: Implementing and training Variational Autoencoders (VAEs) for image generation and reconstruction. Covers both a standard Convolutional VAE (CVAE) and a Conditional Convolutional VAE (CCVAE) on the AFHQ dataset.

Documentation: See Assignment 4 README

Key Topics:

- Convolutional VAE (CVAE) implementation
- Conditional Convolutional VAE (CCVAE) with class conditioning
- Reparameterization trick and ELBO optimization
- KL divergence weight experiments
- Latent space visualization and image generation
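The two standard formulas behind these topics can be sketched in a few lines: the reparameterization trick samples the latent as `z = mu + sigma * eps` so gradients flow through the encoder outputs, and the Gaussian KL term of the ELBO has a closed form. Variable values below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
mu = np.array([0.5, -0.2])       # encoder mean (illustrative values)
log_var = np.array([0.0, 0.1])   # encoder log-variance

# Reparameterization: sample from N(mu, sigma^2) via a standard normal eps,
# keeping mu and log_var differentiable.
eps = rng.normal(size=mu.shape)
z = mu + np.exp(0.5 * log_var) * eps

# Closed-form KL(q(z|x) || N(0, I)), summed over latent dimensions.
# This is the regularization term of the ELBO; its weight is what the
# KL-weight experiments vary.
kl = 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var)
```

The KL term is zero exactly when `mu = 0` and `log_var = 0`, i.e. when the posterior matches the standard-normal prior.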


Assignment 5: Generative Adversarial Networks (GANs)

Location: src/Assignment5/

Description: Implementing and training Generative Adversarial Networks (GANs) for image generation. Covers two variants: DCGAN (Deep Convolutional GAN) for unconditional generation and CDCGAN (Conditional Deep Convolutional GAN) for class-conditional generation.

Documentation: See Assignment 5 README

Key Topics:

- Fully convolutional Generator and Discriminator networks
- Adversarial training dynamics
- Unconditional and conditional image generation
- Training stability and monitoring
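The adversarial training dynamics reduce to two opposing binary cross-entropy objectives. The sketch below shows the standard DCGAN-style losses on illustrative discriminator probabilities (`d_real`, `d_fake` stand in for sigmoid outputs on real and generated batches); the assignment's exact loss formulation may differ.

```python
import numpy as np

def bce(p, target):
    # Binary cross-entropy against a constant target (0 or 1).
    p = np.clip(p, 1e-7, 1 - 1e-7)  # avoid log(0)
    return -(target * np.log(p) + (1 - target) * np.log(1 - p)).mean()

d_real = np.array([0.9, 0.8])  # discriminator output on real images
d_fake = np.array([0.2, 0.1])  # discriminator output on generated images

# Discriminator: classify reals as 1 and fakes as 0.
d_loss = bce(d_real, 1.0) + bce(d_fake, 0.0)
# Generator (non-saturating form): make the discriminator call fakes real.
g_loss = bce(d_fake, 1.0)
```

Monitoring both losses together is a common stability check: a discriminator loss collapsing toward zero while the generator loss explodes usually signals a failing generator.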


Assignment 6: Siamese Networks and Contrastive Learning

Location: src/Assignment6/

Description: Implementing and comparing two self-supervised learning approaches for face recognition: TriNet Siamese Networks and SimCLR (Simple Framework for Contrastive Learning). Uses the Labeled Faces in the Wild (LFW) dataset.

Documentation: See Assignment 6 README

Key Topics:

- TriNet Siamese model with triplet loss
- SimCLR model with contrastive learning
- ResNet-18 backbone with custom projection heads
- Embedding visualization (PCA, t-SNE)
- Face recognition and clustering
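The triplet loss at the heart of the TriNet model can be sketched directly from its definition: pull the anchor embedding toward a positive (same identity) and push it away from a negative (different identity) by at least a margin. The embeddings below are illustrative 2-D points, not real face embeddings.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    # Hinge on the gap between anchor-positive and anchor-negative distances:
    # the loss is zero once the negative is at least `margin` farther away.
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(0.0, d_pos - d_neg + margin)

a = np.array([0.0, 0.0])
p = np.array([0.1, 0.0])  # same identity: close to the anchor
n = np.array([1.0, 0.0])  # different identity: far from the anchor
loss = triplet_loss(a, p, n)  # satisfied triplet -> zero loss
```

SimCLR replaces this per-triplet hinge with a softmax-style contrastive loss (NT-Xent) over all pairs in a batch of augmented views.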


Assignment 7: Vision Transformers for Action Recognition

Location: src/Assignment7/

Description: Implementing and training Vision Transformers (ViT) for action recognition on the KTH-Actions dataset. Explores transformer-based architectures for video classification, comparing different patch sizes and evaluating performance against RNN models.

Documentation: See Assignment 7 README

Key Topics:

- Vision Transformer (ViT) implementation
- Patch-based processing for video frames
- Multi-head self-attention mechanisms
- Patch size ablation studies
- Video Vision Transformer (ViViT) with Space-Time attention (extra credit)
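Patch-based processing starts by splitting each frame into non-overlapping patches and flattening each into a token; in a full ViT these tokens are then linearly projected and fed to the transformer. The NumPy sketch below shows only the patchification step, with illustrative sizes.

```python
import numpy as np

def patchify(image, patch):
    # Split an (H, W, C) image into non-overlapping patch x patch tiles and
    # flatten each tile into one row (one token) of the output.
    h, w, c = image.shape
    gh, gw = h // patch, w // patch
    tiles = image.reshape(gh, patch, gw, patch, c)
    tiles = tiles.transpose(0, 2, 1, 3, 4)           # (gh, gw, patch, patch, c)
    return tiles.reshape(gh * gw, patch * patch * c)  # one row per patch

img = np.arange(32 * 32 * 3, dtype=float).reshape(32, 32, 3)
tokens = patchify(img, patch=8)  # 4x4 grid -> 16 tokens of length 8*8*3
```

Halving the patch size quadruples the token count, which is exactly the trade-off a patch-size ablation probes: finer spatial detail against quadratically more expensive self-attention.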


Course Project

Video Prediction with Object Representations

Location: src/CourseProject/

Description: Advanced video prediction using transformer-based architectures. Implements a two-stage pipeline with both holistic and object-centric scene representations on the MOVi-C dataset.

Documentation: See Course Project README

Key Topics:

- Two-stage training pipeline (autoencoder + predictor)
- Holistic and object-centric scene representations
- Transformer-based encoders and decoders
- Hybrid CNN + Transformer architecture
- Autoregressive prediction with sliding window mechanism
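The autoregressive sliding-window mechanism can be sketched independently of any model: each new frame (or latent) is predicted from the last few frames, appended to the sequence, and the context window slides forward. The predictor below is a deliberate stand-in (the mean of the context), not the project's transformer.

```python
def rollout(seed_frames, steps, window=3,
            predict=lambda ctx: sum(ctx) / len(ctx)):
    # Autoregressive rollout: predict from the latest `window` frames,
    # then feed each prediction back in as input for the next step.
    frames = list(seed_frames)
    for _ in range(steps):
        context = frames[-window:]       # sliding context window
        frames.append(predict(context))  # prediction becomes future input
    return frames

out = rollout([1.0, 2.0, 3.0], steps=2)
```

Because predictions are fed back as inputs, errors compound over the rollout, which is why prediction quality is usually evaluated at several horizons.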


Project Organization

CudaVisionSS2025/
├── src/
│   ├── Assignment1/          # Neural Network Fundamentals
│   ├── Assignment2/          # Transfer Learning and Fine-tuning
│   ├── Assignment3/          # Recurrent Neural Networks
│   ├── Assignment4/          # Variational Autoencoders
│   ├── Assignment5/          # Generative Adversarial Networks
│   ├── Assignment6/          # Self-Supervised Learning
│   ├── Assignment7/          # Vision Transformers
│   └── CourseProject/        # Video Prediction Project
├── docs/                     # Documentation (this site)
├── requirements.txt          # Python dependencies
└── README.md                 # Main repository README

Each assignment directory contains:

- README.md: Comprehensive documentation for the assignment
- Notebooks: Jupyter notebooks with implementations and experiments
- Source code: Python modules and utilities
- Configs: Experiment configuration files
- Models: Saved model checkpoints
- Logs: TensorBoard logs and training outputs


Getting Started

  1. Clone the repository:

    git clone https://github.com/Cuda-Vision-Lab/CudaVisionSS2025.git
    cd CudaVisionSS2025
    

  2. Install dependencies:

    pip install -r requirements.txt
    

  3. Navigate to an assignment:

    cd src/Assignment1  # or any other assignment
    

  4. Read the assignment README: Each assignment directory contains a detailed README.md file with:

     - Overview and objectives
     - Dataset information
     - Model architectures
     - Training instructions
     - Usage examples
     - Results and analysis

For detailed information about each assignment, please refer to the respective README files linked above.