Guide 2: Research projects with PyTorch

  • Based on feedback we received, this guide summarizes tips and tricks for setting up and structuring large research projects in PyTorch, such as a Master's thesis

  • Feel free to contribute if you have good ideas yourself

Setup

Framework

  • Choosing the right framework can be essential. If your training follows a standard optimization loop (a single forward pass that returns a loss), consider using PyTorch Lightning. It greatly reduces code overhead and lets you easily scale your model to multiple GPUs and/or nodes if needed. However, if you expect to deviate considerably from the default training procedure, consider writing your own framework in plain PyTorch. It may take more time initially, but it makes later changes to the optimization procedure easier.

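  • As a rough illustration of how Lightning absorbs the boilerplate, the sketch below shows a minimal LightningModule; the model, layer sizes, and names are invented for this example and not prescribed by the library:

import pytorch_lightning as pl
import torch
import torch.nn as nn
import torch.nn.functional as F

class LitClassifier(pl.LightningModule):
    # Hypothetical classifier: Lightning supplies the training loop,
    # checkpointing, and device handling around these hooks.
    def __init__(self, hidden_size=128, lr=1e-3):
        super().__init__()
        self.save_hyperparameters()
        self.net = nn.Sequential(
            nn.Linear(784, hidden_size),  # assumes flattened 28x28 inputs
            nn.ReLU(),
            nn.Linear(hidden_size, 10),
        )

    def training_step(self, batch, batch_idx):
        x, y = batch
        loss = F.cross_entropy(self.net(x.flatten(1)), y)
        self.log("train_loss", loss)
        return loss

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=self.hparams.lr)

# Scaling to multiple GPUs or nodes is then mostly a Trainer flag, e.g.:
# trainer = pl.Trainer(max_epochs=10)
# trainer.fit(LitClassifier(), train_dataloader)
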
  • For your own framework, the following directory layout can serve as an example setup:

general/
│   train.py
│   task.py
│   mutils.py
layers/
experiments/
│   task1/
│   │   train.py
│   │   task.py
│   │   eval.py
│   │   dataset.py
│   task2/
│   │   train.py
│   │   task.py
│   │   eval.py
│   │   dataset.py
  • The general/train.py file collects the default operations every model needs (training loop, loading/saving the model, setting up the model, etc.). If you use PyTorch Lightning, this reduces to one train file per task that only needs to specify the Trainer object.

  • The general/task.py file provides a template for the task-specific parts (training step, validation step, etc.). If you use PyTorch Lightning, this would be the definition of the LightningModule. A sketch of such a template follows after this list.

  • The layers/ folder contains the code for the nn.Modules you use to build the models.

  • The experiments folder contains the task-specific code. Each task has its own train.py for specifying the argument parser, setting up the model, etc., while its task.py overrides the template in general/task.py. The eval.py file takes a checkpoint directory of a trained model as input and evaluates that model on the test dataset. Finally, dataset.py contains everything needed to set up the dataset.

  • Note that this template assumes you might have multiple different tasks and multiple different models. If you have a simpler setup, you can shrink the template accordingly.
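
  • To make the split between general/task.py and the per-task overrides more concrete, here is one possible sketch; the class and method names (TaskTemplate, train_step, eval_step) are invented for illustration and not part of any library:

# general/task.py -- one possible template (all names are illustrative)
import torch.nn as nn
import torch.nn.functional as F

class TaskTemplate:
    # Collects the task-specific hooks that general/train.py calls
    # inside its generic training loop.
    def __init__(self, model: nn.Module):
        self.model = model

    def train_step(self, batch):
        # Must return the loss for one batch; implemented per task.
        raise NotImplementedError

    def eval_step(self, batch):
        # Must return metrics for one validation/test batch.
        raise NotImplementedError

# experiments/task1/task.py -- overrides the template for a concrete task
# (in a real project this would live in its own file and import TaskTemplate)
class Task1(TaskTemplate):
    def train_step(self, batch):
        x, y = batch
        return F.cross_entropy(self.model(x), y)

    def eval_step(self, batch):
        x, y = batch
        acc = (self.model(x).argmax(dim=-1) == y).float().mean()
        return {"acc": acc}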

Argument parser

  • It is good practice to use argument parsers to specify hyperparameters. Argument parsers allow you to start a training run like python train.py --learning_rate ... --seed ... --hidden_size ... etc.

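  • A minimal sketch of such an entry point could look as follows; the flag names and defaults are examples, not requirements:

# Hypothetical train.py entry point; flag names mirror the example call above
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--learning_rate", type=float, default=1e-3, help="Optimizer learning rate")
parser.add_argument("--seed", type=int, default=42, help="Random seed for reproducibility")
parser.add_argument("--hidden_size", type=int, default=128, help="Width of the hidden layers")
args = parser.parse_args()

# Called as, e.g.: python train.py --learning_rate 1e-4 --seed 0 --hidden_size 256
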
  • If you have multiple models to choose from, you will have multiple sets of hyperparameters. A good summary of how to handle this can be found in the PyTorch Lightning documentation, and the approach works even without using Lightning. In essence, you define a static method for each model that returns a parser for its specific hyperparameters. This keeps your code cleaner and makes it easier to add new models or tasks without copying the whole argument parser.

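  • The pattern could look roughly like this; a sketch following the idea from the Lightning documentation, where the MLP class and its flags are placeholders:

# Sketch of the per-model parser pattern (model name and flags are placeholders)
import argparse

class MLP:
    @staticmethod
    def add_model_specific_args(parent_parser):
        group = parent_parser.add_argument_group("MLP")
        group.add_argument("--hidden_size", type=int, default=128)
        group.add_argument("--num_layers", type=int, default=2)
        return parent_parser

parser = argparse.ArgumentParser()
parser.add_argument("--model", type=str, default="MLP")
# Parse only the --model flag first, then add that model's own arguments
temp_args, _ = parser.parse_known_args()
if temp_args.model == "MLP":
    parser = MLP.add_model_specific_args(parser)
args = parser.parse_args()
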
  • To ensure reproducibility (more details below), it is recommended to save the arguments as a JSON file or similar in your checkpoint folder.
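
  • A minimal sketch, assuming the arguments were parsed as above and include a hypothetical --checkpoint_dir flag:

# Save the parsed arguments next to the checkpoints
# (args and args.checkpoint_dir are assumed from the parser sketch above)
import json
import os

os.makedirs(args.checkpoint_dir, exist_ok=True)
with open(os.path.join(args.checkpoint_dir, "hparams.json"), "w") as f:
    json.dump(vars(args), f, indent=4)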