HDL - Introduction to HyperParameter Tuning

Authors: Samuele Papa


This tutorial describes the structure and features of a template GitHub repository for performing large-scale hyperparameter tuning on a SLURM-based cluster using a combination of PyTorch Lightning, Hydra, Ax, MLFlow and Submitit.

The template is not meant to be the definitive way to perform hyperparameter tuning. Instead, it is meant as a good example from which to pick the elements and the structure that make the most sense for your own projects. For example, MLFlow is not very good at comparing images from multiple runs, so if a qualitative evaluation is necessary, it is a good idea to include TensorBoard as an additional logging library. SLURM was chosen because of its popularity and because it is in use on the SURFsara cluster, but the same template would work for other systems. Many more considerations and adaptations can be made at the discretion of the researcher.

The structure

├── hyperparameter_searcher
│   ├── __init__.py
│   ├── config
│   │   ├── data
│   │   │   └── mnist_config.py
│   │   ├── launcher
│   │   │   └── launcher_config.py
│   │   ├── logging
│   │   │   └── logging_config.py
│   │   ├── model
│   │   │   └── mnist_module_config.py
│   │   ├── sweeper
│   │   │   └── sweeper_config.py
│   │   ├── trainer
│   │   │   └── trainer_config.py
│   │   ├── train_bayesian_config.py
│   │   └── train_grid_config.py
│   ├── data
│   │   ├── __init__.py
│   │   ├── dataloaders.py
│   │   └── mnist_datamodule.py
│   ├── loggers
│   │   ├── __init__.py
│   │   ├── loggers.py
│   │   └── mlflow_utils.py
│   ├── networks
│   │   ├── __init__.py
│   │   ├── components
│   │   │   ├── __init__.py
│   │   │   └── simple_dense_net.py
│   │   └── mnist_lightning_module.py
│   ├── utils
│   │   ├── __init__.py
│   │   └── io_utils.py
│   └── training_pipeline.py
├── scripts
│   ├── README.md
│   ├── hyperparameter_blueprint_bayesian.sh
│   └── hyperparameter_blueprint_grid.sh
├── tests
│   ├── __init__.py
│   └── tests_utils
│       ├── __init__.py
│       └── test_io_utils.py
├── environment.yml
├── .env
└── train.py

The main file of the repository is train.py in the root folder. The training procedure is launched from here, and this is also the primary entry point for debugging the code. The tests folder holds the tests for the code being written, the scripts folder contains the scripts used for the hyperparameter search, and the hyperparameter_searcher folder is where the source code for all our experiments resides.

In this template, the source code for the experiments is divided into Python modules, which can be thought of as the building blocks used to run the experiments. Besides the modules, there is a file called training_pipeline.py, which is where the training is defined and which is the only entry point from which the various modules are used. In this template, only this file is present. Often, however, we will want to perform some additional analysis of our models after training, for example creating visualizations with samples if we have a generative model, or evaluating a downstream task using the learned representations. In that case, we just need to add a new file, e.g. called evaluation.py, that reuses the already-defined modules to load the model and run the evaluation.

The modules. The experiment source code contains several modules. We can think of them as packages, independent as far as is reasonable, that each perform a specific and complex task, and that can either be re-used more than once or are logically separate from the other modules. This modular approach, which separates code by function rather than by experiment, forces us to write code that can be used immediately by all the experiments we will be running, and that is easier to maintain in the future.

Next, we will be discussing the different components that we are using for config management, logging and hyperparameter tuning.


Hydra

Hydra configurations are usually defined through .yaml files. However, we can also define them in Python. Python-based config files give us more freedom in how the configurations are defined and make the code easier to re-use. The trade-off is more complexity in managing the code, as all the configuration needs to be defined manually.

Hydra interfaces with your scripts through a decorator on the main function. The decorator defines where to get the configs from and which primary config file should be used to parse the arguments. In this repository, the configuration is done through Python, so we don’t need the config_path:

@hydra.main(config_path=None, config_name=os.environ["MAIN_CONFIG"])

The primary config file is the entry point for your configuration. Here, you define the command line parameters that you support and which default configurations are to be used. A default configuration, as the name suggests, is a file that contains some default arguments. In our case, these will be handy for defining default configurations of the various datasets and models that we support.
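To make the idea of a Python-based primary config more concrete, here is a minimal sketch using only standard-library dataclasses. The class names, the field names and the entries `mnist` and `mnist_module` are assumptions made for illustration and are not taken from the repository; only the `data` and `model` group names mirror the config folders shown earlier. The registration step is left as a comment so that the sketch runs without Hydra installed.

```python
from dataclasses import asdict, dataclass, field
from typing import Any, List


@dataclass
class MNISTDataConfig:
    # Hypothetical default arguments for one entry of the "data" group.
    data_dir: str = "./data"
    batch_size: int = 64


@dataclass
class TrainConfig:
    # Defaults list: which entry of each config group is loaded when the
    # user does not pick one on the command line (names are hypothetical).
    defaults: List[Any] = field(
        default_factory=lambda: [{"data": "mnist"}, {"model": "mnist_module"}]
    )
    # Top-level parameters that can be changed from the command line.
    seed: int = 0
    max_epochs: int = 10


# With Hydra installed, the classes would be registered roughly like this:
#   from hydra.core.config_store import ConfigStore
#   cs = ConfigStore.instance()
#   cs.store(group="data", name="mnist", node=MNISTDataConfig)
#   cs.store(name="train", node=TrainConfig)

if __name__ == "__main__":
    print(asdict(TrainConfig()))
```

The name under which the primary config is stored is what `config_name` in the decorator refers to; in this template that name is read from the MAIN_CONFIG environment variable.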

From the command line, all already-defined parameters can be changed, and new ones can be added. By default, adding an argument that is not already defined in the config must be requested explicitly (in Hydra's override syntax, by prefixing the argument with +).
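The rule above can be sketched in a few lines of plain Python. This is a deliberately simplified stand-in for Hydra's override handling, not its actual implementation: it only supports `key.sub=value` to change an existing entry and `+key.sub=value` to add a new one, and the parameter names (`lr`, `dropout`, `max_epochs`) are hypothetical.

```python
import ast
import copy
from typing import Any, Dict, List


def parse_value(raw: str) -> Any:
    """Parse Python literals such as '0.01' or '32'; keep other text as str."""
    try:
        return ast.literal_eval(raw)
    except (ValueError, SyntaxError):
        return raw


def apply_overrides(config: Dict[str, Any], overrides: List[str]) -> Dict[str, Any]:
    """Return a copy of `config` with command-line style overrides applied."""
    result = copy.deepcopy(config)
    for item in overrides:
        key, _, raw = item.partition("=")
        adding = key.startswith("+")
        path = (key[1:] if adding else key).split(".")
        node = result
        # Walk down to the parent of the entry being set.
        for part in path[:-1]:
            if part not in node:
                if not adding:
                    raise KeyError(f"'{key}' is not in the config")
                node[part] = {}
            node = node[part]
        # New keys are only accepted when explicitly requested with '+'.
        if path[-1] not in node and not adding:
            raise KeyError(f"'{key}' is not in the config; use '+{key}=...'")
        node[path[-1]] = parse_value(raw)
    return result


if __name__ == "__main__":
    base = {"model": {"lr": 0.001}, "trainer": {"max_epochs": 10}}
    print(apply_overrides(base, ["model.lr=0.01", "+model.dropout=0.5"]))
```

Rejecting unknown keys unless they are explicitly added catches typos in parameter names, which would otherwise silently leave the intended parameter at its default value.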

Overall, using Hydra is a straightforward way to neatly organize your configurations and get closer to reproducible results.

Some observations

Hydra is used to manage the configuration of your experiments. All command line arguments and their processing can be handled through it. There are several advantages over using the traditional argument parser from Python. The first is that we can more easily store and restore argument configurations. Another is that target classes can be defined directly in the configuration. A target class can then be initialized with the arguments given in the configuration. Think of what would happen if you wanted to switch between model_A and model_B, which are defined by the classes ModelA and ModelB. In the config, you would say model=model_A, and then in the code you would need a long chain of if statements to select which class to instantiate with the given configuration, which in this case would be ModelA. On top of that, the class would have several default parameters that we want to use, but that would be hidden in the code. With Hydra, you can instead define the target class directly in the config file, and it is selected automatically when model=model_A is passed on the command line, together with all the parameters defined explicitly in the config file.
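The mechanism can be illustrated with a small stand-in for `hydra.utils.instantiate`. The helper below is a simplified sketch, not Hydra's implementation: it only handles a flat dictionary with a `_target_` entry. To keep it runnable without any project code, two standard-library classes play the roles of ModelA and ModelB; this substitution is purely an assumption for illustration.

```python
import importlib
from datetime import timedelta
from fractions import Fraction
from typing import Any, Dict


def instantiate(config: Dict[str, Any]) -> Any:
    """Import the class named in '_target_' and call it with the other entries."""
    module_name, _, class_name = config["_target_"].rpartition(".")
    target = getattr(importlib.import_module(module_name), class_name)
    kwargs = {key: value for key, value in config.items() if key != "_target_"}
    return target(**kwargs)


# One config entry per model; the target class and all of its parameters are
# visible here rather than hidden in an if/elif chain in the code.
MODEL_CONFIGS = {
    # Stand-in for ModelA.
    "model_A": {"_target_": "datetime.timedelta", "hours": 1, "minutes": 30},
    # Stand-in for ModelB.
    "model_B": {"_target_": "fractions.Fraction", "numerator": 3, "denominator": 4},
}

if __name__ == "__main__":
    choice = "model_A"  # what model=model_A would select on the command line
    print(instantiate(MODEL_CONFIGS[choice]))
```

Adding a third model then means adding a config entry, with no change to the code that builds the model.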

This only makes sense because the modular approach to config management allows for simple parameter switching when testing different models or datasets, or when running different experiments entirely. Having modular configuration management simplifies the entire file structure as well, removing the need to separate different experiments into different folders, which can quickly become difficult to maintain as time goes on. Instead, each part of your project can be seen as a different package: one dedicated to data handling, one to model definition, one to logging, another to visualization, and maybe also one for all the metrics that you want to test your models with. Such modularity would be very difficult without also having modular configurations, which Hydra handles very easily.

One thing that is important to highlight is that Hydra is not a magic wand that solves all our problems. Instead, Hydra has its roots in the elegant configuration management that OmegaConf already provides. Hydra is a handy extension of OmegaConf, with some features tailored for machine learning. When more complicated things need to be done, do not hesitate to dig into what Hydra is doing and add your own code to make your workflow faster. Often, trying to work around an issue using only the features available in the library slows you down more than you think.


MLFlow

MLFlow is a logging library characterized by the centralization of logging, which eases the process when multiple nodes are being used. Additionally, it provides a simple way to compare the parameters that have been changed between runs, giving a quick overview of which change has had the most decisive impact on the performance of the model.
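The kind of comparison described here can be sketched in plain Python. The function below is not part of MLFlow and does not use its API; it only illustrates the idea of showing the parameters that differ between runs, given each run's parameters as a dictionary. The run names and parameter values are made up for the example.

```python
from typing import Any, Dict


def changed_params(runs: Dict[str, Dict[str, Any]]) -> Dict[str, Dict[str, Any]]:
    """Return, per parameter, the value in each run, but only for
    parameters that do not have the same value in every run."""
    all_keys = set().union(*(params.keys() for params in runs.values()))
    changed = {}
    for key in sorted(all_keys):
        values = {run_id: params.get(key) for run_id, params in runs.items()}
        # Compare via repr so that unhashable values (lists, dicts) also work.
        if len({repr(value) for value in values.values()}) > 1:
            changed[key] = values
    return changed


if __name__ == "__main__":
    # Hypothetical parameters logged by three runs of a search.
    runs = {
        "run_1": {"lr": 0.001, "batch_size": 64, "optimizer": "adam"},
        "run_2": {"lr": 0.01, "batch_size": 64, "optimizer": "adam"},
        "run_3": {"lr": 0.01, "batch_size": 128, "optimizer": "adam"},
    }
    for name, values in changed_params(runs).items():
        print(name, values)
```

Parameters that are identical in every run, such as `optimizer` above, are left out, so only the dimensions actually explored by the search remain.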

MLFlow interfaces with the code through PyTorch Lightning. When the Trainer is instantiated, among the loggers passed is MLFlow. In the code, this will seem a bit opaque, as the initialization of the loggers is done through Hydra’s instantiate (in file training_pipeline.py):

mlflow_logger = hydra.utils.instantiate(logger)

Another important aspect is checkpointing, which allows useful information and the model’s weights to be stored through MLFlow as well. Using MLFlow in the checkpointing process helps centralize all the information (in file training_pipeline.py):

    hydra.utils.instantiate(callback, mlflow_logger=mlflow_logger)
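The call above passes an object that only exists at runtime (the logger) alongside the static parameters from the config. The sketch below shows that pattern with a simplified stand-in for `hydra.utils.instantiate`, in which extra keyword arguments are merged with the config entries. Standard-library logging objects stand in for the MLFlow logger and the checkpoint callback, and the `monitor` entry is hypothetical; none of these names come from the repository.

```python
import importlib
import logging
from typing import Any, Dict


def instantiate(config: Dict[str, Any], **runtime_kwargs: Any) -> Any:
    """Call the object named in '_target_' with the config entries plus
    any keyword arguments supplied at call time."""
    module_name, _, attr_name = config["_target_"].rpartition(".")
    target = getattr(importlib.import_module(module_name), attr_name)
    kwargs = {key: value for key, value in config.items() if key != "_target_"}
    kwargs.update(runtime_kwargs)  # runtime values take precedence
    return target(**kwargs)


# Static configuration, as it could appear in a config file.
logger_config = {"_target_": "logging.getLogger", "name": "mlflow_stand_in"}
callback_config = {
    "_target_": "logging.LoggerAdapter",
    "extra": {"monitor": "val_loss"},  # hypothetical callback setting
}

# The logger is created first, then handed to the callback at instantiation.
run_logger = instantiate(logger_config)
callback = instantiate(callback_config, logger=run_logger)

if __name__ == "__main__":
    print(callback.logger.name, callback.extra)
```

Keeping the runtime object out of the config lets the same callback config be reused with whichever logger has been created for the current run.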

Some observations

MLFlow is an excellent logging tool for keeping track of your experiments. Where MLFlow shines is in its ability to quickly compare multiple runs of a hyperparameter search. It also centralizes everything, putting all the things you need in a single location, which is extremely handy when running large-scale experiments.

There are a few downsides to MLFlow: it is not very good at image logging, and it is limited in its ability to compare different metrics and the qualitative performance of different models. When this is necessary for your models (which is often the case for computer vision tasks), MLFlow needs to be supported by additional loggers, such as TensorBoard. This is easily done with PyTorch Lightning by passing multiple loggers to the Trainer.


Conclusion

We have seen how a combination of Hydra, MLFlow, Submitit, and the Ax plugin for Hydra can be used to perform Bayesian or grid hyperparameter searches on a SLURM-based cluster. We have seen how this setup interfaces with a simple project and have observed some of the strengths and pitfalls of the methods.

It is crucial to remember that this setup is meant as a guide, to introduce the useful tools that you may want to use and how they interface together. Ultimately, the best fit for any case will be determined by the specific circumstances that you are facing.