{ "cells": [ { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "# Part 4.1: Tensor Parallelism\n", "\n", "**Filled notebook:** \n", "[View on GitHub](https://github.com/phlippe/uvadlc_notebooks/blob/master/docs/tutorial_notebooks/scaling/JAX/tensor_parallel_simple.ipynb)\n", "[Open in Colab](https://colab.research.google.com/github/phlippe/uvadlc_notebooks/blob/master/docs/tutorial_notebooks/scaling/JAX/tensor_parallel_simple.ipynb) \n", "\n", "**Author:** [Phillip Lippe](https://phlippe.github.io/)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In this tutorial, we discuss tensor parallelism, another important parallelism strategy for training large-scale deep learning models. Like pipeline parallelism, tensor parallelism is a model parallelism strategy: it parallelizes the model itself rather than the data. The key difference between the two is how they split the model over devices. Pipeline parallelism splits the model along the sequence of layers (i.e., vertically), while tensor parallelism splits it along the feature dimensions (i.e., horizontally). Each device then processes a different subset of features, and the model's forward and backward passes are split over devices accordingly. A short overview of the parallelism strategies is shown below.\n", "\n", "