Details
The four A100 GPUs on Puma node r5u13n1 are each subdivided into three smaller virtual GPUs using NVIDIA's MIG (Multi-Instance GPU) technology. Each MIG slice provides 20 GB of GPU memory, which is more than the vast majority of jobs run on Puma in 2023 used. The 12 MIG GPUs will increase overall GPU availability on Puma by freeing the 32 GB V100 GPUs for users who require larger amounts of GPU memory.
Jobs requesting MIG resources should generally be scheduled more quickly than those requesting the standard V100 GPUs, so MIG resources should be preferred whenever a single 20 GB slice is sufficient.
One limitation is that a single application can address only one MIG slice, so MIG slices are not appropriate for jobs that use multiple GPUs.
Using MIG resources
The addition of the MIG devices to the Slurm queues changes how GPU requests are filled, and some users will need to adjust their submissions to ensure that their analyses receive the correct resources.
BATCH JOBS:
Typically, N GPUs are requested in a batch script by including the Slurm directive:
#SBATCH --gres=gpu:N
With the inclusion of MIG devices, the GPUs allocated by the above directive may be either MIG slices or V100 GPUs, depending on which resources are available.
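Because a generic request may be satisfied by either device type, it can be useful to record which device a job actually received. One way to do this, sketched here with standard NVIDIA tooling, is to list the visible device(s) near the top of the batch script:

# Print the GPU or MIG device allocated to this job
nvidia-smi -L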
There are two conditions under which users will need to explicitly request the use of V100 (Volta) GPU resources:
1. If the 20 GB of GPU memory provided by a MIG slice is insufficient for your analysis
2. If the job requires more than one GPU
V100 GPUs may be requested using the Slurm directive:
#SBATCH --gres=gpu:volta:N
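For example, a job that needs two full V100 GPUs (the count here is illustrative) would include:

#SBATCH --gres=gpu:volta:2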
To explicitly request a MIG slice, use the directive:
#SBATCH --gres=gpu:nvidia_a100_80gb_pcie_2g.20gb
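Putting this together, a minimal MIG batch script might look like the sketch below. The job name, task count, memory, and time limit are illustrative values, the final command stands in for whatever application you run, and <partition name> and <account name> must be replaced with your group's values:

#!/bin/bash
#SBATCH --job-name=mig_example
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --mem-per-cpu=4GB
#SBATCH --time=01:00:00
#SBATCH --partition=<partition name>
#SBATCH --account=<account name>
#SBATCH --gres=gpu:nvidia_a100_80gb_pcie_2g.20gb

# Run your GPU application on the 20 GB MIG slice
python my_gpu_script.py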
INTERACTIVE JOBS:
Open OnDemand
The Open OnDemand job request form includes a dropdown menu for selecting the appropriate GPU resources. On Puma, the options include a single MIG slice and 1-4 V100 GPUs.
[Screenshot: the Open OnDemand GPU selection dropdown]
Interactive command line
The command “interactive” does not currently allow the selection of MIG resources, but a MIG slice can be requested with an salloc command similar to this:
salloc --gres=gpu:nvidia_a100_80gb_pcie_2g.20gb --job-name=interactive --mem-per-cpu=#GB --nodes=1 --ntasks=# --time=### --partition=<partition name> --account=<account name>
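For example, a single-task interactive session with 8 GB of memory per CPU, a four-hour time limit, and one MIG slice could be requested as shown below; the memory, task, and time values are illustrative, and the partition and account placeholders should be replaced with your own:

salloc --gres=gpu:nvidia_a100_80gb_pcie_2g.20gb --job-name=interactive --mem-per-cpu=8GB --nodes=1 --ntasks=1 --time=04:00:00 --partition=<partition name> --account=<account name>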