Parallelization

This page is under construction! We cannot guarantee the completeness or accuracy of the information here while it is in development.

Overview

To take advantage of the full computing power that HPC has to offer, codes can be run in parallel to spread the workload across multiple CPUs, potentially granting significant improvements in performance. This is often easier said than done. Some codes are developed with parallelization in mind, so running them in parallel can be as simple as calling mpirun; other codes may need to be modified or run in tandem with parallelization libraries such as mpi4py.
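
As a rough illustration of the two cases, the commands below show how an MPI-enabled executable and a Python script using mpi4py are typically launched; the program and script names are placeholders, not actual examples on the system.

# Code built against an MPI library can be launched directly:
mpirun -np 4 ./my_mpi_program

# Python code adapted with mpi4py is launched the same way; one Python
# process starts per rank and mpi4py handles communication between them:
mpirun -np 4 python3 my_mpi4py_script.py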

MPI Jobs

OpenMPI

For OpenMPI, the important environment variables are set by default, so you do not need to include them in your scripts.

Default OpenMPI variables
export SBATCH_GET_USER_ENV=1                     # pass the user environment to the job
export OMPI_MCA_btl_openib_cpc_include=rdmacm    # use the RDMA connection manager
export OMPI_MCA_btl_openib_if_include=bnxt_re1   # restrict the openib transport to the RoCE interface
export OMPI_MCA_btl_openib_rroce_enable=1        # enable routable RoCE
export OMPI_MCA_btl=vader,self,openib            # use shared-memory, self, and openib transports
export OMPI_MCA_oob_tcp_if_include=eth1          # send out-of-band TCP traffic over eth1
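
Since these variables are already in place, an OpenMPI batch job can be as simple as the sketch below; the job settings, partition, account, and executable names are placeholders and should be replaced with your own.

#!/bin/bash
# Placeholder job settings: adjust the task count, time, partition, and account
#SBATCH --job-name=mpi_example
#SBATCH --nodes=1
#SBATCH --ntasks=16
#SBATCH --time=01:00:00
#SBATCH --partition=standard
#SBATCH --account=your_group

# The default OpenMPI module and the variables above are assumed to already be in place
mpirun -np $SLURM_NTASKS ./my_mpi_program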

Intel MPI

For Intel MPI, the relevant variables are set for you; first unload the default OpenMPI modules:

module unload openmpi3 gnu8

If you're using Intel MPI with mpirun and are getting errors, try replacing mpirun -np $NPROCESSES with:

srun -n $NPROCESSES --mpi=pmi2
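
Putting that together, an Intel MPI job script might look like the sketch below; the job settings, module name, and executable name are placeholders, so check the modules actually available on the system.

#!/bin/bash
# Placeholder job settings: adjust the task count and time to your needs
#SBATCH --job-name=intel_mpi_example
#SBATCH --nodes=1
#SBATCH --ntasks=16
#SBATCH --time=01:00:00

module unload openmpi3 gnu8   # remove the default OpenMPI stack
module load intel-mpi         # placeholder: use the site's actual Intel MPI module name
srun -n $SLURM_NTASKS --mpi=pmi2 ./my_mpi_program   # placeholder executable name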

Parallel Work

To make proper use of a supercomputer, you will likely want to take advantage of many cores.  Puma has 94 cores in each node available to Slurm.  The exception is High Throughput Computing, where you run hundreds or thousands of small, independent jobs rather than one large parallel job.
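
One common High Throughput pattern is a Slurm job array, where each array task runs the same single-core program on a different input. A minimal sketch is shown below; the program and input file names are placeholders.

#!/bin/bash
# Placeholder job settings: one single-core task per array index
#SBATCH --job-name=htc_example
#SBATCH --ntasks=1
#SBATCH --time=00:30:00
#SBATCH --array=1-100

# Each array task processes a different input file, selected by its array index
./my_serial_program input_${SLURM_ARRAY_TASK_ID}.dat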

We have a training course, Introduction to Parallel Computing, which explains the concepts and terminology of parallel computing with some examples.

The practical course Parallel Analysis in R is also useful.