Batch Jobs & Slurm

THIS SITE IS DEPRECATED

We have transitioned to another service and are no longer actively updating this site.

Refer to our new documentation at: hpcdocs.hpc.arizona.edu

This page is under construction! We cannot guarantee completion or accuracy of information herein while in development.

Overview

What are batch jobs?

Some jobs don't need a GUI, may take a long time to run, and/or do not require user input. In these cases, batch jobs are useful because they allow a user to request resources for a job and then wait for it to complete without any further input. The user can even log off of HPC entirely, and the submitted jobs will continue to run. The program on HPC that takes requests and assigns resources at optimal times to make the best use of the cluster is called a scheduler; ours is Slurm. All three clusters, Puma, Ocelote, and ElGato, use Slurm for resource management and job scheduling.

Contrast with GUI Jobs

The main difference between batch and GUI jobs is that batch jobs are text-only and give no graphical feedback during runtime. While there is a method to submit jobs using a GUI, strictly speaking, batch jobs are of a different nature than GUI jobs, such as those that use RStudio or MATLAB.

Batch jobs also differ from interactive jobs: both use the command line interface and the Slurm scheduler, but batch jobs provide no feedback or interactivity. The script runs exactly as submitted, with no way to change it once it has been submitted, though it can be canceled. Each batch job is assigned a unique job ID that can be used to trace it.

Batch Workflow

The general process for submitting a batch job is as follows:

  1. Write your analysis. This requires an executable program as well as some input files or options. This varies widely between different types of analysis, and you will need to determine what needs to be done for your particular analysis.
  2. Write your Slurm batch script. This tells the scheduler what resources you want for your job and how to run it. The batch script is written in bash, and normal bash commands can be used within the batch script to increase functionality or flexibility.
  3. Submit your request. This is usually as simple as running "sbatch my_request.slurm".
  4. Wait. Now the scheduler has your request. It will compare your job to all currently waiting jobs and determine the best time to run it. Jobs that request more resources generally wait longer in the queue, but there is no concrete rule that determines how long a given job will wait. Typical wait times vary by cluster and activity. Generally, jobs submitted to ElGato will start much sooner than jobs submitted to Puma, with Ocelote falling in between. To check on the activity of a given cluster, use "". To set some expectations, small jobs on Puma may start within 5 minutes, but large multinode jobs may wait days to begin.

To check the status of your jobs in the queue, use the command "squeue -u $USER". This will list relevant information, including a Slurm Reason Code, which describes the reason for the wait time. Sometimes the job is just waiting for resources; other times you may have expended your group's monthly CPU time.
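As a concrete illustration, here is a minimal sketch of a batch script (saved, for example, as my_request.slurm); the job name, time limit, and echo command are placeholders you would replace with your own analysis:

#!/bin/bash
#SBATCH --job-name=example_job        # name shown in squeue output
#SBATCH --account=<PI GROUP>          # your group's allocation
#SBATCH --partition=standard
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --time=01:00:00               # walltime in hh:mm:ss
#SBATCH -o slurm-%j.out               # output file; %j becomes the job ID

# Ordinary bash commands run here exactly as written once the job starts
echo "Job $SLURM_JOB_ID running on $(hostname)"

Submit it with "sbatch my_request.slurm" and monitor it with "squeue -u $USER" as described above.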

Additional SLURM Resources and Examples

  • Official SchedMD User Documentation: detailed information on SLURM directives and commands.
  • PBS ⇔ SLURM Rosetta Stone: table for converting some common PBS job directives to SLURM syntax.
  • HPC Quick Start: if you have never submitted a batch job before, this guide is a great place to start.
  • Job Examples: basic SLURM example scripts, with PBS scripts included for comparison.
  • Even More Job Examples!: a growing repository of example SLURM submission scripts.
  • Intro to HPC: a recorded video presentation of our Intro to HPC workshop. Keep your eyes peeled for periodic announcements in the HPC listserv on upcoming live sessions!

Node Summary

Before submitting a Slurm script, you must know (or at least have a general idea of) the resources needed for your job. This will tell you which type of node to request, how much memory, and other useful information that can be provided to the system via your batch script. A detailed list of Slurm batch flags is included below.

General Overview

  • Standard CPU Node: the general-purpose node, which can (and should) be used by the majority of jobs.
  • High Memory CPU Node: similar to the standard nodes, but with significantly more RAM. There are only a few of them, and they should only be requested if you have tested your job on a standard node and found that its memory usage is too high. Both standard and high memory nodes share the same file system, so there is no advantage in terms of long-term storage, only active RAM usage.
  • GPU Node: similar to the standard node, but with one or more GPUs available, depending on which cluster is in use.

Hardware Limitations by Node Type and Cluster

Please consult the following table when crafting Slurm submission scripts. Requesting more resources than are available on a given cluster and node type may lead to errors or delays.

Cluster | Node Type | N Nodes | N CPU/Node | RAM/CPU | CPU RAM/Node | N GPU/Node | RAM/GPU | GPU RAM/Node | Total N GPUs
Puma | Standard | 236 | 94 | 5 gb | 470 gb | - | - | - | -
Puma | High Mem | 3 standard, 2 buy-in | 94 | 32 gb | 3008 gb | - | - | - | -
Puma | GPU | 8 standard, 7 buy-in | 94 | 5 gb | 470 gb | 4 | 32 gb | 128 gb | 32 standard, 28 buy-in
Ocelote | Standard | 400 | 28 | 6 gb | 168 gb | - | - | - | -
Ocelote | High Mem | 1 | 48 | 41 gb | 1968 gb | - | - | - | -
Ocelote | GPU | 46 | 28 | 8 gb | 224 gb | 1 | 16 gb | 16 gb | 46
El Gato | Standard | 130 | 16 | 4 gb | 62 gb | - | - | - | -

See the Node Types/Example Resource Requests section below for example Slurm requests.

Other Job Limits

In addition to fitting your jobs within the constraints of our hardware, there are other limitations imposed by the scheduler to maintain fair use. 

  • Time Limit Per Job: A single job cannot run for more than 10 days (240 hours). Requesting more time than this will leave your job stuck in the queue indefinitely with the reason code "QOSMaxWallDurationPerJobLimit" (see below for more reason codes). A maximum-walltime example follows this list.
  • CPU Hours Per Group: The number of CPU hours used per job is subtracted from the PI's allocation. More info here
  • Active jobs, CPUs, GPUs, and Memory: To see the limits and usage of these items, log onto the cluster you wish to know more about, and type "job-limits <group>"
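For reference, a walltime request at the 10-day cap would look like the line below; anything larger leaves the job pending with "QOSMaxWallDurationPerJobLimit":

#SBATCH --time=240:00:00   # 240 hours (10 days), the maximum walltime for a single job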


Basic Considerations when making a Slurm Request

When requesting resources with Slurm, there are a few parameters to set and considerations to keep in mind. Below we include only the information necessary to start constructing a Slurm request. To view additional options, please see our Slurm Reference page.

Job Partitions

When submitting a job, a partition must be specified. This tells the scheduler from which allocation to subtract hours for this job. 

standard
  #SBATCH --account=<PI GROUP>
  #SBATCH --partition=standard
  Consumes your group's standard allocation. These jobs cannot be interrupted.

windfall
  #SBATCH --partition=windfall
  Does not consume your group's standard allocation. Jobs may be interrupted and restarted by higher-priority jobs. The --account flag needs to be omitted or an error will occur.

high_priority
  #SBATCH --account=<PI GROUP>
  #SBATCH --partition=high_priority
  #SBATCH --qos=user_qos_<PI GROUP>
  Available for groups who have purchased compute resources.

qualified
  #SBATCH --account=<PI GROUP>
  #SBATCH --partition=standard
  #SBATCH --qos=qual_qos_<PI GROUP>
  Available for groups that have submitted a special project request.

Total Memory vs CPU Count

Interested in learning more about how memory and CPU count are related? Check out our YouTube video!

Job Memory and CPU Count are Correlated

The memory your job is allocated is dependent on the number of CPUs you request.

For example, on Puma standard nodes, you get 5G of memory for each CPU you request. This means a standard job using 4 CPUs gets 5G/CPU × 4 CPUs = 20G of total memory. Each node type has its own memory ratio, equal to its total memory divided by its total number of CPUs. A reference for all the node types, their memory ratios, and how to request each can be found in the Node Types/Example Resource Requests section below.
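To make the arithmetic concrete, a sketch of a Puma standard-node request for 4 CPUs, using the 5G/CPU ratio described above, would receive 20G of total memory:

#SBATCH --nodes=1
#SBATCH --ntasks=4
#SBATCH --mem-per-cpu=5gb    # 4 CPUs x 5gb/CPU = 20gb total memory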

What Happens if My Memory and CPU Requests Don't Match?

Our systems are configured to try to help when your memory request does not match your CPU count.

For example, if you request 1 CPU and 470G of memory on Puma, the system will automatically scale your CPU count up to 94 to ensure that you get your full memory request. This does not go the other way: if you request less memory than your CPU count would provide, no adjustments are made. If you omit the --mem flag entirely, the system will use the memory ratio for the standard nodes on that cluster.
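As a sketch of the scaling behavior described above, a Puma request like the following would be adjusted by the system:

#SBATCH --nodes=1
#SBATCH --ntasks=1     # requested CPU count
#SBATCH --mem=470gb    # full-node memory; the scheduler scales the job up to 94 CPUs to satisfy it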

Possible Problems You Might Encounter

  • Be careful when using the --mem-per-cpu flag. If you use a value higher than the standard node ratio, you may inadvertently wind up in the queue for a high memory node. On Puma there are only three of these machines available for standard jobs, and only one on Ocelote, so wait times are frequently longer than those for standard nodes. If your job sits in the queue much longer than you would expect, check it using job-history to ensure the memory ratio looks correct.
  • Stick to using --ntasks=N and --cpus-per-task=M to request N × M CPUs (see the sketch after this list). Using the flag -c N to request CPUs has been found to cause problems with memory requests and may inadvertently limit you to ~4MB of total memory.
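For example, a hypothetical request for 8 CPUs using the recommended flags might look like:

#SBATCH --nodes=1
#SBATCH --ntasks=4           # N tasks
#SBATCH --cpus-per-task=2    # M CPUs per task, for N x M = 8 CPUs total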

Node Types/Example Resource Requests

Standard Nodes

ElGato (max 16 CPUs per node, 4gb/CPU, 62gb max per node):

#SBATCH --nodes=1
#SBATCH --ntasks=16
#SBATCH --mem-per-cpu=4gb

Ocelote (max 28 CPUs per node, 6gb/CPU, 168gb max per node):

#SBATCH --nodes=1
#SBATCH --ntasks=28
#SBATCH --mem-per-cpu=6gb

Puma (max 94 CPUs per node, 5gb/CPU, 470gb max per node):

#SBATCH --nodes=1
#SBATCH --ntasks=94
#SBATCH --mem-per-cpu=5gb

GPU Nodes

During the quarterly maintenance cycle on April 27, 2022 the ElGato K20s and Ocelote K80s were removed because they are no longer supported by Nvidia.

GPU jobs are requested using the generic resource, or --gres, SLURM directive. In general, the directive to request N GPUs will be of the form: --gres=gpu:N

Ocelote (max 28 CPUs per node, 8gb/CPU, 224gb max per node):

#SBATCH --nodes=1
#SBATCH --ntasks=28
#SBATCH --mem-per-cpu=8gb
#SBATCH --gres=gpu:1

Puma (max 94 CPUs per node, 5gb/CPU, 470gb max per node):

#SBATCH --nodes=1
#SBATCH --ntasks=94
#SBATCH --mem-per-cpu=5gb
#SBATCH --gres=gpu:1

Up to four GPUs may be requested on a single Puma GPU node using --gres=gpu:1, --gres=gpu:2, --gres=gpu:3, or --gres=gpu:4.


High Memory Nodes

When requesting a high memory node, include both the memory-per-CPU and constraint directives.

Ocelote (max 48 CPUs per node, 41gb/CPU, 2015gb max per node):

#SBATCH --nodes=1
#SBATCH --ntasks=48
#SBATCH --mem-per-cpu=41gb
#SBATCH --constraint=hi_mem

Puma (max 94 CPUs per node, 32gb/CPU, 3000gb max per node):

#SBATCH --nodes=1
#SBATCH --ntasks=94
#SBATCH --mem-per-cpu=32gb
#SBATCH --constraint=hi_mem