Batch Jobs & Slurm
Overview
What are batch jobs?
Some jobs don't need a GUI, may take a long time to run, and/or do not need user input. In these cases, batch jobs are useful because they allow a user to request resources for a job and then let it run to completion automatically without any further input. The user can even fully log off of HPC, and the submitted jobs will continue to run. The program on HPC that takes requests and assigns resources at ideal times to optimize cluster usage is called a scheduler, and our scheduler is Slurm. All three clusters, Puma, Ocelote, and ElGato, use Slurm for resource management and job scheduling.
Contrast with GUI Jobs
The main difference between batch and GUI jobs is that batch jobs are text-only and give no graphical feedback during runtime. While there is a method to submit jobs using a GUI, strictly speaking, batch jobs are different in nature from GUI jobs, such as those that use RStudio or MATLAB.
Batch jobs also differ from interactive jobs: while both use the command line interface and the Slurm scheduler, batch jobs provide no feedback or interactivity. The script runs exactly as submitted, with no way to change it after submission, though the job can be canceled. Each batch job is assigned a unique job ID that can be used to trace it.
Batch Workflow
The general process for submitting a batch job is as follows:
- Write your analysis. This requires an executable program as well as some input files or options. The specifics vary widely between different types of analysis, and you will need to determine what is needed for yours.
- Write your Slurm batch script. This tells the scheduler what resources you want for your job and how to run it. The batch script is written in bash, and normal bash commands can be used within it for additional functionality or flexibility. A minimal example script is shown at the end of this section.
- Submit your request. This is usually as simple as running "sbatch my_request.slurm".
- Wait. The scheduler now has your request. It compares your job against all currently waiting jobs and determines the best time to run it. Jobs that request more resources generally wait longer in the queue, but there is no concrete rule that determines how long a given job will wait. Typical wait times vary by cluster and activity: jobs submitted to ElGato generally start much sooner than jobs submitted to Puma, with Ocelote falling in between. As a rough guide, small jobs on Puma may start within 5 minutes, while large multi-node jobs may wait days to begin.
To check the status of your jobs in the queue, use the command "squeue -u $USER". This lists relevant information, including a Slurm Reason Code that explains why a job is waiting. Sometimes the job is simply waiting for resources; other times you may have expended your group's monthly CPU time.
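For reference, a minimal batch script might look like the following sketch. The account name, module, resource values, and script name are placeholders, not required values; substitute whatever your analysis actually needs.

```bash
#!/bin/bash
#SBATCH --job-name=my_analysis        # name shown in the queue
#SBATCH --account=<PI GROUP>          # your group's allocation (placeholder)
#SBATCH --partition=standard          # see Job Partitions below
#SBATCH --nodes=1                     # run on a single node
#SBATCH --ntasks=4                    # request 4 CPUs
#SBATCH --time=01:00:00               # walltime limit (hh:mm:ss)
#SBATCH --output=slurm-%j.out         # output written to slurm-<jobid>.out

# Load any software modules your analysis needs (module name is a placeholder)
module load python/3.9

# Run the analysis
python3 my_analysis.py
```

Saving this as my_request.slurm and running "sbatch my_request.slurm" returns a job ID that you can then use with "squeue" or "job-history" to track the job.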
Additional SLURM Resources and Examples
Link | Description |
---|---|
Official SchedMD User Documentation | Official user documentation from SchedMD. Includes detailed information on SLURM directives and commands. |
PBS ⇔ SLURM Rosetta Stone | Table for converting some common PBS job directives to SLURM syntax. |
HPC Quick Start | Our HPC Quick Start guide. If you have never submitted a batch job before, this is a great place to start. |
Job Examples | Basic SLURM example scripts. Includes PBS scripts for comparison. |
Even More Job Examples! | Growing repository of example SLURM submission scripts. |
Intro to HPC | A recorded video presentation of our Intro to HPC workshop. Keep your eyes peeled for periodic announcements in the HPC listserv on upcoming live sessions! |
Node Summary
Before submitting a Slurm script, you must know (or at least have a general idea of) the resources needed for your job. This will tell you which type of node to request, how much memory, and other useful information that can be provided to the system via your batch script. A detailed list of Slurm batch flags is included below.
General Overview
Node Type | Description |
---|---|
Standard CPU Node | This is the general purpose node, which can (and should) be used by the majority of jobs. |
High Memory CPU Node | Similar to the standard nodes, but with significantly more RAM. There are only a few of them, and they should only be requested if you have tested your job on a standard node and found that its memory usage is too high. Both standard and high memory nodes share the same file system, so there is no advantage in terms of long-term storage, only active RAM usage. |
GPU Node | Similar to the standard node, but with one or more GPUs available, depending on which cluster is in use. |
Hardware Limitations by Node Type and Cluster
Please consult the following table when crafting Slurm submission scripts. Requesting more resources than are available on a given cluster and node type may lead to errors or delays.
Cluster | Node Type | N Nodes | N CPU/Node | RAM/CPU | CPU RAM/Node | N GPU/Node | RAM/GPU | GPU RAM/Node | Total N GPUs |
---|---|---|---|---|---|---|---|---|---|
Puma | Standard | 236 | 94 | 5 gb | 470 gb | - | - | - | - |
Puma | High Mem | 3 standard, 2 buy-in | 94 | 32 gb | 3008 gb | - | - | - | - |
Puma | GPU | 8 standard, 7 buy-in | 94 | 5 gb | 470 gb | 4 | 32 gb | 128 gb | 32 standard, 28 buy-in |
Ocelote | Standard | 400 | 28 | 6 gb | 168 gb | - | - | - | - |
Ocelote | High Mem | 1 | 48 | 41 gb | 1968 gb | - | - | - | - |
Ocelote | GPU | 46 | 28 | 8 gb | 224 gb | 1 | 16 gb | 16 gb | 46 |
El Gato | Standard | 130 | 16 | 4 gb | 62 gb | - | - | - | - |
See the Node Types/Example Resource Requests section below for example Slurm requests.
Other Job Limits
In addition to fitting your jobs within the constraints of our hardware, there are other limitations imposed by the scheduler to maintain fair use.
- Time Limit Per Job: A single job cannot run for more than 10 days (240 hours). Requesting more time than this will leave your job stuck in the queue indefinitely with the reason code "QOSMaxWallDurationPerJobLimit" (see below for more reason codes, and the example time request after this list).
- CPU Hours Per Group: The number of CPU hours used per job is subtracted from the PI's allocation. More info here.
- Active jobs, CPUs, GPUs, and Memory: To see the limits and usage of these items, log onto the cluster you wish to know more about, and type "job-limits <group>"
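As a sketch of how the time limit appears in a batch script, the walltime for a job is set with the --time directive; the value below is the 10-day maximum and is only an example:

```bash
#SBATCH --time=240:00:00   # 10 days (the per-job maximum); asking for more
                           # leaves the job queued with reason
                           # QOSMaxWallDurationPerJobLimit
```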
Basic Considerations when making a Slurm Request
When requesting resources with Slurm, there are a few parameters to set and considerations to keep in mind. Below we include only the information necessary to start constructing a Slurm request. To view additional options, please see our Slurm Reference page.
Job Partitions
When submitting a job, a partition must be specified. This tells the scheduler from which allocation to subtract hours for this job.
Partition | SLURM | Details |
---|---|---|
standard | #SBATCH --account=<PI GROUP> #SBATCH --partition=standard | Consumes your group's standard allocation. These jobs cannot be interrupted. |
windfall | #SBATCH --partition=windfall | Does not consume your group's standard allocation. Jobs may be interrupted and restarted by higher-priority jobs. The --account flag needs to be omitted or an error will occur. |
high_priority | #SBATCH --account=<PI GROUP> #SBATCH --partition=high_priority #SBATCH --qos=user_qos_<PI GROUP> | Available for groups who have purchased compute resources. |
qualified | #SBATCH --account=<PI GROUP> #SBATCH --partition=standard #SBATCH --qos=qual_qos_<PI GROUP> | Available for groups that have submitted a special project request. |
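For example, the directives for a job charged to a group's standard allocation would look like the following sketch, where <PI GROUP> is a placeholder for your group name. For windfall, the --account line is omitted and the partition is set to windfall instead.

```bash
# Standard partition: charged to the group's allocation, jobs are not preempted
#SBATCH --account=<PI GROUP>
#SBATCH --partition=standard
```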
Total Memory vs CPU Count
Interested in learning more about how memory and CPU count are related? Check out our YouTube video!
Job Memory and CPU Count are Correlated
The memory your job is allocated is dependent on the number of CPUs you request.
For example, on Puma standard nodes, you get 5G for each CPU you request. This means a standard job using 4 CPUs gets 5G/CPU × 4 CPUs = 20G of total memory. Each node type has its own memory ratio, equal to its total memory divided by its total number of CPUs. A reference for all the node types, their memory ratios, and how to request each can be found in the Node Types/Example Resource Requests section below.
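As a sketch, the 4-CPU example above corresponds to a request like this on a Puma standard node; the 20G total comes from the 5G/CPU ratio:

```bash
#SBATCH --nodes=1
#SBATCH --ntasks=4          # 4 CPUs
#SBATCH --mem-per-cpu=5gb   # Puma standard ratio: 4 CPUs x 5G = 20G total memory
```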
What Happens if My Memory and CPU Requests Don't Match?
Our systems are configured to try to help when your memory request does not match your CPU count.
For example, if you request 1 CPU and 470G of memory on Puma, the system will automatically scale up your CPU count to 94 to ensure you get your full memory requirement. This does not go the other way: if you request less memory than your CPU count would provide, no adjustments are made. If you omit the --mem flag entirely, the system will use the memory ratio of the standard nodes on that cluster.
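For instance, the mismatched request described above might be written as the sketch below; the scheduler would raise the CPU count to 94 so the full 470G can be provided:

```bash
#SBATCH --ntasks=1     # only 1 CPU requested...
#SBATCH --mem=470gb    # ...but this is a full Puma node's memory, so the
                       # CPU count is automatically scaled up to 94
```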
Possible Problems You Might Encounter
- Be careful when using a --mem-per-cpu value higher than a standard node's ratio: you may inadvertently wind up in the queue for a high memory node. On Puma there are only three of these machines available for standard jobs, and only one on Ocelote, so wait times are frequently longer than those for standard nodes. If you notice your job is in the queue much longer than you would expect, check it using "job-history" to ensure the memory ratio looks correct.
- Stick to using --ntasks=N and --cpus-per-task=M to request N × M CPUs, as in the sketch after this list. Using the flag -c N to request CPUs has been found to cause problems with memory requests and may inadvertently limit you to ~4MB of total memory.
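A minimal sketch of the recommended pattern, here requesting 2 tasks with 4 CPUs each (8 CPUs total); the numbers are placeholders:

```bash
#SBATCH --ntasks=2          # N tasks
#SBATCH --cpus-per-task=4   # M CPUs per task, giving N x M = 8 CPUs total;
                            # prefer this over -c, which can break memory requests
```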
Node Types/Example Resource Requests
Standard Nodes
Cluster | Max CPUs | Mem/CPU | Max Mem per Node | Sample Request Statement |
---|---|---|---|---|
ElGato | 16 | 4gb | 62gb | #SBATCH --nodes=1 #SBATCH --ntasks=16 #SBATCH --mem-per-cpu=4gb |
Ocelote | 28 | 6gb | 168gb | #SBATCH --nodes=1 #SBATCH --ntasks=28 #SBATCH --mem-per-cpu=6gb |
Puma | 94 | 5gb | 470gb | #SBATCH --nodes=1 #SBATCH --ntasks=94 #SBATCH --mem-per-cpu=5gb |
GPU Nodes
During the quarterly maintenance cycle on April 27, 2022, the ElGato K20s and Ocelote K80s were removed because they are no longer supported by Nvidia.
GPU jobs are requested using the generic resource (--gres) SLURM directive. In general, the directive to request N GPUs has the form --gres=gpu:N.
Cluster | Max CPUs | Mem/CPU | Max Mem | Sample Request Statement |
---|---|---|---|---|
Ocelote | 28 | 8gb | 224gb | #SBATCH --nodes=1 #SBATCH --ntasks=28 #SBATCH --mem-per-cpu=8gb #SBATCH --gres=gpu:1 |
Puma1 | 94 | 5gb | 470gb | #SBATCH --nodes=1 #SBATCH --ntasks=94 #SBATCH --mem-per-cpu=5gb #SBATCH --gres=gpu:1 |
1 Up to four GPUs may be requested on Puma on a single GPU node with --gres=gpu:N, where N is 1, 2, 3, or 4.
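Putting the directives together, a single-GPU job on Puma might be requested with a header like the sketch below; the account, CPU count, and walltime are placeholders rather than required values:

```bash
#!/bin/bash
#SBATCH --job-name=gpu_job
#SBATCH --account=<PI GROUP>    # placeholder group name
#SBATCH --partition=standard
#SBATCH --nodes=1
#SBATCH --ntasks=16
#SBATCH --mem-per-cpu=5gb
#SBATCH --gres=gpu:1            # one GPU; up to --gres=gpu:4 on Puma
#SBATCH --time=10:00:00

# Quick check that the GPU is visible (assumes nvidia-smi is available on the node)
nvidia-smi
```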
High Memory Nodes
When requesting a high memory node, include both the memory/CPU and constraint directives, as shown in the sample statements below.
Cluster | Max CPUs | Mem/CPU | Max Mem | Sample Request Statement |
---|---|---|---|---|
Ocelote | 48 | 41gb | 2015gb | #SBATCH --nodes=1 #SBATCH --ntasks=48 #SBATCH --mem-per-cpu=41gb #SBATCH --constraint=hi_mem |
Puma | 94 | 32gb | 3000gb | #SBATCH --nodes=1 #SBATCH --ntasks=94 #SBATCH --mem-per-cpu=32gb #SBATCH --constraint=hi_mem |