
Nextflow Documentation: https://www.nextflow.io/docs/latest/



Overview


Nextflow is a data-centric workflow management tool which facilitates complex and reproducible scientific computational workloads.

A workflow with the following characteristics might be successfully realized using Nextflow on UArizona HPC:

  • It contains significant complexity in the form of different applications, data formats, repetition, conditional branching, and/or dependencies between all of these
  • The maximum scaling you project will require is low
  • Fast workflow turnaround is not a priority for you
  • You already have a workflow implemented in Nextflow




Pros and Cons

Strengths of Nextflow:
  • Simple installation and basic execution
  • No persistent database needed
  • Powerful ability to express complex tasks and relationships between tasks
Disadvantages of Nextflow:
  • Easy to unintentionally create a configuration which degrades scheduler performance for all HPC users
  • Weak integration with SLURM that is unable to adapt to SLURM scheduler policies; this can lead to long queue waiting times for tasks and excessive turnaround for workflows as a whole
  • Poor ability to manage limitations of HPC resources when running workflows which will need to run tasks over several days or longer
  • Nextflow makes a lot of duplicate and intermediate file copies as it operates; processing large amounts of data can easily exhaust storage quotas
  • Nextflow uses parallel file systems to synchronize tasks, which is a frequent source of undesired behavior





Using Nextflow With SLURM

Hello World

Start with the Nextflow documentation Quick Start.

The system Java installation provided by default on Puma is sufficient to run Nextflow.

This documentation only discusses specifics of running Nextflow on Puma; for guidance on expressing your workflows with Nextflow scripts, see the official Nextflow documentation.

For the most basic proof of concept demonstration, start an interactive session, load the Nextflow module, and run nextflow run hello:

Code Block
languagebash
themeMidnight
[netid@cpu11 ~]$ module load nextflow
[netid@cpu11 ~]$ nextflow run hello
N E X T F L O W  ~  version 22.10.7
Pulling nextflow-io/hello ...
 downloaded from https://github.com/nextflow-io/hello.git
Launching `https://github.com/nextflow-io/hello` [insane_lorenz] DSL2 - revision: 1d71f857bb [master]
executor >  local (4)
[50/b2f3c8] process > sayHello (4) [100%] 4 of 4 ✔
Ciao world!

Bonjour world!

Hello world!

Hola world!


[netid@cpu11 ~]$ 

Batch Submission

Executor: local

The easiest method is placing the nextflow ... command into a batch script and submitting it to SLURM with sbatch. The manager process will run on the allocated compute node, and all tasks are configured to use the local executor; it is even possible to use srun in your processes to run tasks which include MPI applications.

The major benefit of this method, besides simplicity, is that only the initial submission waits in a SLURM queue; it is a good pattern for a workflow which includes a very large number of small tasks. Do not combine this method with the Nextflow 'slurm' executor, because the job running the Nextflow manager is likely to end before the requested task has finished waiting in a queue.
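A minimal batch script for this pattern might look like the following sketch. The workflow script name (main.nf), the account name, and the resource requests are placeholders; substitute values appropriate for your group and workflow.

Code Block
languagebash
#!/bin/bash
#SBATCH --job-name=nextflow-local
#SBATCH --account=YOUR_GROUP
#SBATCH --partition=standard
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=24
#SBATCH --mem-per-cpu=5gb
#SBATCH --time=48:00:00

# Load the Nextflow module on the allocated compute node
module load nextflow

# The manager and all of its tasks (default 'local' executor)
# run inside this single allocation
nextflow run main.nf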

A drawback to running the Nextflow workflow process directly inside a SLURM job allocation is that the workflow cannot run longer than the maximum wall time available to a single job in the SLURM QOS being used (240:00:00). This can be partially mitigated by using multiple SLURM submissions in series and passing the -resume flag to Nextflow, but only progress for completely finished tasks will be preserved from one submission to the next.
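As a sketch of that mitigation: keep the work/ directory and the .nextflow cache in place between submissions, and add -resume to the second and later runs so tasks that already completed are not re-run. The script name main.nf is again a placeholder.

Code Block
languagebash
# Second and subsequent submissions of the same batch script:
module load nextflow
nextflow run main.nf -resume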


Executor: slurm

The file nextflow.config is used to instruct the manager process to submit its tasks to SLURM instead of running them on the local host. Place the following file in your Nextflow working directory:

Code Block
params {
    config_profile_description = "UArizona_HPC_PUMA"
}

singularity {
    enabled = true
    autoMounts = true
}

process {
    executor = "slurm"
    cpus = '24'
    queue = 'standard'
    time = '48h'
    clusterOptions = '--account=bjoyce3 --job-name=nf-core --mem-per-cpu=5gb --cpus-per-task=24'
}

Left to the default configuration, a running Nextflow workflow manager process can generate a disruptive number of communication requests to SLURM; most of these configuration file entries reduce the frequency of those requests.

Inside the individual process definitions in your scripts, you will likely want to override the clusterOptions variable to request resources appropriate for that task. This can be done by adding a directive in the pattern of clusterOptions '--partition=standard --time=05:30:00' to the top of your task process blocks, as in the sketch below.
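A per-task override might look like the following sketch; the process name, command, and resource values are hypothetical and should be replaced with whatever is appropriate for that task (YOUR_GROUP stands in for your own account name):

Code Block
// Hypothetical task that overrides the workflow-wide SLURM options
process sortInput {
    // Task-specific resource request passed straight to sbatch
    clusterOptions '--account=YOUR_GROUP --partition=standard --time=05:30:00 --ntasks=1 --cpus-per-task=4 --mem-per-cpu=5gb'

    input:
    path unsorted

    output:
    path 'sorted.txt'

    script:
    """
    sort ${unsorted} > sorted.txt
    """
}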

Note that we exclude memory as an option from the process scope. If memory is included (e.g., memory = '96 GB'), this will create a SLURM script that is improperly formatted and will result in a job restricted to a very small amount of memory (on the order of MB). Removing the Nextflow memory option and instead specifying --mem-per-cpu in clusterOptions fixes this issue.
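In other words, a sketch of the safe pattern for the process scope looks like this (account name as in the configuration above):

Code Block
process {
    executor = "slurm"
    queue = 'standard'
    // Do NOT set: memory = '96 GB'  -- on Puma this yields a malformed SLURM script
    // Request memory through SLURM options instead:
    clusterOptions = '--account=bjoyce3 --mem-per-cpu=5gb --cpus-per-task=24'
}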