Accessing Software


New GPUs on Ocelote!

We have recently added 22 new P100 GPUs to Ocelote. If you need multiple GPUs on a node and Puma queue times are too long, you can now request two GPUs per node on Ocelote using --gres=gpu:2.
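As a rough sketch, the --gres=gpu:2 option might appear in a Slurm batch script as follows. The job name and time limit are placeholders, site-specific options such as --account and --partition are omitted because they are not given here, and count_visible_gpus is a hypothetical helper that only illustrates checking what the job received.

```shell
#!/bin/bash
# Placeholder job name and time limit; adjust for your own work.
#SBATCH --job-name=two-gpu-test
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --time=00:10:00
# Request two GPUs on a single node.
#SBATCH --gres=gpu:2

# Hypothetical helper: count the entries in a comma-separated device list.
count_visible_gpus() {
    if [ -z "$1" ]; then
        echo 0
    else
        printf '%s\n' "$1" | awk -F, '{ print NF }'
    fi
}

# Slurm normally sets CUDA_VISIBLE_DEVICES for GPU allocations;
# outside a GPU job the variable is usually empty.
echo "GPUs visible to this job: $(count_visible_gpus "${CUDA_VISIBLE_DEVICES:-}")"
```

Outside of a job the #SBATCH lines are plain comments, so the script can also be run directly to check the helper.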

Overview

The policies regarding the installation of software are on this page. In general, scientific software is installed on request, subject to the caveats noted in that section. More than 100 software applications are installed as modules, so check there before submitting an installation request. Alternatively, you are always welcome to install software in your own file space.

Puma, Ocelote, and ElGato are built with CentOS 7 along with the system libraries, compilers and utilities that are needed for HPC operations. 

Researchers can often install the software they need themselves using package managers such as conda.

Available Software

Software packages are available as modules and are accessible from the compute nodes of any of our three clusters. They are not available on the login nodes, so you will need to be in an interactive session to access them. The list below shows the installed software modules. It is updated periodically, so it may be less current than the output of the module avail command (more information under Module Commands).


| Software | Description |
|---|---|
| Abaqus | Finite element analysis |
| Abyss | Parallel assembler for short read sequence data |
| Admixture | A software tool for maximum likelihood estimation of individual ancestries from multilocus SNP genotype datasets. |
| AmgX | AmgX provides a simple path to accelerated core solver technology on NVIDIA GPUs. AmgX provides up to 10x acceleration to the computationally intense linear solver portion of simulations, and is especially well suited for implicit unstructured methods. |
| Amira | Powerful, multifaceted 2D–5D software for visualization, processing and analysis of microscopy imaging serving Life and Biomedical Sciences. |
| Anaconda | Platform for data science and machine learning |
| Anchorwave | (Anchored Wavefront Alignment) identifies collinear regions via conserved anchors |
| Ansys | Licensed: General purpose finite element modeling package |
| Ant | Java build tool |
| ANTLR | ANTLR (ANother Tool for Language Recognition) is a powerful parser generator for reading, processing, executing, or translating structured text or binary files. It's widely used to build languages, tools, and frameworks. From a grammar, ANTLR generates a parser that can build and walk parse trees. |
| AOCL | AMD Optimized CPU Libraries. Includes BLIS, libFLAME, FFTW, LibM, ScaLAPACK |
| Argtable | An ANSI C library for parsing GNU style command line options |
| Aria2 | A lightweight multi-protocol & multi-source command-line download utility |
| Atlas | Automatically Tuned Linear Algebra Software (ATLAS). C and Fortran77 interfaces to a portably efficient BLAS implementation, as well as a few routines from LAPACK |
| Augustus | Gene prediction; required for Maker |
| Autoconf | GNU tool for producing configure scripts for building, installing and packaging software on computer systems where a Bourne shell is available |
| Autotools | GNU suite of tools to make source code packages portable to many Unix-like systems |
| AWS CLI | CLI tool for interfacing with Amazon Web Services (AWS) |
| Bamtools | C++ API and toolkit for analyzing and managing BAM files |
| Bbmap | Short read aligner for DNA and RNA-seq data |
| Bcftools | A set of utilities that manipulate variant calls in the Variant Call Format (VCF) and its binary counterpart BCF |
| Beagle-lib | Phylogenetics; works with Beast |
| Beast | Bayesian analysis of molecular sequences |
| Bedops | An open-source command-line toolkit that performs highly efficient and scalable Boolean and other set operations, statistical calculations, archiving, conversion and other management of genomic data |
| Bedtools | Utilities for comparing, summarizing and intersecting genomic features in the UCSC Genome Browser BED format |
| Biocontainers | Biocontainers is a registry of Biology tools that can be pulled from a Docker container into a Singularity container. |
| Bismark | A program to map bisulfite treated sequencing reads to a genome of interest and perform methylation calls in a single step |
| Bison | General purpose parser generator that converts a grammar description for an LALR(1) context-free grammar into a C program to parse that grammar. |
| Blas | Basic Linear Algebra Subprograms |
| Blast | Search tool that finds regions of local similarity between nucleotide or protein sequences |
| Blat | Alignment tool like BLAST, but structured differently |
| Blender | 3D visualization software |
| Boost | Peer reviewed C++ source libraries |
| Bowtie2 | Ultrafast, memory-efficient tool for aligning sequencing reads to long reference sequences. |
| BWA | Fast light-weight tool that aligns relatively short sequences to a sequence database |
| Caffe | Deep learning framework made with expression, speed, and modularity |
| Canu | Fork of the Celera Assembler designed for high-noise single-molecule sequencing |
| CASTEP | CASTEP is a full-featured materials modelling code based on a first-principles quantum mechanical description of electrons and nuclei. It uses the robust methods of a plane-wave basis set and pseudopotentials. |
| CCP | Cisco Configuration Professional is a GUI based device management tool for Cisco access routers. |
| Centrifuge | Classifier for metagenomic sequences. |
| CDO | Climate Data Operator is a collection of command-line Operators to manipulate and analyze Climate and NWP model Data |
| CellRanger | A set of analysis pipelines that process Chromium single-cell data to align reads, generate feature-barcode matrices, perform clustering. |
| Cern-Root | A modular scientific software framework |
| Cfitsio | A library of subroutines for data files. Available on each compute node, not as a module. |
| Chimera | An extensible molecular modeling system |
| Chapel | Programming language designed for productive parallel computing at scale |
| Clustal Omega | Clustal Omega is a multiple sequence alignment program that uses seeded guide trees and HMM profile-profile techniques to generate alignments between three or more sequences |
| Cluster | The open source clustering software implements the most commonly used clustering methods for gene expression data analysis |
| ClusterShell | ClusterShell is an event-driven open source Python library, designed to run local or distant commands in parallel on server farms or on large Linux clusters |
| Cmake | Tool to control the compilation process. The "cmake" that comes with the operating system is an old version 2 release. In most cases you should "module load cmake", which will get you a much newer release of version 3. |
| Comsol | Modeling and simulating physics-based problems (licensed) |
| Contig-extender | Developed to extend contigs, complementing de novo assembly. |
| Contrib | Adds user supported software to your module path. On Ocelote this is /unsupported |
| Coot | Macromolecular model building |
| CP2K | Open Source Molecular Dynamics |
| Cplex | IBM optimization models, combining leading solver engines with a tightly integrated IDE and modeling language |
| Crest | An IO-based scheduler for semiempirical quantum mechanical calculations at the GFNn-xTB level. |
| Cuda | Parallel computing platform and API model for NVIDIA GPUs |
| Cufflinks | Assembles transcripts, estimates their abundances, tests for differential expression and regulation in RNA-Seq samples |
| Curl | Computer software project providing a library and command-line tool for transferring data using various protocols |
| dealii | deal.II is a C++ program library targeted at the computational solution of partial differential equations using adaptive finite elements |
| Diamond | Alignment tool for aligning short DNA sequencing reads to a protein reference database |
| Difx | Software correlator used to process the simulated files |
| DMTCP | A tool to transparently checkpoint the state of multiple simultaneous applications, including multi-threaded and distributed applications. |
| Eagle | Estimates haplotype phase either within a genotyped cohort or using a phased reference panel |
| EggNOG-mapper | A hierarchical, functionally and phylogenetically annotated orthology resource based on 5090 organisms and 2502 viruses. |
| Eigen | High-level C++ library of template headers |
| Energyplus | A whole building energy simulation program used by engineers, architects, and researchers |
| Exonerate | Generic tool for sequence comparison |
| Fastme | Algorithms to infer phylogenies |
| FastQC | Quality control tool for high throughput sequence data |
| Fasttree | For large alignments |
| FastX | Command line tools for Short-Reads FASTA/FASTQ files preprocessing |
| FFmpeg | A cross-platform solution to record, convert and stream audio and video |
| FFTW | Fast Fourier transforms. Ocelote has multiple versions |
| Fiji | Fiji is an image processing package, a "batteries-included" distribution of ImageJ2, bundling a lot of plugins which facilitate scientific image analysis. |
| FreeBayes | Bayesian genetic variant detector designed to find small polymorphisms |
| Freec | Quantifying transcripts |
| Freesurfer | Analysis of neuroimaging data |
| Gamess | A program for ab initio quantum chemistry |
| GATK | Identifying SNPs and indels in germline DNA from Broad Institute |
| Gaussian | Electronic structure program. Licensed for general use |
| GDAL | A computer software library for reading and writing raster and vector geospatial data formats |
| GenomeTools | The versatile open source genome analysis software |
| GEOS | Geometry Engine Open Source |
| Git | Version control system (VCS) for tracking changes in computer files and coordinating work on those files among multiple people. |
| GLPK | The GLPK (GNU Linear Programming Kit) package is intended for solving large-scale linear programming (LP) and mixed integer programming (MIP) |
| Gnuplot | Generates two- and three-dimensional plots of functions |
| Go | Programming language from Google |
| Gotcha | A library that provides function wrapping, interposing a wrapper function between a function and its callsites. |
| Grace | Grace is a Motif application for two-dimensional data visualization. Grace can transform the data using free equations, FFT, cross- and auto-correlation |
| Gromacs | Molecular dynamics software primarily designed for biomolecular systems |
| GSL | The GNU Scientific Library (GSL) is a numerical library for C and C++ programmers |
| Gurobi | Mathematical problem solver for prescriptive analytics |
| HDF5 | Data model, library, and file format for storing and managing data (built with GCC and Intel) |
| HISAT2 | A fast and sensitive alignment program for mapping next-generation sequencing reads. Replaces TopHat |
| HMMER | Biosequence analysis using profile hidden Markov models |
| Hotpants | High Order Transform of Psf ANd Template Subtraction code (hotpants) |
| hpctoolkit, hpctraceviewer, hpcviewer | Tools for measurement and analysis of program performance |
| htslib | Unified C library for accessing common file formats. Also part of samtools |
| hwloc | Gathers information about parallel computing platforms so as to exploit them efficiently |
| HYPRE | Library of linear solvers featuring parallel multigrid |
| IDL | Interactive Data Language, a programming language used for data analysis, particularly in astronomy. Special case: restricted to licensed users |
| iGraph | Routines for simple graphs and network analysis |
| Intel Compilers | Licensed compilers |
| Intel MPI | Intel MPI (integrated from 2019 on) |
| Intel Toolkit | Intel DAAL, GDB, IPP, MKL, TBB, Intel-Cluster (integrated from 2019 on) |
| IQ-TREE | Efficient phylogenetic software |
| iRods | Client; open source data management |
| Jags | Analysis of Bayesian hierarchical models using Markov Chain Monte Carlo simulation |
| Java | Programming language |
| Jellyfish | Fast, memory-efficient counting of k-mers in DNA. Used by Trinity |
| Julia | High-level, high-performance dynamic programming language for technical computing |
| Jupyter | Jupyter notebooks are available at the web service OnDemand |
| Kallisto | Quantifying abundances of transcripts from RNA-Seq data |
| LAMMPS | Classical molecular dynamics code |
| LAMMPS KOKKOS | Accelerator package for LAMMPS using data structures and macros provided by the Kokkos library |
| LAPACK | Numerical linear algebra |
| Libmesh | The libMesh library provides a framework for the numerical simulation of partial differential equations using arbitrary unstructured discretizations |
| Libpng | libpng is the official PNG reference library. It supports almost all PNG features, is extensible, and has been extensively tested |
| Libtool | Generic library building tool |
| LS-OPT (lsopt) | Design optimization |
| MAFFT | Multiple sequence alignment program |
| Maker | Portable and easily configurable genome annotation pipeline |
| Mathematica | Licensed: A single integrated, continually expanding system that covers the breadth and depth of technical computing |
| MATLAB | High-level language and interactive environment, performs computationally intensive tasks |
| Maven | A build automation tool used primarily for Java projects |
| MCL | A cluster algorithm for graphs |
| Meme | Suite of motif-based sequence analysis tools |
| MetaPhlAn | A computational tool for profiling the composition of microbial communities |
| Midnightcommander | A visual file manager |
| Migrate | Software that estimates population parameters, effective population sizes and migration rates of n populations, using genetic data |
| Moose | An open-source parallel finite element framework |
| Mothur | Software for microbial biology |
| Mrbayes | Provides Bayesian estimation of phylogeny |
| MPICH/2 | Freely available, portable implementation of MPI. Renamed to MPICH |
| MVAPICH | Library exploiting novel features and mechanisms of high-performance networking technologies |
| Mummer | Rapid whole genome alignment |
| NAMD | Molecular dynamics. Cuda version is "namd-cuda" |
| NCBI-vdb | A collection of tools and libraries for using data in the INSDC Sequence Read Archives. |
| NCDU | NCurses Disk usage analyzer |
| NCL (ncl-ncarg) | NCAR Command Language |
| NCO | Toolkit to manipulate and analyze data stored in netCDF format |
| NetCDF | Software libraries and self-describing, machine-independent data formats supporting the creation, access, and sharing of array-oriented scientific data |
| Netlogo | A programmable modeling environment for simulating natural and social phenomena |
| NGS-SDK | A new, domain-specific API for accessing reads, alignments and pileups produced from Next Generation Sequencing. Works with HISAT2 and SRAToolkit |
| NUFEB | A massively parallel simulator for individual-based modelling of microbial communities |
| OHPC | OpenHPC: Provides a variety of common, pre-built ingredients required to deploy and manage an HPC Linux cluster |
| OligoArrayAux | A subset of the UNAFold package for use with OligoArray |
| Openblas | An optimized BLAS library |
| OpenFOAM | Computational Fluid Dynamics software |
| OpenMPI | High performance message passing library |
| OrthoFinder | Accurate inference of orthogroups, orthologues, gene trees and rooted species |
| Ovito | A visualization and analysis software for output data generated in molecular dynamics, atomistic Monte-Carlo and other particle-based simulations. |
| Pandaseq | A program to align Illumina reads |
| Papi | Performance application programming interface |
| Parallel | GNU Parallel is a shell tool for executing jobs in parallel |
| Paraview | ParaView is an open-source, multi-platform data analysis and visualization application. |
| Parflow | A parallel integrated hydrology model |
| ParMETIS | An MPI-based parallel library. Integrated in PETSc |
| Pasta | Practical Alignment using Sate and TrAnsitivity. This is installed to python/2. Run "module load python/2", then "pip list" will display available modules |
| PCRE | A set of functions that implement regular expression pattern matching using the same syntax and semantics as Perl 5. |
| Peridigm | Computational peridynamics code from Sandia NL |
| Perl | Programming language |
| PETSc | A suite of data structures and routines developed by Argonne National Laboratory for the scalable (parallel) solution of scientific applications modeled by partial differential equations. The default version is built with GCC and real arithmetic. There are other modules built with Intel compilers and complex arithmetic support. |
| PGI | PGI Compilers and Tools |
| phdf5 | A file format, library, and utility programs for efficiently managing large and complex datasets stored in files. |
| phenix | A comprehensive software package for macromolecular structure determination using crystallographic (X-ray, neutron and electron) and electron cryo-microscopy |
| Photoscan-pro | Performs photogrammetric processing of digital images and generates 3D spatial data |
| Picard | Command line tools for manipulating high-throughput sequencing data |
| pkg-config | A computer program that defines and supports a unified interface for querying installed libraries for the purpose of compiling software that depends on them |
| Plasma | Parallel Linear Algebra Software for Multicore Architectures |
| Plink | Whole genome association analysis toolset |
| pmix | A means of exchanging wireup information needed for interprocess communication |
| pnetcdf | A high-performance parallel I/O library for accessing Unidata's NetCDF files in classic formats, specifically the formats of CDF-1, 2, and 5. |
| Proj | Cartographic projections and coordinate transformations library |
| Python | Object-oriented programming language. We encourage the use of virtualenv to build your own environment. |
| Qctool | Software to filter out samples or variants |
| Qiime | Quantitative Insights Into Microbial Ecology. Qiime2 is a package in Python3 |
| Quantum-espresso | Materials modeling |
| R | Language and environment for statistical computing and graphs |
| RStudio | RStudio is an IDE for R, available at OnDemand web services |
| RAxML | A program for sequential and parallel Maximum Likelihood-based inference of large phylogenetic trees |
| Relion | Program for Maximum A Posteriori refinement in cryo-electron microscopy |
| Remora | Resource Monitoring for Remote Applications |
| RepeatMasker | A program that screens DNA sequences for interspersed repeats and low complexity DNA sequences |
| Rings | "Rigorous Investigation of Networks Generated using Simulations" is a scientific code developed in Fortran90/MPI to analyze the results of molecular dynamics simulations |
| Rmblast | A RepeatMasker compatible version of the standard NCBI BLAST suite. The primary difference between this distribution and the NCBI distribution is the addition of a new program "rmblastn" for use with RepeatMasker and RepeatModeler |
| Root-Cern | A modular scientific software framework |
| Ruby | A dynamic, reflective, object-oriented, general-purpose programming language. |
| Sagemath | A free open-source mathematics software system |
| Salmon | A quasi-mapping bioinformatics tool |
| SAMRAI | Library to explore application, numerical, parallel computing, and software issues associated with structured adaptive mesh refinement (SAMR) |
| SAMTools | Utilities for manipulating alignments in SAM format |
| SAS | Software suite developed by SAS Institute for advanced analytics, multivariate analyses, business intelligence, data management, and predictive analytics. |
| SBT | Scala build tool |
| Scala | General-purpose programming language |
| ScaLAPACK | Scalable LAPACK |
| Schrodinger | Licensed: Molecular modeling and materials science. |
| Seqlogo | Package that takes the position weight matrix of a DNA sequence motif and plots the corresponding sequence logo. |
| Shapeit | Estimation of phasing for SNP sequencing data |
| Signalp | Package that predicts the presence and location of signal peptide cleavage sites in amino acid sequences from different organisms: Gram-positive prokaryotes, Gram-negative prokaryotes, and eukaryotes. |
| Silo | A mesh and field I/O library and scientific database |
| Singularity | Singularity containers let users run applications in a Linux environment of their choosing. See tutorial information |
| Slim | An evolutionary simulation software package used for research and teaching |
| SNAP | Used by Maker |
| SOAPdenovo2 | A novel short-read assembly method |
| SPAdes | St. Petersburg genome assembler, for both standard isolates and single-cell MDA bacteria assemblies |
| Sparsehash | An extremely memory-efficient hash_map implementation |
| Spark | From Apache. Open-source distributed general-purpose cluster computing framework |
| Spectra | C++ library for large scale eigenvalue problems |
| Speedseq | An open-source genome analysis platform for rapid genome analysis and interpretation |
| Spparks | Kinetic Monte Carlo simulator from Sandia |
| SRAtoolkit | Enables reading of sequencing files from the SRA database. From NCBI |
| Stacks | Software pipeline for building loci from short-read sequences |
| Star | RNA-seq Aligner |
| Starfusion | Uses the STAR aligner to identify candidate fusion transcripts |
| SuperLU | A general purpose C (also callable from Fortran) library for the direct solution of large, sparse, nonsymmetric systems of linear equations |
| Tmhmm | Package that predicts transmembrane helices in proteins. |
| TopHat | Fast splice junction mapper for RNA-Seq reads |
| TRF | Telomere Restriction Fragment (TRF) Analysis |
| Trilinos | Set of solvers from Sandia National Labs |
| Trimmomatic | A flexible trimmer for Illumina sequence data |
| Trinity | Package which enables the efficient and robust de novo reconstruction of transcriptomes from RNA-Seq data |
| Trinotate | Annotation suite designed for automatic functional annotation of transcriptomes, particularly de novo assembled transcriptomes |
| VASP | Atomic scale materials modelling, e.g. electronic structure calculations and quantum-mechanical molecular dynamics, from first principles. Special case: restricted to licensed users |
| VCFtools | Tool providing easily accessible methods for working with complex genetic variation data in the form of VCF files |
| Velvet | De novo genomic assembler specially designed for short read sequencing technologies |
| Visit | Interactive parallel visualization and graphical analysis tool for viewing scientific data |
| Vtune | Intel performance profiler (vtune_amplifier_xe) |
| WGS | Whole Genome Shotgun Assembler for the reconstruction of genomic DNA sequence from WGS sequencing data |
| Wham | Whole genome Alignment Metrics |
| Wien2K | Software for electronic structure calculations using DFT |
| WRF, WPS | Weather Research and Forecasting Model. Special case: available to Hydrology and Atmospheric Sciences |
| XDMF | eXtensible Data Model and Format |
| Xz | Free general-purpose data compression software with a high compression ratio. |
| zlib | A software library used for data compression. |

** installed on the operating system of each node.  "module load xx" is not necessary.


Module Commands

If multiple versions of software are available on the system, the newest is made the default. This means loading a module without specifying the version will select the most recent. We strongly recommend including version information in your module statements. This ensures that you maintain a consistent environment for your analyses in the event of a software upgrade.

To see, access, and get information on individual software packages available on the system, use the module commands detailed below. If you have trouble accessing the module command, see our FAQ section.

| Module Command | Description |
|---|---|
| $ module avail | Display all the software and versions installed on the system |
| $ module avail module_name | Display all installed versions of the software "module_name" |
| $ module list | Display the software you have loaded in your environment |
| $ module whatis module_name | Display some descriptive information about a specific module |
| $ module load module_name | Load a software module in your environment |
| $ module unload module_name | Unload a specific software package from your environment |
| $ module purge | Unload all the software modules from your environment |
| $ module help | Display a help menu for the module command |
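The recommendation to include version information can be sketched as a small wrapper for job scripts. This is only an illustration: has_version and load_pinned are hypothetical helper names, and gnu8/8.3.0 is used simply because it appears in the compiler list on this page.

```shell
# Succeed only if the module spec pins a version, i.e. looks like name/version.
has_version() {
    case "$1" in
        ?*/?*) return 0 ;;
        *)     return 1 ;;
    esac
}

# Load each module given, refusing any spec that has no version.
load_pinned() {
    for spec in "$@"; do
        if ! has_version "$spec"; then
            echo "refusing to load '$spec': add a version, e.g. name/version" >&2
            return 1
        fi
        module load "$spec" || return 1
    done
}

if command -v module >/dev/null 2>&1; then
    module purge              # start from a clean environment
    load_pinned gnu8/8.3.0    # pinned version survives later upgrades
    module list               # confirm what is loaded
else
    echo "module command not found; start an interactive session on a compute node first"
fi
```

Pinning versions this way means a later software upgrade changes the default module without silently changing the environment your analysis runs in.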

Compilers

Puma, Ocelote, and ElGato all run CentOS 7 and have the following compilers available:

| Compiler | Version | Module Command |
|---|---|---|
| Intel | 2020.1 | $ module load intel/2020.1 |
| Intel | 2020.4 | $ module load intel/2020.4 |
| gcc | 5.4.0 | $ module load gnu/5.4.0 |
| gcc | 7.3.0 | $ module load gnu7/7.3.0 |
| gcc | 8.3.0 | $ module load gnu8/8.3.0 (loaded by default) |
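A minimal sketch of using one of these compiler modules to build and run a small C program is shown below. The scratch directory, file names, and program text are made up for illustration, and the script skips the steps it cannot perform when the module command or gcc is missing.

```shell
# Compiler module taken from the list above; gnu8/8.3.0 is the default there.
COMPILER_MODULE="gnu8/8.3.0"

if command -v module >/dev/null 2>&1; then
    module load "$COMPILER_MODULE"
fi

# Write a tiny C program into a scratch directory (location is arbitrary).
workdir=$(mktemp -d)
cat > "$workdir/hello.c" <<'EOF'
#include <stdio.h>

int main(void) {
    printf("hello from a compute node\n");
    return 0;
}
EOF

# Compile and run only if a gcc is actually on the PATH.
if command -v gcc >/dev/null 2>&1; then
    gcc --version | head -n 1
    gcc -O2 -o "$workdir/hello" "$workdir/hello.c" && "$workdir/hello"
else
    echo "gcc not found; load a compiler module first"
fi
```

Printing the compiler version first makes it easy to confirm that the module you loaded, rather than the operating system compiler, is the one being used.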

Common Software Packages

VSCode Remote Connection

Using R Packages

Create your own custom libraries where you can install and manage packages locally.
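One way this might look in practice is sketched below, assuming R's standard R_LIBS_USER environment variable is used to point R at a personal library. The helper name setup_r_library, the directory, and the package in the commented example are all hypothetical.

```shell
# Create a personal R library directory and point R at it.
setup_r_library() {
    lib_dir=$1
    mkdir -p "$lib_dir" || return 1
    export R_LIBS_USER="$lib_dir"
}

# Only touch the home directory when R is actually available.
if command -v Rscript >/dev/null 2>&1; then
    setup_r_library "$HOME/R/library"   # hypothetical location
    Rscript -e 'print(.libPaths())'     # the personal library should be listed
    # Example install into that library (package name is illustrative):
    # Rscript -e 'install.packages("data.table", lib = Sys.getenv("R_LIBS_USER"), repos = "https://cloud.r-project.org")'
else
    echo "Rscript not found; load an R module first"
fi
```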

Using Python & Python Packages

Different versions of Python are available on HPC, both as modules and as system software on each compute node. Python 2 is available but is no longer supported by the Python Software Foundation, so we recommend that you use Python 3. To use Python 3, call the python3 and pip3 commands explicitly. Python 3 differs substantially from Python 2, so do not assume that code or packages written for Python 2 will work with Python 3.
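A minimal sketch of building an isolated Python 3 environment follows. It uses python3 -m venv, the standard-library counterpart of virtualenv, which is an assumption rather than the documented site procedure. The helper name make_venv and the path in the commented usage are hypothetical.

```shell
# Create a virtual environment and confirm that it has its own pip.
make_venv() {
    venv_dir=$1
    python3 -m venv "$venv_dir" || return 1
    "$venv_dir/bin/python" -m pip --version
}

# Typical use (hypothetical path):
#   make_venv "$HOME/venvs/myproject"
#   . "$HOME/venvs/myproject/bin/activate"
#   pip install numpy
#   deactivate
```

Packages installed while the environment is active go into the environment directory, so no root access is needed and the system Python is left untouched.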

Using Perl

One of the best things about Perl is the number of packages provided by the user community. Installing packages generally requires root access but that is not a viable solution in the HPC environment.

Using Matlab

This page provides information on using Matlab on HPC and diagnosing common problems.

Installing Software

While users cannot add or update system software or libraries using tools that require root privileges, such as yum, many software packages can be installed locally without needing to be a superuser. Linux packages frequently use the "configure, make, make install" method, and an example of how to do this is shown under Example Installation on this page.
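The "configure, make, make install" method can be sketched as below, assuming the package ships a standard configure script that accepts --prefix. The function name build_and_install and the paths in the commented usage are hypothetical.

```shell
# Build a source tree and install it under a prefix you own (no root needed).
build_and_install() {
    src_dir=$1
    prefix=$2
    (
        cd "$src_dir" || exit 1
        # --prefix redirects the install away from system paths such as /usr.
        ./configure --prefix="$prefix" || exit 1
        make || exit 1
        make install
    )
}

# Typical use (hypothetical paths):
#   tar xzf mypackage-1.0.tar.gz
#   build_and_install mypackage-1.0 "$HOME/software/mypackage"
#   export PATH="$HOME/software/mypackage/bin:$PATH"
```

Running the steps in a subshell keeps your current directory unchanged and makes the function return a failure status as soon as any step fails.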

Common Datasets

Installing additional software

To submit a request to have software installed on the UArizona HPC systems, use our HPC Software Install Request form. There is no fixed time frame for software installations because many variables affect how long they take. If you haven't heard back within a week, it is reasonable to follow up with a support ticket.

You may also install software packages into the space that is allocated to you with your HPC account.  However, you cannot install software that requires root permission or use a method like "yum install" that accesses system paths. For information on installing software locally, see our online guide for an example.