Software Resources

Overview

Puma, Ocelote, and ElGato are built on CentOS 7, which is the foundation for all compilers, libraries, and applications available on the systems. The same software applications are available on all three supercomputers because they are served from a unified filesystem.

Many popular software packages are installed and available to the user community as software modules. In short, loading a module configures your environment so that you do not need to know where the software is installed and can begin using it as soon as the module is loaded.

Frequently, different versions of the same software are available as different modules. If you load software without specifying a version, you get the default, which in most cases is the latest. For example, "module load julia" will provide Julia version 1.8.5 (at the time of writing), whereas "module load julia/1.6.1" will make an older version available.
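
The version selection described above can be sketched as a short shell session. This is an illustration only: it assumes the "module" command is defined in your shell (as on a compute node), the helper name load_julia_versions is hypothetical, and the Julia versions are the ones quoted in the text, which may have changed since.

```shell
# Sketch: load the default Julia module, then switch to a pinned version.
# load_julia_versions is a hypothetical helper name used for illustration.
load_julia_versions() {
    if ! command -v module >/dev/null 2>&1; then
        echo "module command not found; run this on a compute node"
        return 0
    fi
    module load julia          # no version given: loads the default (usually the latest)
    julia --version
    module unload julia        # remove the default before loading another version
    module load julia/1.6.1    # explicit name/version loads that exact release
    julia --version
}

load_julia_versions
```

Pinning the version in job scripts keeps results reproducible when the default module changes.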

Most software that needed to be compiled was built with GCC version 8.3, the latest version available when the software builds were started in 2020. Some other packages are built with the Intel compiler. Both compilers are available for compiling your own code. Note that GCC and OpenMPI 3 are loaded by default (as the modules gnu8 and openmpi3, respectively).
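
To confirm which compiler and MPI wrapper a session picks up by default, something like the following sketch may help. It assumes the default gnu8 and openmpi3 modules place gcc and mpicc on your PATH; show_default_toolchain is a hypothetical helper name, not a command provided by the system.

```shell
# Sketch: report the compiler and MPI wrapper found in the current environment.
# show_default_toolchain is a hypothetical helper name used for illustration.
show_default_toolchain() {
    # List loaded modules if the module command exists. Many installations
    # print this listing to stderr, so merge stderr into stdout.
    if command -v module >/dev/null 2>&1; then
        module list 2>&1
    fi
    for tool in gcc mpicc; do
        if command -v "$tool" >/dev/null 2>&1; then
            # Print only the first line of the version banner.
            echo "${tool}: $("$tool" --version 2>&1 | head -n 1)"
        else
            echo "${tool}: not found in PATH"
        fi
    done
}

show_default_toolchain
```

If both tools are reported, your own C code can be compiled with gcc, or with mpicc for MPI programs.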

You may request that additional applications be installed if they have general usefulness.



Installed Software

  • The list here is usually incomplete because of the frequent changes.
  • Some software, such as curl or libtool, is also part of the operating system, but those versions are too old for many uses.
  • The current list of modules is available on each compute node, but not on login nodes, with the command "module avail".
  • Module commands are available here.
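
As a rough sketch of the notes above, the following searches the module list for one package. It assumes it is run on a compute node where "module" is defined; find_module is a hypothetical helper name, and "blast" is only an example search term.

```shell
# Sketch: search the list of available modules for a package name.
# find_module is a hypothetical helper; pass any package name as its argument.
find_module() {
    if ! command -v module >/dev/null 2>&1; then
        echo "module command not found; module avail works on compute nodes, not login nodes"
        return 0
    fi
    # Many module implementations write listings to stderr, so merge it into
    # stdout before filtering. The search is case-insensitive.
    module avail 2>&1 | grep -i -- "$1" || echo "no module matching '$1'"
}

find_module blast
```

Running "module list" afterwards shows which modules are currently loaded in the session.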

Software | Description
Abaqus | Finite element analysis
Abyss | Parallel assembler for short read sequence data
Admixture | A software tool for maximum likelihood estimation of individual ancestries from multilocus SNP genotype datasets.
AmgX | AmgX provides a simple path to accelerated core solver technology on NVIDIA GPUs. AmgX provides up to 10x acceleration to the computationally intense linear solver portion of simulations, and is especially well suited for implicit unstructured methods.
Amira | Powerful, multifaceted 2D–5D software for visualization, processing and analysis of microscopy imaging serving Life and Biomedical Sciences.
Anaconda | Platform for data science and machine learning
Anchorwave | (Anchored Wavefront Alignment) identifies collinear regions via conserved anchors
Ansys | Licensed: General purpose finite element modeling package
Ant | Java build tool
ANTLR | ANTLR (ANother Tool for Language Recognition) is a powerful parser generator for reading, processing, executing, or translating structured text or binary files. It's widely used to build languages, tools, and frameworks. From a grammar, ANTLR generates a parser that can build and walk parse trees.
AOCL | AMD Optimized CPU Libraries. Includes BLIS, libFLAME, FFTW, LibM, ScaLAPACK
Argtable | An ANSI C library for parsing GNU style command line options
Aria2 | A lightweight multi-protocol & multi-source command-line download utility
Atlas | Automatically Tuned Linear Algebra Software (ATLAS). C and Fortran77 interfaces to a portably efficient BLAS implementation, as well as a few routines from LAPACK
Augustus | Gene prediction; required for Maker
Autoconf | GNU tool for producing configure scripts for building, installing and packaging software on computer systems where a Bourne shell is available
Autotools | GNU suite of tools to make source code packages portable to many Unix-like systems
AWS CLI | CLI tool for interfacing with Amazon Web Services (AWS)
Bamtools | C++ API and toolkit for analyzing and managing BAM files
Bbmap | Short read aligner for DNA and RNA-seq data
Bcftools | A set of utilities that manipulate variant calls in the Variant Call Format (VCF) and its binary counterpart BCF
Beagle-lib | Phylogenetics - works with Beast
Beast | Bayesian analysis of molecular sequences
Bedops | An open-source command-line toolkit that performs highly efficient and scalable Boolean and other set operations, statistical calculations, archiving, conversion and other management of genomic data
Bedtools | Utilities for comparing, summarizing and intersecting genomic features in the UCSC Genome Browser BED format
Biocontainers | Biocontainers is a registry of Biology tools that can be pulled from a Docker container into a Singularity container.
Bismark | A program to map bisulfite treated sequencing reads to a genome of interest and perform methylation calls in a single step
Bison | General purpose parser generator that converts a grammar description for an LALR(1) context-free grammar into a C program to parse that grammar.
Blas | Basic Linear Algebra Subprograms
Blast | Search tool that finds regions of local similarity between nucleotide or protein sequences
Blat | Alignment tool like BLAST, but structured differently
Blender | 3D Visualization software
Boost | Peer reviewed C++ source libraries
Bowtie2 | Ultrafast, memory-efficient tool for aligning sequencing reads to long reference sequences.
BWA | Fast light-weight tool that aligns relatively short sequences to a sequence database
Caffe | Deep learning framework made with expression, speed, and modularity
Canu | Fork of the Celera Assembler designed for high-noise single-molecule sequencing
CASTEP | CASTEP is a full-featured materials modelling code based on a first-principles quantum mechanical description of electrons and nuclei. It uses the robust methods of a plane-wave basis set and pseudopotentials.
CCP | Cisco Configuration Professional is a GUI based device management tool for Cisco access routers.
Centrifuge | Classifier for metagenomic sequences.
CDO | Climate Data Operator is a collection of command-line Operators to manipulate and analyze Climate and NWP model Data
CellRanger | A set of analysis pipelines that process Chromium single-cell data to align reads, generate feature-barcode matrices, and perform clustering.
Cern-Root | A modular scientific software framework
Cfitsio | A library of subroutines for data files. Available on each compute node, but not as a module.
Chimera | An extensible molecular modeling system
Chapel | Programming language designed for productive parallel computing at scale
Clustal Omega | Clustal Omega is a multiple sequence alignment program that uses seeded guide trees and HMM profile-profile techniques to generate alignments between three or more sequences
Cluster | The open source clustering software implements the most commonly used clustering methods for gene expression data analysis
ClusterShell | ClusterShell is an event-driven open source Python library, designed to run local or distant commands in parallel on server farms or on large Linux clusters
Cmake | Tool to control the compilation process. The operating system provides "cmake", but only an old version 2 release. You will likely want to "module load cmake", which provides a much newer version 3 release.
Comsol | Modeling and simulating physics-based problems (licensed)
Contig-extender | Developed to extend contigs, complementing de novo assembly.
Contrib | Adds user supported software to your module path. On Ocelote this is /unsupported
Coot | Macromolecular model building
CP2K | Open Source Molecular Dynamics
Cplex | IBM optimization models, combining leading solver engines with a tightly integrated IDE and modeling language
Crest | An IO-based scheduler for semiempirical quantum mechanical calculations at the GFNn-xTB level.
Cuda | Parallel computing platform and API model for NVIDIA GPUs
Cufflinks | Assembles transcripts, estimates their abundances, tests for differential expression and regulation in RNA-Seq samples
Curl | Computer software project providing a library and command-line tool for transferring data using various protocols
dealii | deal.II is a C++ program library targeted at the computational solution of partial differential equations using adaptive finite elements
Diamond | Alignment tool for aligning short DNA sequencing reads to a protein reference database
Difx | Software correlator used to process the simulated files
DMTCP | A tool to transparently checkpoint the state of multiple simultaneous applications, including multi-threaded and distributed applications.
Eagle | Estimates haplotype phase either within a genotyped cohort or using a phased reference panel
EggNOG-mapper | A hierarchical, functionally and phylogenetically annotated orthology resource based on 5090 organisms and 2502 viruses.
Eigen | High-level C++ library of template headers
Energyplus | A whole building energy simulation program for engineers, architects, and researchers
Exonerate | Generic tool for sequence comparison
Fastme | Algorithms to infer phylogenies
FastQC | Quality control tool for high throughput sequence data
Fasttree | For large alignments
FastX | Command line tools for Short-Reads FASTA/FASTQ files preprocessing
FFmpeg | A cross-platform solution to record, convert and stream audio and video
FFTW | Fast Fourier transforms. Ocelote has multiple versions
Fiji | Fiji is an image processing package, a “batteries-included” distribution of ImageJ2, bundling many plugins that facilitate scientific image analysis.
FreeBayes | Bayesian genetic variant detector designed to find small polymorphisms
Freec | Quantifying transcripts
Freesurfer | Analysis of neuroimaging data
Gamess | A program for ab initio quantum chemistry
GATK | Identifying SNPs and indels in germline DNA from Broad Institute
Gaussian | Electronic structure program. Licensed for general use
GDAL | A computer software library for reading and writing raster and vector geospatial data formats
GenomeTools | The versatile open source genome analysis software
GEOS | Geometry Engine Open Source
Git | Version control system (VCS) for tracking changes in computer files and coordinating work on those files among multiple people.
GLPK | The GLPK (GNU Linear Programming Kit) package is intended for solving large-scale linear programming (LP) and mixed integer programming (MIP)
Gnuplot | Generates two- and three-dimensional plots of functions
Go | Programming language from Google
Gotcha | A library that provides function wrapping, interposing a wrapper function between a function and its callsites.
Grace | Grace is a Motif application for two-dimensional data visualization. Grace can transform the data using free equations, FFT, cross- and auto-correlation
Gromacs | Molecular dynamics software primarily designed for biomolecular systems
GSL | The GNU Scientific Library (GSL) is a numerical library for C and C++ programmers
Gurobi | Mathematical problem solver for prescriptive analytics
HDF5 | Data model, library, and file format for storing and managing data (built with GCC and Intel)
HISAT2 | A fast and sensitive alignment program for mapping next-generation sequencing reads. Replaces TopHat
HMMER | Biosequence analysis using profile hidden Markov models
Hotpants | High Order Transform of Psf ANd Template Subtraction code (hotpants)
hpctoolkit, hpctraceviewer, hpcviewer | Tools for measurement and analysis of program performance
htslib | Unified C library for accessing common file formats. Also part of samtools
hwloc | Gathers information about parallel computing platforms so as to exploit them efficiently
HYPRE | Library of linear solvers featuring parallel multigrid
IDL | Interactive Data Language is a programming language used for data analysis, particularly in astronomy. Special case: restricted to licensed users
iGraph | Routines for simple graphs and network analysis
Intel Compilers | Licensed compilers
Intel MPI | Intel MPI (integrated from 2019 on)
Intel Toolkit | Intel DAAL, GDB, IPP, MKL, TBB, Intel-Cluster (integrated from 2019 on)
IQ-TREE | Efficient phylogenetic software
iRods | Client - open source data management
Jags | Analysis of Bayesian hierarchical models using Markov Chain Monte Carlo simulation
Java | Programming language
Jellyfish | Fast, memory-efficient counting of k-mers in DNA. Used by Trinity
Julia | High-level, high-performance dynamic programming language for technical computing
Jupyter | Jupyter notebooks are available at the web service OnDemand
Kallisto | Quantifying abundances of transcripts from RNA-Seq data
LAMMPS | Classical molecular dynamics code
LAMMPS KOKKOS | Accelerator package for LAMMPS using data structures and macros provided by the Kokkos library
LAPACK | Numerical linear algebra
Libmesh | The libMesh library provides a framework for the numerical simulation of partial differential equations using arbitrary unstructured discretizations
Libpng | libpng is the official PNG reference library. It supports almost all PNG features, is extensible, and has been extensively tested
Libtool | Generic library building tool
LS-OPT (lsopt) | Design optimization
MAFFT | Multiple sequence alignment program
Maker | Portable and easily configurable genome annotation pipeline
Mathematica | Licensed: A single integrated, continually expanding system that covers the breadth and depth of technical computing
MATLAB | High-level language and interactive environment, performs computationally intensive tasks
Maven | A build automation tool used primarily for Java projects
MCL | A cluster algorithm for graphs
Meme | Suite of motif-based sequence analysis tools
MetaPhlAn | A computational tool for profiling the composition of microbial communities
Midnightcommander | A visual file manager
Migrate | Software that estimates population parameters, effective population sizes and migration rates of n populations, using genetic data
Moose | An open-source parallel finite element framework
Mothur | Software for microbial biology
Mrbayes | Provides Bayesian estimation of phylogeny
MPICH/2 | Freely available, portable implementation of MPI. Renamed to MPICH
MVAPICH | Library exploiting novel features and mechanisms of high-performance networking technologies
Mummer | Rapid whole genome alignment
NAMD | Molecular dynamics. Cuda version is "namd-cuda"
NCBI-vdb | A collection of tools and libraries for using data in the INSDC Sequence Read Archives.
NCDU | NCurses Disk usage analyzer
NCL (ncl-ncarg) | NCAR Command Language
NCO | Toolkit to manipulate and analyze data stored in netCDF format
NetCDF | Software libraries and self-describing, machine-independent data formats supporting the creation, access, and sharing of array-oriented scientific data
Netlogo | A programmable modeling environment for simulating natural and social phenomena
NGS-SDK | A new, domain-specific API for accessing reads, alignments and pileups produced from Next Generation Sequencing. Works with HISAT2 and SRAToolkit
NUFEB | A massively parallel simulator for individual-based modelling of microbial communities
OHPC | OpenHPC: Provides a variety of common, pre-built ingredients required to deploy and manage an HPC Linux cluster
OligoArrayAux | A subset of the UNAFold package for use with OligoArray
Openblas | An optimized BLAS library
OpenFOAM | Computational Fluid Dynamics software
OpenMPI | High performance message passing library
OrthoFinder | Accurate inference of orthogroups, orthologues, gene trees and rooted species
Ovito | A visualization and analysis software for output data generated in molecular dynamics, atomistic Monte-Carlo and other particle-based simulations.
Pandaseq | A program to align Illumina reads
Papi | Performance application programming interface
Parallel | GNU Parallel is a shell tool for executing jobs in parallel
Paraview | ParaView is an open-source, multi-platform data analysis and visualization application.
Parflow | A parallel integrated hydrology model
ParMETIS | An MPI-based parallel library. Integrated in PETSc
Pasta | Practical Alignment using SATé and TrAnsitivity. This is installed to python/2: run "module load python/2", then "pip list" will display the available modules
PCRE | A set of functions that implement regular expression pattern matching using the same syntax and semantics as Perl 5.
Peridigm | Computational peridynamics code from Sandia NL
Perl | Programming language
PETSc | A suite of data structures and routines developed by Argonne National Laboratory for the scalable (parallel) solution of scientific applications modeled by partial differential equations. The default version is built with GCC and real arithmetic. There are other modules built with Intel compilers and complex arithmetic support.
PGI | PGI Compilers and Tools
phdf5 | A file format, library, and utility programs for efficiently managing large and complex datasets stored in files.
phenix | A comprehensive software package for macromolecular structure determination using crystallographic (X-ray, neutron and electron) and electron cryo-microscopy
Photoscan-pro | Performs photogrammetric processing of digital images and generates 3D spatial data
Picard | Command line tools for manipulating high-throughput sequencing
pkg-config | A computer program that defines and supports a unified interface for querying installed libraries for the purpose of compiling software that depends on them
Plasma | Parallel Linear Algebra Software for Multicore Architectures
Plink | Whole genome association analysis toolset
pmix | A means of exchanging wireup information needed for interprocess communication
pnetcdf | A high-performance parallel I/O library for accessing Unidata's NetCDF files in classic formats, specifically the formats of CDF-1, 2, and 5.
Proj | Cartographic projections and coordinate transformations library
Python | Object-oriented programming language. We encourage the use of virtualenv to build your own environment.
Qctool | Software to filter out samples or variants
Qiime | Quantitative Insights Into Microbial Ecology. Qiime2 is a package in Python3
Quantum-espresso | Materials modeling
R | Language and environment for statistical computing and graphics
RStudio | RStudio is an IDE for R, available at OnDemand web services
RAxML | A program for sequential and parallel Maximum Likelihood-based inference of large phylogenetic trees
Relion | Program for Maximum A Posteriori refinement in cryo-electron microscopy
Remora | Resource Monitoring for Remote Applications
RepeatMasker | A program that screens DNA sequences for interspersed repeats and low complexity DNA sequences
Rings | "Rigorous Investigation of Networks Generated using Simulations" is a scientific code developed in Fortran90/MPI to analyze the results of molecular dynamics simulations
Rmblast | A RepeatMasker compatible version of the standard NCBI BLAST suite. The primary difference between this distribution and the NCBI distribution is the addition of a new program "rmblastn" for use with RepeatMasker and RepeatModeler
Root-Cern | A modular scientific software framework
Ruby | A dynamic, reflective, object-oriented, general-purpose programming language.
Sagemath | A free open-source mathematics software system
Salmon | A quasi-mapping bioinformatics tool
SAMRAI | Library to explore application, numerical, parallel computing, and software issues associated with structured adaptive mesh refinement (SAMR)
SAMTools | Utilities for manipulating alignments in SAM format
SAS | Software suite developed by SAS Institute for advanced analytics, multivariate analyses, business intelligence, data management, and predictive analytics.
SBT | Scala build tool
Scala | General-purpose programming language
ScaLAPACK | Scalable LAPACK
Schrodinger | Licensed: Molecular modeling and materials science.
Seqlogo | Package that takes the position weight matrix of a DNA sequence motif and plots the corresponding sequence logo.
Shapeit | Estimation of phasing for SNP sequencing data
Signalp | Package that predicts the presence and location of signal peptide cleavage sites in amino acid sequences from different organisms: Gram-positive prokaryotes, Gram-negative prokaryotes, and eukaryotes.
Silo | A mesh and field I/O library and scientific database
Singularity | Singularity containers let users run applications in a Linux environment of their choosing. See tutorial information
Slim | An evolutionary simulation software package used for research and teaching
SNAP | Used by Maker
SOAPdenovo2 | A novel short-read assembly method
SPAdes | St. Petersburg genome assembler, for both standard isolates and single-cell MDA bacteria assemblies
Sparsehash | An extremely memory-efficient hash_map implementation
Spark | From Apache. Open-source distributed general-purpose cluster computing framework
Spectra | C++ library for large scale eigenvalue problems
Speedseq | An open-source genome analysis platform for rapid genome analysis and interpretation
Spparks | Kinetic Monte Carlo simulator from Sandia
SRAtoolkit | Enables reading of sequencing files from the SRA database. From NCBI
Stacks | Software pipeline for building loci from short-read sequences
Star | RNA-seq Aligner
Starfusion | Uses the STAR aligner to identify candidate fusion transcripts
SuperLU | A general purpose C (also callable from Fortran) library for the direct solution of large, sparse, nonsymmetric systems of linear equations
Tmhmm | Package that predicts transmembrane helices in proteins.
TopHat | Fast splice junction mapper for RNA-Seq reads
TRF | Telomere Restriction Fragment (TRF) Analysis
Trilinos | Set of solvers from Sandia National Labs
Trimmomatic | A flexible trimmer for Illumina sequence data
Trinity | Package which enables the efficient and robust de novo reconstruction of transcriptomes from RNA-Seq data
Trinotate | Annotation suite designed for automatic functional annotation of transcriptomes, particularly de novo assembled transcriptomes
VASP | Atomic scale materials modelling, e.g. electronic structure calculations and quantum-mechanical molecular dynamics, from first principles. Special case: restricted to licensed users
VCFtools | Tool providing easily accessible methods for working with complex genetic variation data in the form of VCF files
Velvet | De novo genomic assembler specially designed for short read sequencing technologies
Visit | Interactive parallel visualization and graphical analysis tool for viewing scientific data
Vtune | Intel performance profiler (vtune_amplifier_xe)
WGS | Whole Genome Shotgun Assembler for the reconstruction of genomic DNA sequence from WGS sequencing data
Wham | Whole genome Alignment Metrics
Wien2K | Software for electronic structure calculations using DFT
WRF, WPS | Weather Research and Forecasting Model. Special case: available to Hydrology and Atmospheric Sciences
XDMF | eXtensible Data Model and Format
Xz | Free general-purpose data compression software with a high compression ratio.
zlib | A software library used for data compression.

** Installed on the operating system of each node; "module load xx" is not necessary.