Using R Packages

Overview

Note: because software modules are not available on the login nodes, you will need to be on a compute node to load R and install packages.

R is a popular language for data analysis and visualization. Different versions are available as software modules and we provide the graphical interface RStudio for R through our Open OnDemand web interface.

As with other languages that use package managers to install community-contributed libraries, we recommend that you create and manage your own local libraries in your account. This keeps the global environment stable for all users and gives you the most control over your packages' versions and dependencies.

We provide instructions below for how to create, use, and switch between libraries as well as some debugging techniques for when package installations fail. We also provide some script examples (see Example R Scripts below) for submitting R scripts as batch jobs.

RStudio is a popular method for running analyses (and for good reason!), but for longer-running jobs (say, many hours or days) or workflows that need more flexibility in their environment (e.g., need access to software installed as system modules such as gdal), we recommend batch submissions.


Creating a Custom R Library

 R packages can be finicky. See Switching Between Custom Libraries and Common Problems below to help with frequent user issues.

Creating Your First Library

  1. Make a local directory to store your packages:

    $ mkdir -p ~/R/library
  2. Tell R where the directory is by creating an environment file:

    $ echo 'R_LIBS=~/R/library/' >> ~/.Renviron
  3. That's it! Now you can install packages normally. For example, to install and load the package "ggplot2":

    $ module load R
    $ R
    > install.packages("ggplot2")
    > library(ggplot2)

Switching Between Custom Libraries

If you're using different versions of R, we recommend you use different libraries. See Common Problems below for more information. When creating a library, consider including pertinent information in the name such as R version. For example:

If you start by using R version 4.0, following the instructions provided above:

$ mkdir -p ~/R/library_R_v4.0
$ echo 'R_LIBS=~/R/library_R_v4.0/' >> ~/.Renviron

If you later decide to switch to R version 4.1, instead of using your existing library, create a new one:

$ mkdir -p ~/R/library_R_v4.1

To use your new library, edit your .Renviron file:

$ nano ~/.Renviron                # opens the R environment file for editing
> R_LIBS=~/R/library_R_v4.1       # delete ~/R/library_R_v4.0 and enter the new directory
> CONTROL + X                     # exits (remember to save at the prompt)

Now you can go about your business and install as you normally would.
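
The same switch can be made without opening an editor. The following is a minimal sketch, not a site-provided tool: the function name switch_r_lib and its arguments are hypothetical, and it assumes your environment file holds entries of the form R_LIBS=<directory>, one per line. It creates the new library directory, saves a backup of the environment file, and rewrites the file so that exactly one R_LIBS entry remains.

```shell
#!/bin/bash
# Hypothetical helper: point R_LIBS at a different library without an editor.
# Arguments: $1 = new library directory
#            $2 = environment file (default: ~/.Renviron)
switch_r_lib() {
    local new_lib="$1"
    local renviron="${2:-$HOME/.Renviron}"
    local tmp_file

    mkdir -p "$new_lib" || return 1
    touch "$renviron"
    cp "$renviron" "$renviron.bak"      # keep a backup of the previous settings

    # Copy every line except existing R_LIBS entries, then add the new entry
    tmp_file=$(mktemp) || return 1
    grep -v '^R_LIBS=' "$renviron.bak" > "$tmp_file" || true
    echo "R_LIBS=$new_lib" >> "$tmp_file"
    mv "$tmp_file" "$renviron"
}
```

For example, switch_r_lib "$HOME/R/library_R_v4.1" would make the R version 4.1 library the active one. Because any earlier R_LIBS lines are dropped rather than appended to, running it repeatedly does not leave duplicate entries behind.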


Common Problems and How to Debug Them

Working on a cluster without root privileges can lead to complications. For general information on package installations, see the r-bloggers documentation. For common installation problems on our clusters, see the sections below for suggested solutions:

Anaconda

One common reason R packages won't install is an altered environment. This is frequently caused by the presence of Anaconda (or Miniconda), either installed locally or initialized in your account from our system module.

When Anaconda is initialized, your .bashrc file is edited so that Anaconda's directories become the first entries in your PATH variable. This can cause all sorts of mayhem. To get around this, you can either remove Anaconda from your PATH and deactivate your environment, or comment out/delete the initialization in your ~/.bashrc if you want the change to be permanent.

Turn off Auto-activation

Anaconda's initialization will tell it to automatically activate itself when you log in (when anaconda is active, you will see a "(conda)" preceding your command prompt). To disable this behavior, run the following from the command line in an interactive terminal session:

conda config --set auto_activate_base false

This will suppress anaconda's activation until you explicitly call conda activate and is a handy way to have more control over your environment. Once you run this, you will either need to log out and log back in again to make the changes live, or you can follow the instructions in the section below. 

Sometimes turning off auto-activation won't be enough because Anaconda will still be present in your PATH. Follow the instructions below to completely remove conda from your environment.

Temporary Removal

You can either use the command conda deactivate and then manually edit your PATH variable to remove all instances of anaconda/miniconda or copy the following and run it in your terminal:

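conda deactivate > /dev/null 2>&1
# Split PATH on ':' and rebuild it without any conda entries
IFS=':' read -ra PATHAR <<< "$PATH"
NEWPATH=""
for i in "${PATHAR[@]}"; do
    if [[ $i == *"conda"* ]]; then
        echo "removing $i from PATH"
    else
        # Append so the remaining entries keep their original order
        NEWPATH="${NEWPATH:+$NEWPATH:}$i"
    fi
done
export PATH="$NEWPATH"
module unload gnu8 && module load gnu8
unset NEWPATH PATHAR
echo "Successfully removed conda"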
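
To confirm that no Anaconda entries are left, you can inspect your PATH afterwards. The following is a minimal sketch under the same assumption the removal relies on, namely that every Anaconda or Miniconda directory contains the string "conda". The function name list_conda_entries is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: print every entry of a PATH-style string that mentions conda.
# Argument: $1 = colon-separated directory list (default: the current PATH)
# Returns 0 if at least one conda entry is found, 1 otherwise.
list_conda_entries() {
    local path_list="${1:-$PATH}"
    local entry found=1
    local -a entries

    IFS=':' read -ra entries <<< "$path_list"
    for entry in "${entries[@]}"; do
        if [[ $entry == *conda* ]]; then
            echo "$entry"
            found=0
        fi
    done
    return $found
}
```

Running list_conda_entries with no arguments checks the current session. If it prints nothing, no PATH entry matches.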

Permanent Removal

Your .bashrc file configures your environment each time you start a new session. You may consider making a backup before editing in case of unwanted changes.

Note: this change will remove anaconda from all future terminal sessions but will not make the changes live right away. To make the changes live, either follow the instructions above under Temporary Removal for removing anaconda from your PATH, or log out and back in again.

$ nano ~/.bashrc # opens your bashrc file to edit

Then comment out or delete the following lines and the text in between:

# >>> conda initialize >>>
...
# <<< conda initialize <<<

To exit and save, use control+x and follow the prompts.
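
If you would rather not edit the file by hand, the same change can be scripted. This is a minimal sketch, assuming the initialization block is delimited by exactly the two marker lines shown above. The function name disable_conda_init is hypothetical. It saves a backup first and comments the block out rather than deleting it, so the change is easy to reverse.

```shell
#!/bin/bash
# Hypothetical helper: comment out the conda initialization block in a bashrc file.
# Argument: $1 = file to edit (default: ~/.bashrc)
# A copy of the original file is saved as <file>.bak.
disable_conda_init() {
    local bashrc="${1:-$HOME/.bashrc}"

    cp "$bashrc" "$bashrc.bak" || return 1

    # Prefix every line from the opening marker to the closing marker with "# "
    sed '/^# >>> conda initialize >>>/,/^# <<< conda initialize <<</ s/^/# /' \
        "$bashrc.bak" > "$bashrc"
}
```

Run it once only: a second run would overwrite the backup with the already-edited file.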

A Corrupted Environment

If Anaconda is not initialized in your account, there might be other culprits that are corrupting your environment.

Look for any of the file types listed below on your account. If you find them, remove them (make a backup somewhere if you need them) and try the installation again.

  1. Saved R sessions. If this is the case, after starting a session, you will get the message "[Previously saved workspace restored]". Old sessions are saved as a hidden file .RData in your home directory. 
  2. Gnu compilers
  3. Windows files
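
For the first item in the list, a saved workspace can be moved aside rather than deleted. This is a minimal sketch: the function name stash_rdata and the backup naming scheme are hypothetical, and it only looks for the .RData file described above.

```shell
#!/bin/bash
# Hypothetical helper: move a saved R workspace out of the way instead of deleting it.
# Argument: $1 = directory to check (default: your home directory)
stash_rdata() {
    local dir="${1:-$HOME}"
    local backup

    if [ -f "$dir/.RData" ]; then
        # Timestamped, non-hidden name so the backup is easy to find later
        backup="$dir/RData_backup_$(date +%Y%m%d_%H%M%S)"
        mv "$dir/.RData" "$backup" && echo "Moved $dir/.RData to $backup"
    else
        echo "No saved workspace found in $dir"
    fi
}
```

After running stash_rdata, start a new R session and retry the installation.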

Library Issues

Have you set up a custom library? Are you switching between custom libraries? You may want to check that everything is being loaded from the correct location and that there are not multiple or unwanted libraries being used.

Double-check that you have an .Renviron file. This is a hidden file located in your home directory and should set the path to your custom R library. If you do not have a custom library name set up, R will create one for you saved as something like:

~/R/x86_64-pc-linux-gnu-library

This directory can lead to unwanted behavior. For example, if you're trying to use a new custom library (such as when switching R versions), R will still search x86_64-pc-linux-gnu-library for package dependencies, which may cause installs to fail. To fix this, rename any such folder to something unique and descriptive.

To set up/switch custom libraries, follow the instructions in the Creating a Custom R Library section above.
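
The rename described above can be scripted. This is a minimal sketch, assuming the default library sits under ~/R with the name shown earlier. The function name retire_default_lib and the suffix scheme are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: rename R's default library directory so it is no
# longer found under its default name.
# Arguments: $1 = parent directory (default: ~/R)
#            $2 = suffix appended to the directory name (default: retired_<date>)
retire_default_lib() {
    local parent="${1:-$HOME/R}"
    local suffix="${2:-retired_$(date +%Y%m%d)}"
    local default_lib="$parent/x86_64-pc-linux-gnu-library"
    local target="${default_lib}_${suffix}"

    if [ ! -d "$default_lib" ]; then
        echo "No default library found under $parent"
    elif [ -e "$target" ]; then
        echo "Refusing to overwrite existing $target" >&2
        return 1
    else
        mv "$default_lib" "$target" && echo "Renamed $default_lib to $target"
    fi
}
```

For example, retire_default_lib "$HOME/R" "R_v4.0" keeps the old packages on disk under a name that records which R version they were built for.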

Mixing R Versions

Because multiple versions of R are available on HPC, take care to avoid mixing and matching them. Because packages often depend on one another, a library that contains packages installed under different versions of R can turn into a tangled mess. Common errors that can crop up include: "Error: package or namespace load failed."

If you're switching R versions, we recommend creating a new library.

Open OnDemand RStudio Issues

RStudio is a great tool! Sometimes though, because it's a different environment than working directly from the terminal, you may run into problems. Specifically, these typically arise for installs or when using packages that rely on software modules.

Package Installations

If you're trying to install a package in an OOD RStudio session and you've tried all the troubleshooting advice above without luck, try starting R in the terminal and give the installation another try. You can access an R session in the terminal by first starting an interactive session, then using:

$ module load R/<version>
$ R
> install.packages("package_name")

Note that if you're using RStudio for your analyses, you must use the R/4.0 module in the terminal for your packages to be usable/accessible from RStudio.

Accessing Modules

RStudio does not have access to module load commands. This means that if you have a package that relies on a system module, the easiest option is to work through an interactive terminal session or to submit a batch script. 

The alternative is to modify your RStudio environment. For example, the library hdf5r relies on the hdf5 software module. If you try to load hdf5r, you will get an error complaining about a shared object file. To get around this, you will need to manually add that shared object to your environment using dyn.load(). For example:

> library("hdf5r") # without using dyn.load()
Error: package or namespace load failed for ‘hdf5r’ in dyn.load(file, DLLpath = DLLpath, ...):
 unable to load shared object '/home/u21/sarawillis/R/lib_4.0/hdf5r/libs/hdf5r.so':
  libhdf5_hl.so.100: cannot open shared object file: No such file or directory
> dyn.load("/opt/ohpc/pub/libs/gnu8/hdf5/1.10.5/lib/libhdf5_hl.so.100")
> library("hdf5r") # success!
>

This requires that you know the location of the relevant file(s). These can usually be tracked down by looking at your system path variables (e.g. LD_LIBRARY_PATH) after loading the relevant module in a terminal. It should be noted that modifying your system paths from RStudio will not help since RStudio has its own configuration file that overrides these. 
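
One way to track down such a file from a terminal is to search each directory listed in LD_LIBRARY_PATH after loading the module. This is a minimal sketch: the function name find_shared_object is hypothetical, and it assumes the module adds the directory holding the shared object to LD_LIBRARY_PATH, which may not hold for every module.

```shell
#!/bin/bash
# Hypothetical helper: search a colon-separated list of directories for a shared object.
# Arguments: $1 = file name to look for (e.g. libhdf5_hl.so.100)
#            $2 = directory list (default: the current LD_LIBRARY_PATH)
# Returns 0 if the file is found in at least one directory, 1 otherwise.
find_shared_object() {
    local name="$1"
    local dir_list="${2:-$LD_LIBRARY_PATH}"
    local dir found=1
    local -a dirs

    IFS=':' read -ra dirs <<< "$dir_list"
    for dir in "${dirs[@]}"; do
        if [ -n "$dir" ] && [ -e "$dir/$name" ]; then
            echo "$dir/$name"      # full path that can be passed to dyn.load()
            found=0
        fi
    done
    return $found
}
```

In a terminal session, you would load the relevant module first and then run, for example, find_shared_object libhdf5_hl.so.100. Any path it prints can be used in dyn.load() inside RStudio.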

Font Issues

RStudio uses Singularity under the hood. As a result, there are some environment differences that may affect correct font formatting in images generated in RStudio. If you are experiencing this, add the following line to the hidden file ~/.Renviron in your account (you can create this file if it does not exist):

FONTCONFIG_PATH=/opt/ohpc/pub/apps/fontconfig/2.14.2/etc/fonts

Linking Third Party Software

When packages are dependent on 3rd party software, particularly when the software is installed locally, R can have trouble finding it. This can usually be fixed by changing your environment paths but can sometimes be challenging. Some things to try:

  1. Packages that require 3rd party software should be installed in a terminal session and not through an OOD RStudio session.
  2. Check whether the software you need is installed as a module using the module avail command.
  3. If you know which paths need to be changed, point them to the correct location. 
  4. Search online help forums such as R-Bloggers, Stack Exchange, Stack Overflow, etc. for your specific error. It's likely others have experienced the same problem you're encountering and know where the trouble spots are.
  5. If you're in too deep, reach out to the consultants with a support ticket.
  6. Sometimes it can't be helped and you need the software installed as a module. Submit a software installation request to get things up and running. Note: there is no expected timeframe for software requests.


Using RStudio

How to Access RStudio

Open OnDemand Application

We provide access to the popular development environment RStudio through our Open OnDemand web interface. This is a very handy tool, though it should be noted that it is a less flexible environment than using R from the command line. This is because RStudio sets its own environment, which prevents easy access to third party software installed as system modules. These issues can sometimes be worked around by following the guide in the debugging section above.

Using Singularity

In some circumstances, you may want to run RStudio using your own Singularity image. For example, this allows access to different versions of R not provided when using our OOD application. We have some instructions on how to do this below.

First, log into HPC using an Open OnDemand Desktop session and open a terminal. A Desktop session is the easiest solution to access RStudio since it eliminates the need for port forwarding.

In the terminal, make an RStudio directory where all of the necessary files will be stored. In this example, we'll be working in our home directory and will pull an RStudio image from Dockerhub to use as a test. If you're interested, you can find different RStudio images under rocker in Dockerhub.

Make an RStudio directory and pull your image
mkdir $HOME/RStudio
cd $HOME/RStudio 
singularity pull ./geospatial.sif docker://rocker/geospatial

Next, create the necessary directories RStudio will use to generate temporary files. You will also generate a secure cookie key.

Make RStudio temp directories and generate secure cookie key
TMPDIR=$HOME/RStudio/rstudio-tmp
mkdir -p $TMPDIR/tmp/rstudio-server
uuidgen > $TMPDIR/tmp/rstudio-server/secure-cookie-key
chmod 600 $TMPDIR/tmp/rstudio-server/secure-cookie-key
mkdir -p $TMPDIR/var/{lib,run}

Next, create a file in your RStudio directory called rserver.sh and make it executable:

Create rserver.sh and modify permissions
touch rserver.sh
chmod u+x rserver.sh

Open the file in your favorite editor and enter the content below. Modify the variables under USER OPTIONS to match your account if necessary. You can change PASSWORD to any password you'd like to use. Once you've entered the contents, save and exit:

rserver.sh
#!/bin/bash

# --- USER OPTIONS --- #
WD=$HOME/RStudio
SIFNAME=geospatial.sif
PASSWORD="PASSWORD"

# --- SERVER STARTUP EXECUTED BELOW --- #
NETID=$(whoami)
TMPDIR=$WD/rstudio-tmp
SIF=$WD/$SIFNAME
PASSWORD=$PASSWORD singularity exec -B $TMPDIR/var/lib:/var/lib/rstudio-server -B $TMPDIR/var/run:/var/run/rstudio-server  -B $TMPDIR/tmp:/tmp $SIF rserver --auth-none=0 --auth-pam-helper-path=pam-helper --server-user=$NETID --www-address=127.0.0.1

Now, in your desktop session's terminal, execute the rserver.sh script using:

Execute rserver.sh
./rserver.sh
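
If the server fails to start, it is worth confirming that everything created in the earlier steps is in place. This is a minimal sketch that checks only for the files and directories named in this guide. The function name check_rserver_setup is hypothetical, and it does not validate the image or the server itself.

```shell
#!/bin/bash
# Hypothetical preflight check: confirm that the files and directories created
# in the earlier steps exist.
# Arguments: $1 = RStudio working directory (default: ~/RStudio)
#            $2 = image file name (default: geospatial.sif)
# Prints one line per missing item and returns 1 if anything is missing.
check_rserver_setup() {
    local wd="${1:-$HOME/RStudio}"
    local sifname="${2:-geospatial.sif}"
    local tmpdir="$wd/rstudio-tmp"
    local key="$tmpdir/tmp/rstudio-server/secure-cookie-key"
    local status=0

    [ -f "$wd/$sifname" ]    || { echo "missing image: $wd/$sifname"; status=1; }
    [ -f "$key" ]            || { echo "missing cookie key: $key"; status=1; }
    [ -d "$tmpdir/var/lib" ] || { echo "missing directory: $tmpdir/var/lib"; status=1; }
    [ -d "$tmpdir/var/run" ] || { echo "missing directory: $tmpdir/var/run"; status=1; }
    [ -x "$wd/rserver.sh" ]  || { echo "missing or not executable: $wd/rserver.sh"; status=1; }

    return $status
}
```

Running check_rserver_setup with no arguments uses the locations from this example. No output means all five items were found.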

Next, open a Firefox window and enter "localhost:8787" for the URL. In your browser, you will be prompted to log into your RStudio server. Enter your NetID under Username. Under Password, enter the password you defined in the script rserver.sh.

This will open your RStudio session.


Setting a New User State Directory

When working on a large project in RStudio, it is possible for your R session's data to fill up your home directory resulting in out-of-space errors (e.g. when trying to edit files, create new OOD sessions, etc). With the newest version of RStudio, you can find these saved session files under ~/.local/share/rstudio.

To preserve space in your home, you can specify a different directory by setting the environment variable RSTUDIO_DATA_HOME. To do this, open the hidden file ~/.bashrc and add:

export RSTUDIO_DATA_HOME=/path/to/new/directory

where /path/to/new/directory is the path to a different location where you have a larger space quota. For example, /groups/YOUR_PI/YOUR_NETID/rstudio_sessions.
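
Before making the change, it can help to see how much space the saved sessions currently use. This is a minimal sketch of both steps: the function names rstudio_data_usage_kb and set_rstudio_data_home are hypothetical. The second function only creates the new directory and adds the export line. It does not move any existing session files.

```shell
#!/bin/bash
# Hypothetical helper: report how much space a directory uses, in kilobytes.
# Argument: $1 = directory to measure (default: ~/.local/share/rstudio)
rstudio_data_usage_kb() {
    local dir="${1:-$HOME/.local/share/rstudio}"

    if [ -d "$dir" ]; then
        du -sk "$dir" | awk '{print $1}'
    else
        echo 0
    fi
}

# Hypothetical helper: create the new directory and add the export line once.
# Arguments: $1 = new data directory, $2 = file to edit (default: ~/.bashrc)
set_rstudio_data_home() {
    local new_dir="$1"
    local bashrc="${2:-$HOME/.bashrc}"

    mkdir -p "$new_dir" || return 1
    if grep -q '^export RSTUDIO_DATA_HOME=' "$bashrc" 2>/dev/null; then
        echo "RSTUDIO_DATA_HOME is already set in $bashrc"
    else
        echo "export RSTUDIO_DATA_HOME=$new_dir" >> "$bashrc"
    fi
}
```

For example, set_rstudio_data_home /groups/YOUR_PI/YOUR_NETID/rstudio_sessions would prepare that location and register it for future sessions.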


Setting Your Working Directory

Current Session

If you'd like to change your working directory in an RStudio session, one option is to run setwd("/path/to/directory") in the R console. Alternatively, if you'd like to see the contents of your new workspace in your file browser, open the Session dropdown menu, navigate to Set Working Directory, and click Choose Directory...

From there, either navigate to the desired subdirectory, or click the ellipsis (...) in the upper right to enter the full path to a directory.

The ellipsis option allows for more flexibility such as pointing to an /xdisk or /groups space.

Once you click OK and then Choose in the main file navigation window, R will change its working directory and you should see the contents of your new space under the Files browser in the lower right.

All Non-Project Sessions

If you'd like to permanently set a different default working directory for all non-project RStudio sessions, navigate to the Tools dropdown tab and select Global Options...

This will open a menu where you can set your default working directory under General. Click the Browse... button to open a file navigator.

To select a new working directory, either navigate to the subdirectory of your current working space, or select the ellipsis (...) in the upper right to allow you to enter the full path.

The ellipsis option allows for more flexibility such as pointing to an /xdisk or /groups space.

Next, click OK, then Choose in the Choose Directory window, then Apply in the Global Options menu. This will set your working directory for your current session as well as all future sessions.


Example R Scripts

We have examples for running R workflows in batch on our Github Examples page. Have a question or suggestion? Let us know!


Popular Packages

Seurat and SeuratDisk

If you use RStudio for your analyses, make sure that you load the same version of R when working with modules on the command line.

To install Seurat and SeuratDisk, you'll need to be in an interactive terminal session and not in an RStudio session. This is because these libraries depend on software modules that RStudio doesn't have access to (see Open OnDemand RStudio Issues under Common Problems and How to Debug Them above for more information).

You will also need to make sure Anaconda is completely removed from your environment prior to the install. If you have Anaconda initialized in your account, see the code block under Temporary Removal in the Anaconda section above.

Seurat

Seurat Installation
(elgato) [netid@junonia ~]$ interactive -a your_group
[netid@cpu38 ~]$ module load R/<version> 
[netid@cpu38 ~]$ module load gdal glpk libpng # software modules that are needed for Seurat's dependencies
[netid@cpu38 ~]$ R
> install.packages("Seurat")

If you want to load this software in an RStudio session, you will first need to use the following dyn.load commands. When using the dyn.loads in RStudio, you will need to be careful to run them in the order shown below, otherwise you may wind up with "Undefined symbol" errors. If you repeatedly run into library errors working in RStudio, you might consider converting your workflow to a batch script that you can submit through the command line. See the section Example R Scripts above for more information.

Loading Seurat in RStudio
> dyn.load("/opt/ohpc/pub/apps/glpk/5.0/lib/libglpk.so.40")
> dyn.load("/opt/ohpc/pub/apps/gdal/3.3.2/lib/libgdal.so.29")
> dyn.load("/opt/ohpc/pub/apps/proj/7.2.1/lib/libproj.so.19")
> library(Seurat)

SeuratDisk

Installing SeuratDisk is similar to installing Seurat, with a few more dependencies. The installation also includes the line unset CPPFLAGS due to a reported issue with the dependency hdf5r:

SeuratDisk Installation
(elgato) [netid@junonia ~]$ interactive -a your_group
[netid@cpu1 ~]$ module load R/<version> gdal geos hdf5/1.10.5 libpng/1.6.37 glpk
[netid@cpu1 ~]$ unset CPPFLAGS
[netid@cpu1 ~]$ R
> install.packages("Seurat")
> install.packages("remotes")
> remotes::install_github("mojaveazure/seurat-disk")

Loading SeuratDisk in RStudio
> dyn.load("/opt/ohpc/pub/apps/glpk/5.0/lib/libglpk.so.40")
> dyn.load("/opt/ohpc/pub/apps/proj/7.2.1/lib/libproj.so.19")
> dyn.load("/opt/ohpc/pub/apps/gdal/3.3.2/lib/libgdal.so.29")
> dyn.load("/opt/ohpc/pub/libs/gnu8/hdf5/1.10.5/lib/libhdf5_hl.so.100")
> library(Seurat)
> library(SeuratDisk)
Monocle3

To install Monocle3, you'll need to be in an interactive terminal session and not in an RStudio session. This is because it depends on software modules that RStudio doesn't have access to (see Open OnDemand RStudio Issues under Common Problems and How to Debug Them above for more information).

You will also need to make sure Anaconda is completely removed from your environment prior to the install. If you have Anaconda initialized in your account, see the code block under Temporary Removal in the Anaconda section above.

When using the dyn.loads in RStudio, you will need to be careful to run them in the order shown below, otherwise you may wind up with "Undefined symbol" errors. If you repeatedly run into library errors working in RStudio, you might consider converting your workflow to a batch script that you can submit through the command line. See the section Example R Scripts above for more information.

Monocle3's documentation includes steps that you can use for a successful installation.

Monocle3 Installation
(elgato) [netid@junonia ~]$ interactive -a your_group
[netid@cpu1 ~]$ module load R/<version> gdal
[netid@cpu1 ~]$ R
> install.packages("BiocManager")
> BiocManager::install(c('BiocGenerics', 'DelayedArray', 'DelayedMatrixStats',
                       'limma', 'lme4', 'S4Vectors', 'SingleCellExperiment',
                       'SummarizedExperiment', 'batchelor', 'HDF5Array',
                       'terra', 'ggrastr'))
> install.packages("devtools")
> devtools::install_github('cole-trapnell-lab/monocle3')

Loading Monocle3 in RStudio
> dyn.load("/opt/ohpc/pub/apps/gdal/3.3.2/lib/libgdal.so.29")
> dyn.load("/opt/ohpc/pub/apps/proj/7.2.1/lib/libproj.so.19")
> library(monocle3)