Using Python & Python Packages

Overview

Different versions of python are available on HPC both as system modules as well as system software on each compute node. Python 2 is available but is no longer supported by the Python Foundation, so we recommend you use Python 3. Python version 3 requires the python3 command or pip3 list to differentiate.  It is very different from Python version 2, so do not assume that Python 3 will work for you or that all older modules will work with version 3.

We recommend you use the virtual environment supported by Python, with more details in the sections below.  One of the biggest reasons is that you can preserve your compute environment.  The packages provided in the system Python are subject to change which may introduce incompatibilities at runtime.

Contents

Installation & Package Policy

We maintain a two tiered approach to Python packages

  • Tier 1: We install the basic Python packages that are required by most users (these are mostly libraries rather than packages, such as numpy and scipy). This is done for the versions of Python that we install as modules.  Adding some packages might force an upgrade of numpy for example, which might break a user's environment that was dependent on the prior version.

  • Tier 2: For packages that we do not provide we STRONGLY recommend the use of virtualenv, which is detailed below and provides a custom and easy to use person Python environment.

Available Python Versions

Python 2 is no longer officially supported by the Python Software Foundation.

Six versions of Python are available on HPC. They are only available on compute nodes and are accessible either using a batch submission or interactive session. 

VersionAccessibilityNotes
Python 2.7.5system version (no module)Accessible as python
Python 3.6.8system version (no module)

Accessible as python3 (unless python module is loaded)

Python 3.6.5module load python/3.6/3.6.5Includes many packages
Python 3.8.2module load python/3.8/3.8.2Includes more packages
Python 3.9.10module load python/3.9/3.9.10
Python 3.11.4module load python/3.11/3.11.4

Installing Python Packages Using virtualenv

One of the best things about Python is the number of packages provided by the user community. On a personal machine, the most popular method today for managing these packages is the use of a package manager, like pip. Unfortunately, these require root access and are not a viable solution on the clusters.

There is an easy solution, however. You can use virtualenv to create a personal python environment that will persist each time you log in. There is no risk of packages being updated under you for another user.

To find packages you might want to start with python.org.

In the following instructions any module commands have to be run from an interactive session on a compute node

Virtual Environment Instructions

  1. Set up your virtual environment in your account. This step is done one time only and will be good for all future uses of your Python environment. You will need to be in an interactive session to follow along. 

    Note: In the commands below, /path/to/virtual/env is the path to the directory where all of your environment's executables and packages will be saved. For example, if you use the path ~/mypyenv, this will create a directory in your home called mypyenv. Inside will be directories bin, lib, lib64, and include

    Commands
    Python Version < 3.8

    module load python/<version>
    virtualenv --system-site-packages /path/to/virtual/env 

    Python Version ≥ 3.8

    module load python/<version>
    python3 -m venv --system-site-packages /path/to/virtual/env
  2. To use your new environment, you'll need to activate it. Inside your virtual environment, there's a directory called bin that has a file called activate. Sourcing this will add all of the paths needed to your working environment. To activate, run the following, replacing /path/to/virtual/env with the path specific to your account:

    source /path/to/virtual/env/bin/activate
  3. Once your environment is active, you can use pip to install your python packages.  You should first upgrade to the latest version of pip. For example, to add the pycurl package to the virtual environment:

    pip install --upgrade pip
    pip install pycurl
  4. If you would like your virtual environment to always be active, you can add the activate command to your ~/.bashrc. This is a hidden file in your home directory that sets up your environment each time you log in. To edit it, open the file using your favorite text editor. Then, add the following to a blank line:

    module load python/<version>
    source /path/to/virtual/env/bin/activate

Using and Installing Python Packages with Conda

Initializing Conda

Users have access to conda to install packages locally in their account. For a cheat sheet on conda commands, see: https://docs.conda.io/projects/conda/en/latest/user-guide/cheatsheet.html

Example for setting up a local conda environment:

module load anaconda/2020
conda init bash                     # only needs to be run one time in your account
source ~/.bashrc                    # Makes the init changes live. Only needs to be run after the one-time initialization 
conda create --name py37 python=3.7 # Build a local environment with a specific version of python
conda activate py37                 # activate your environment. 

Once your environment is activated, you will be able to download and use custom packages with conda

It should be noted that the conda init bash step will modify your ~/.bashrc file so that Anaconda is automatically activated every time you log in. This behavior is known to cause some issues when using HPC resources such as OOD Desktop sessions (For information, see: FAQ -- resolving Anaconda issues).

One way to more effectively control your environment is to turn off conda's auto-activation feature.  This can be done by running the command:

conda config --set auto_activate_base false

Not running the above command after initial conda setup can lead to errors with Open On Demand. This step is necessary to prevent such errors. It only needs to be performed once.

This will prevent Anaconda from being loaded into your environment until you manually activate it using:

conda activate

If you have turned off auto-activation, you can still use Anaconda in a batch script using:

source ~/.bashrc && conda activate

Installing Packages with Conda

Once conda has been configured following the steps above, users can often install software they need to run if it comes as a conda package. 

It is almost always preferable to install new software in a separate environment so that conda can more easily manage dependencies. Installing multiple packages within a single environment can lead to issues! Only attempt to do this (1) in a new environment other than "base" and (2) if you absolutely need to have those packages installed in the same environment.

If you wish to install and run a new conda package, please follow these steps:

1. Access an interactive session. This can be done quickly by running 

elgato
interactive -a <your_group>

You will still be able to run any software installed this way on either Puma or Ocelote. Switching to El Gato is preferable for quickly accessing a compute node.

2. Create an environment for your new software. Give the environment a title related to the software you are installing so that you can keep track. This is especially helpful if you plan on having more than one additional conda environment.

conda create -n <my_new_env>
conda activate <my_new_env>

You can then check your available environments with

conda env list

3. Follow the software-specific installation instructions. This may be as simple as running "conda install <my_package>", or it may involve installing a handful of dependencies. If the installation instructions ask you to create a new environment, you do not have to repeat this step. 

You should then be able to access your software within this environment! If you are unable to load your software, check your active environment with

conda info

and the installed packages with

conda list

4. (optional) Sometimes, installing the package from conda isn't sufficient. After you have the source code installed, you may want to clone the git repo with examples and useful scripts. Naturally, this is very much dependent on the software you are using and the resources the developers provide. 

Jupyter Notebooks on OOD

Prior to maintenance on 3/27/2022, OnDemand Jupyter Notebooks used Python 3.6.5. Because Python 3.6 has reached end of life, Jupyter now uses Python 3.8.2.

HPC provides access to Jupyter notebooks on all three clusters through our Open OnDemand interface. For more information on using this service, see our page on Open On Demand.

Custom Kernels

To use locally-installed packages in your Jupyter session, you can create a virtual environment and install your own kernel. 

The default version of Python available in Jupyter is 3.8.2. If you would like to create a virtual environment using a standard python module, you will need to use the default version that Jupyter uses. If you want to use your own version of python, you can use an Anaconda environment. Steps for both options are provided below:

 Using a Python Module

Using a Python Module

In a terminal session, start by logging into an interactive session. Once you're ready to go, load python version 3.8.2. Because Open OnDemand Jupyter notebooks use the Python 3.8 module, you will need to use this version. For more flexibility, you can use Anaconda to change Python version. To create your virtual environment:

module load python/3.8/3.8.2                      
python3 -m venv --system-site-packages ~/py38-env # See the section on virtual environments above for more info
source ~/py38-env/bin/activate                    # activate your environment 

Once your environment is ready to go, pip-install jupyter and create your own custom kernel. The --force-reinstall flag will allow you to install the jupyter package in your local environment and will not affect the system version. This will create a directory in  ~/.local/share/jupyter/kernels/ in your account: 

pip install jupyter --force-reinstall
ipython kernel install --name py38-env --user 
pip install emoji # example python package for demonstration purposes

Now, go to https://ood.hpc.arizona.edu/ and start a Jupyter notebook.  Once the session starts, open it and click the "new" dropdown menu in the upper right. If everything is working correctly, you should see your custom name:

Once you've selected your environment, try loading a custom module to check that everything is working as expected:


 Using Anaconda

Using Anaconda

In a terminal session, start by logging into an interactive session, then load your preferred anaconda module, initialize and activate it in your account, and create your local environment:

module load anaconda/<version>         # replace <version> with the version you'd like to use 
conda init bash                        # only needed if using conda for the first time
source ~/.bashrc && conda activate     # make conda initialization live 
conda create --name py38 python=3.8    # create a local environment with your preferred python version. Name and version are customizable.
conda activate py38                    # activate your new local environment

Next, you'll pip install jupyter and use that to create your own custom kernel. This will create a file saved to ~/.local/share/jupyter/kernels/ in your account:

pip install jupyter
ipython kernel install --name py38 --user
pip install emoji # example python package for demonstration purposes

Next, go to https://ood.hpc.arizona.edu/ and start a Jupyter notebook. Once the session starts, open it and click the "new" dropdown menu in the upper right. If everything is working correctly, you should see your custom name:

Once you've selected your environment, try checking the python version in your notebook using the sys module. Additionally, for demonstration purposes, we'll check that the package we installed can be imported and is working. 


Loading Modules in Jupyter

In OnDemand Jupyter sessions, accessing HPC software modules directly from within a notebook can be challenging due to system configurations. However, it's still possible to access these modules when needed. For instance, machine learning packages like TensorFlow or PyTorch often require additional software modules such as CUDA for GPU utilization.

To access software modules in your Jupyter notebooks, follow the steps below:

Step 1: If you haven't already done so, create a custom kernel for your Jupyter notebook environment.

Step 2: You will then need to edit your kernel configuration file kernel.json which is what sets up your environment at runtime. This file can be found in the following location, where <kernel_name> is a placeholder for the name you gave your kernel when it was created:

$HOME/.local/share/jupyter/kernels/<kernel_name>/kernel.json

Step 3: Next, you will need to modify your kernel's configuration by editing this file. Start by opening it with a text editor, for example nano $HOME/.local/share/jupyter/kernels/<kernel_name>/kernel.json. The contents of this file should look something like the following:

{
 "argv": [
  "</path/to/your/environment>/bin/python",
  "-m",
  "ipykernel_launcher",
  "-f",
  "{connection_file}"
 ],
 "display_name": "<kernel_name>",
 "language": "python",
 "metadata": {
  "debugger": true
 }
}

The part you need to change is the section under argv. We will change this from executing a Python command to a Bash command with a module load statement. Make a note of the path </path/to/your/environment>/bin/python to use in the edited file. The edited file should look like the following:

{
 "argv": [
 "bash",
 "-c",
 "module load <your_modules_here> ; </path/to/your/environment>/bin/python -m ipykernel_launcher -f {connection_file}"
 ],
 "display_name": "<kernel_name>",
 "language": "python",
 "metadata": {
 "debugger": true
 }
}

Replace <your_modules_here> with the modules you would like to load and </path/to/your/environment>/bin/python with the path to your environment's python.

Step 4: Save the kernel.json file and restart your Jupyter notebook session.