Using Python & Python Packages
Installation & Package Policy
We maintain a two-tiered approach to Python packages:
- Tier 1: We install the basic Python packages that are required by most users (these are mostly libraries rather than packages, such as numpy and scipy). This is done for the versions of Python that we install as modules. We keep this set small because adding some packages might force an upgrade of numpy, for example, which could break a user's environment that depended on the prior version.
- Tier 2: For packages that we do not provide, we STRONGLY recommend the use of virtualenv, which is detailed below and provides a custom, easy-to-use personal Python environment.
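Before building your own environment, it can help to check whether a package is already provided in Tier 1. The following is a minimal sketch, assuming Python 3.8 or later (for importlib.metadata); the helper name installed_version is illustrative and not part of any HPC tooling.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # numpy and scipy are the Tier 1 examples named above.
    for name in ("numpy", "scipy"):
        print(name, installed_version(name) or "not installed")
```

If the version reported is not the one you need, that is a sign to use a Tier 2 environment rather than relying on the module's packages.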
Available Python Versions
Python 2 is no longer officially supported by the Python Software Foundation.
Six versions of Python are available on HPC. They are only available on compute nodes and are accessible through either a batch submission or an interactive session.
Version | Accessibility | Notes |
---|---|---|
Python 2.7.5 | system version (no module) | Accessible as python |
Python 3.6.8 | system version (no module) | Accessible as python3 (unless python module is loaded) |
Python 3.6.5 | module load python/3.6/3.6.5 | Includes many packages |
Python 3.8.2 | module load python/3.8/3.8.2 | Includes more packages |
Python 3.9.10 | module load python/3.9/3.9.10 | |
Python 3.11.4 | module load python/3.11/3.11.4 | |
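After loading a module, it can be worth confirming which interpreter you actually got, since python3 resolves to the system version unless a python module is loaded. A small sketch using only the standard library (the function name is illustrative):

```python
import platform
import sys


def describe_interpreter() -> dict:
    """Collect the running interpreter's version and executable path."""
    return {
        "version": platform.python_version(),  # for example "3.8.2"
        "executable": sys.executable,          # full path to the python binary
    }


if __name__ == "__main__":
    info = describe_interpreter()
    print(f"Python {info['version']} at {info['executable']}")
```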
Installing Python Packages Using virtualenv
Useful overview of virtualenv and venv: InfoWorld Article: Python virtualenv and venv do's and don'ts
One of the best things about Python is the number of packages provided by the user community. On a personal machine, the most popular method today for managing these packages is a package manager, like pip. Unfortunately, installing packages into the system-wide Python this way requires root access and is not a viable solution on the clusters.
There is an easy solution, however. You can use virtualenv to create a personal Python environment that will persist each time you log in. Because the environment is yours alone, there is no risk of packages being updated underneath you on behalf of another user.
To find packages you might want to start with python.org.
In the following instructions, any module commands have to be run from an interactive session on a compute node.
Virtual Environment Instructions
Set up your virtual environment in your account. This step is done one time only and will be good for all future uses of your Python environment. You will need to be in an interactive session to follow along.
Note: In the commands below, /path/to/virtual/env is the path to the directory where all of your environment's executables and packages will be saved. For example, if you use the path ~/mypyenv, this will create a directory in your home called mypyenv. Inside will be the directories bin, lib, lib64, and include.
Commands
Python Version < 3.8
module load python/<version>
virtualenv --system-site-packages /path/to/virtual/env
Python Version ≥ 3.8
module load python/<version>
python3 -m venv --system-site-packages /path/to/virtual/env
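For Python 3.8 and later, the same environment can also be created from within Python using the standard library's venv module. This is a sketch under stated assumptions rather than the recommended procedure: create_env is a hypothetical helper, and with_pip=True mirrors the command-line default of installing pip into the new environment.

```python
import venv
from pathlib import Path


def create_env(env_dir: str, with_pip: bool = True) -> Path:
    """Create a virtual environment that can also see system site-packages."""
    # system_site_packages=True corresponds to --system-site-packages.
    builder = venv.EnvBuilder(system_site_packages=True, with_pip=with_pip)
    builder.create(env_dir)
    return Path(env_dir)
```

For example, create_env(str(Path.home() / "mypyenv")) would build the ~/mypyenv environment used as the example path in this guide.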
To use your new environment, you'll need to activate it. Inside your virtual environment, there's a directory called bin that has a file called activate. Sourcing this will add all of the paths needed to your working environment. To activate, run the following, replacing /path/to/virtual/env with the path specific to your account:
source /path/to/virtual/env/bin/activate
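If you are unsure whether activation worked, Python itself can tell you. The sketch below relies on sys.prefix differing from sys.base_prefix inside a venv; the real_prefix check is included on the assumption that older virtualenv releases (as may be paired with Python 3.6) set that attribute instead. The function name is illustrative.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter is running inside a virtual environment."""
    # Older virtualenv releases set sys.real_prefix; venv and newer
    # virtualenv make sys.prefix differ from sys.base_prefix instead.
    if hasattr(sys, "real_prefix"):
        return True
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
    print("prefix:", sys.prefix)
```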
Once your environment is active, you can use pip to install your python packages. You should first upgrade to the latest version of pip. For example, to add the pycurl package to the virtual environment:
pip install --upgrade pip
pip install pycurl
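Because the environment was created with --system-site-packages, a package may be importable from either your environment or the module's own site-packages. One way to see which copy Python would pick up is to look at where the module resolves. A minimal sketch (module_location is a hypothetical helper, intended for top-level module names):

```python
import importlib.util
from typing import Optional


def module_location(name: str) -> Optional[str]:
    """Return the file a top-level module would be imported from, or None."""
    spec = importlib.util.find_spec(name)
    return spec.origin if spec is not None else None


if __name__ == "__main__":
    # A path under /path/to/virtual/env indicates the copy installed with pip.
    print("pycurl:", module_location("pycurl") or "not installed")
```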
If you would like your virtual environment to always be active, you can add the activate command to your ~/.bashrc. This is a hidden file in your home directory that sets up your environment each time you log in. To edit it, open the file using your favorite text editor. Then, add the following to a blank line:
module load python/<version>
source /path/to/virtual/env/bin/activate
Using and Installing Python Packages with Conda
Initializing Conda
Users have access to conda to install packages locally in their account. For a cheat sheet on conda commands, see: https://docs.conda.io/projects/conda/en/latest/user-guide/cheatsheet.html
Example for setting up a local conda environment:
module load anaconda/2020
conda init bash                        # Only needs to be run one time in your account
source ~/.bashrc                       # Makes the init changes live. Only needs to be run after the one-time initialization
conda create --name py37 python=3.7    # Build a local environment with a specific version of Python
conda activate py37                    # Activate your environment
Once your environment is activated, you will be able to download and use custom packages with conda.
It should be noted that the conda init bash step will modify your ~/.bashrc file so that Anaconda is automatically activated every time you log in. This behavior is known to cause some issues when using HPC resources such as OOD Desktop sessions (for more information, see: FAQ -- resolving Anaconda issues).
One way to more effectively control your environment is to turn off conda's auto-activation feature. This can be done by running the command:
conda config --set auto_activate_base false
This step only needs to be performed once. Skipping it after the initial conda setup can lead to errors with Open OnDemand.
It prevents Anaconda from being loaded into your environment until you manually activate it using:
conda activate
If you have turned off auto-activation, you can still use Anaconda in a batch script using:
source ~/.bashrc && conda activate
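In a batch job it is easy to lose track of whether activation actually happened. As a minimal sketch, a Python script could check the CONDA_DEFAULT_ENV variable that conda activate exports and stop early if no environment is active. The helper name below is illustrative.

```python
import os
from typing import Mapping, Optional


def active_conda_env(environ: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the name of the active conda environment, or None."""
    # conda activate exports CONDA_DEFAULT_ENV (name) and CONDA_PREFIX (path).
    return environ.get("CONDA_DEFAULT_ENV") or None


if __name__ == "__main__":
    env = active_conda_env()
    if env is None:
        print("No conda environment is active")
    else:
        print("Active conda environment:", env)
```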
Installing Packages with Conda
Once conda has been configured following the steps above, users can often install the software they need, provided it is available as a conda package.
It is almost always preferable to install new software in a separate environment so that conda can more easily manage dependencies. Installing multiple packages within a single environment can lead to issues! Only attempt to do this (1) in a new environment other than "base" and (2) if you absolutely need to have those packages installed in the same environment.
If you wish to install and run a new conda package, please follow these steps:
1. Access an interactive session. This can be done quickly by running
elgato
interactive -a <your_group>
You will still be able to run any software installed this way on either Puma or Ocelote. Switching to El Gato is preferable for quickly accessing a compute node.
2. Create an environment for your new software. Give the environment a title related to the software you are installing so that you can keep track. This is especially helpful if you plan on having more than one additional conda environment.
conda create -n <my_new_env>
conda activate <my_new_env>
You can then check your available environments with
conda env list
3. Follow the software-specific installation instructions. This may be as simple as running "conda install <my_package>", or it may involve installing a handful of dependencies. If the installation instructions ask you to create a new environment, you do not have to repeat this step.
You should then be able to access your software within this environment! If you are unable to load your software, check your active environment with
conda info
and the installed packages with
conda list
4. (optional) Sometimes, installing the package from conda isn't sufficient. After you have the package installed, you may also want to clone the project's git repo to get examples and useful scripts. Naturally, this depends heavily on the software you are using and the resources the developers provide.
Jupyter Notebooks on OOD
Prior to maintenance on 3/27/2022, OnDemand Jupyter Notebooks used Python 3.6.5. Because Python 3.6 has reached end of life, Jupyter now uses Python 3.8.2.
HPC provides access to Jupyter notebooks on all three clusters through our Open OnDemand interface. For more information on using this service, see our page on Open On Demand.
Custom Kernels
To use locally-installed packages in your Jupyter session, you can create a virtual environment and install your own kernel.
The default version of Python available in Jupyter is 3.8.2. If you would like to create a virtual environment using a standard python module, you will need to use the default version that Jupyter uses. If you want to use your own version of python, you can use an Anaconda environment. Steps for both options are provided below:
Loading Modules in Jupyter
In OnDemand Jupyter sessions, accessing HPC software modules directly from within a notebook can be challenging due to system configurations. However, it's still possible to access these modules when needed. For instance, machine learning packages like TensorFlow or PyTorch often require additional software modules such as CUDA for GPU utilization.
To access software modules in your Jupyter notebooks, follow the steps below:
Step 1: If you haven't already done so, create a custom kernel for your Jupyter notebook environment.
Step 2: You will then need to edit your kernel configuration file kernel.json, which is what sets up your environment at runtime. This file can be found in the following location, where <kernel_name> is a placeholder for the name you gave your kernel when it was created:
$HOME/.local/share/jupyter/kernels/<kernel_name>/kernel.json
Step 3: Next, you will need to modify your kernel's configuration by editing this file. Start by opening it with a text editor, for example nano $HOME/.local/share/jupyter/kernels/<kernel_name>/kernel.json. The contents of this file should look something like the following:
{
  "argv": [
    "</path/to/your/environment>/bin/python",
    "-m",
    "ipykernel_launcher",
    "-f",
    "{connection_file}"
  ],
  "display_name": "<kernel_name>",
  "language": "python",
  "metadata": {
    "debugger": true
  }
}
The part you need to change is the section under argv. We will change this from executing a Python command to a Bash command with a module load statement. Make a note of the path </path/to/your/environment>/bin/python to use in the edited file. The edited file should look like the following:
{
  "argv": [
    "bash",
    "-c",
    "module load <your_modules_here> ; </path/to/your/environment>/bin/python -m ipykernel_launcher -f {connection_file}"
  ],
  "display_name": "<kernel_name>",
  "language": "python",
  "metadata": {
    "debugger": true
  }
}
Replace <your_modules_here> with the modules you would like to load and </path/to/your/environment>/bin/python with the path to your environment's python.
Step 4: Save the kernel.json file and restart your Jupyter notebook session.
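If you maintain several kernels, the edit described in Steps 2 and 3 could also be scripted. The following is a minimal sketch, not an official tool: wrap_with_modules and update_kernel_file are hypothetical helper names, and the simple space-join assumes that none of the original argv entries contain spaces or shell metacharacters.

```python
import json
from pathlib import Path
from typing import List


def wrap_with_modules(spec: dict, modules: List[str]) -> dict:
    """Return a copy of a kernel spec whose argv loads modules via bash first."""
    if spec["argv"][:2] == ["bash", "-c"]:
        raise ValueError("kernel spec is already wrapped in a bash command")
    # Assumption: no argument contains spaces, so a plain join is safe.
    launch = " ".join(spec["argv"])
    command = f"module load {' '.join(modules)} ; {launch}"
    return {**spec, "argv": ["bash", "-c", command]}


def update_kernel_file(path: Path, modules: List[str]) -> None:
    """Rewrite a kernel.json file in place with the wrapped argv."""
    spec = json.loads(path.read_text())
    path.write_text(json.dumps(wrap_with_modules(spec, modules), indent=2) + "\n")
```

It is worth keeping a backup copy of kernel.json before running something like update_kernel_file on it, and you still need to restart your Jupyter session afterwards.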