rclone



Overview

Rclone is an open-source command-line program designed for managing and transferring data between various storage providers. It is a versatile tool that works with a wide range of cloud storage systems, FTP and SFTP servers, and local filesystems.

Rclone is installed on our filexfer node (hostname: filexfer.hpc.arizona.edu) and on the compute nodes.

Configuration

To start transferring data between HPC and a cloud service, you'll first need to establish a connection to the desired service using rclone's configuration process. The easiest way to do this is in an Open OnDemand Virtual Desktop, because rclone's auto config step opens a web browser for you to log in. For the quickest access, use the ElGato cluster. In your desktop, open a terminal by selecting MATE Terminal in the toolbar.


In the terminal, type rclone config. Select n for "New remote". You will be prompted to enter a name which will be used to access your configured storage service for all future connections. For ease of use, we recommend not including spaces in the name.

rclone config
[sarawillis@gpu67 ~]$ rclone config
No remotes found, make a new one?
n) New remote
s) Set configuration password
q) Quit config
e/n/d/r/c/s/q> n
name> 

The rest of the configuration process is dependent on the service you wish to use. Two examples are shown below for connecting to Box and Google Drive. However, there are many more options available to choose from (for example: AWS S3, OneDrive, Dropbox, etc.). The rclone prompts for the various setups below have been edited for brevity.



Box
Box Configuration
name> Box
Option Storage.
Type of storage to configure.
Choose a number from below, or type in your own value.
# Many storage options listed
Storage> 8
client_id>  
client_secret> 
box_config_file> 
access_token> 
Option box_sub_type.
Choose a number from below, or type in your own string value.
Press Enter for the default (user).
 1 / Rclone should act on behalf of a user.
   \ (user)
 2 / Rclone should act on behalf of a service account.
   \ (enterprise)
box_sub_type> 1
Edit advanced config?
y) Yes
n) No (default)
y/n> n
Use auto config?
 * Say Y if not sure
 * Say N if you are working on a remote or headless machine

y) Yes (default)
n) No
y/n> y

The auto config process will open a Firefox browser in the desktop session and prompt you to log into your Box account. To log into a UArizona Box account, select Use Single Sign On (SSO). For a private Box account, enter your email address and password instead.

Next, enter your university email address. This will guide you through the standard University login procedure. Once you're logged in, click Grant access to Box.

This will bring you back to your terminal, which should display a summary of your connection information. Enter y to confirm.

# <Configuration information here>
y) Yes this is OK (default)
e) Edit this remote
d) Delete this remote
y/e/d> y
Current remotes:

Name                 Type
====                 ====
Box                  box

e) Edit existing remote
n) New remote
d) Delete remote
r) Rename remote
c) Copy remote
s) Set configuration password
q) Quit config
e/n/d/r/c/s/q> q
[sarawillis@gpu67 ~]$

To test your connection, run rclone lsf <Name>: with "<Name>" replaced by the name you gave your remote. For example:

rclone test connection
[sarawillis@gpu67 ~]$ rclone lsf Box:/
RcloneExample/
Research Technologies/

Google Drive
Google Drive Configuration
name> GoogleDrive
Option Storage.
Type of storage to configure.
Choose a number from below, or type in your own value.
# Many storage options listed
Storage> 17
client_id> 
client_secret> 
Option scope.
Scope that rclone should use when requesting access from drive.
Choose a number from below, or type in your own value.
Press Enter to leave empty.
 1 / Full access all files, excluding Application Data Folder.
   \ (drive)
 2 / Read-only access to file metadata and file contents.
   \ (drive.readonly)
   / Access to files created by rclone only.
 3 | These are visible in the drive website.
   | File authorization is revoked when the user deauthorizes the app.
   \ (drive.file)
   / Allows read and write access to the Application Data folder.
 4 | This is not visible in the drive website.
   \ (drive.appfolder)
   / Allows read-only access to file metadata but
 5 | does not allow any access to read or download file content.
   \ (drive.metadata.readonly)
scope> 1 # <- This option is up to you 
Option root_folder_id.
ID of the root folder.
Leave blank normally.
Fill in to access "Computers" folders (see docs), or for rclone to use
a non root folder as its starting point.
Enter a value. Press Enter to leave empty.
root_folder_id> 
service_account_file> 
Edit advanced config?
y) Yes
n) No (default)
y/n> n
Use auto config?
 * Say Y if not sure
 * Say N if you are working on a remote or headless machine

y) Yes (default)
n) No
y/n> y

The auto config process will open a Firefox browser in the desktop session and prompt you to log into your Google Drive account. Once you're logged in, rclone will ask you for permission to access your storage.

This will redirect you back to your terminal, where you can finish the configuration process.

Finish Google Drive configuration
Configure this as a Shared Drive (Team Drive)?

y) Yes
n) No (default)
y/n> n
# <a list of endpoint configuration information appears here> 
--------------------
y) Yes this is OK (default)
e) Edit this remote
d) Delete this remote
y/e/d> y
Current remotes:

Name                 Type
====                 ====
GoogleDrive          drive

e) Edit existing remote
n) New remote
d) Delete remote
r) Rename remote
c) Copy remote
s) Set configuration password
q) Quit config
e/n/d/r/c/s/q> q
[sarawillis@gpu67 ~]$ 

To test your connection, run rclone lsf <Name>: with "<Name>" replaced by the name you gave your remote. For example:

Google Drive test
[sarawillis@gpu67 ~]$ rclone lsf GoogleDrive:
Colab Notebooks/
Consulting and Support Services/
HPC/
rclone_example
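
For either service, the same check can be scripted so that a job fails early when a remote is missing. The following is a minimal sketch, not a required step: the remote name GoogleDrive and the helper function remote_is_configured are illustrative assumptions. It relies on rclone listremotes, which prints one configured remote per line followed by a colon.

```shell
# Hypothetical remote name; use the name you entered at the "name>" prompt.
REMOTE="GoogleDrive"

# Hypothetical helper: succeeds only if rclone is on PATH and the named
# remote appears in the output of "rclone listremotes".
remote_is_configured() {
    command -v rclone >/dev/null 2>&1 || return 1
    rclone listremotes 2>/dev/null | grep -qx "${1}:"
}

if remote_is_configured "$REMOTE"; then
    echo "Remote ${REMOTE} is configured"
    # List the top level of the remote as a connection test.
    rclone lsf "${REMOTE}:" || echo "Listing ${REMOTE} failed"
else
    echo "Remote ${REMOTE} not found; run 'rclone config' first"
fi
```

If the remote is missing, rerun rclone config from a Virtual Desktop session as described above.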

Transferring Files

See rclone's official documentation for a detailed list of all options available: https://rclone.org/docs/

Once you have a connection configured, you can use rclone in batch jobs, interactive terminal sessions, or on a filexfer node. Rclone supports file transfers and syncs between your local filesystem and a remote endpoint, or between two remote endpoints. When getting started with rclone, we recommend running some tests using the additional flag --dry-run, especially when syncing directories. This reports what rclone would copy or delete without making any changes.
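
As a rough sketch of using rclone in a batch job, the script below wraps a single rclone copy in a job script, assuming the cluster's scheduler is Slurm. The #SBATCH values, the remote name GoogleDrive, the paths, and the log file name are all placeholders chosen for illustration, not site requirements.

```shell
#!/bin/bash
#SBATCH --job-name=rclone_copy     # placeholder job name
#SBATCH --ntasks=1                 # one task running one rclone process
#SBATCH --time=01:00:00            # placeholder time limit
# Add the account and partition options your group normally uses.

REMOTE="GoogleDrive"               # hypothetical remote name
SRC="${REMOTE}:/rclone_example"    # hypothetical remote directory
DEST="./rclone_example"            # hypothetical local directory
LOG="rclone_copy.log"              # hypothetical log file name

if command -v rclone >/dev/null 2>&1; then
    # Add --dry-run on a first pass to preview the transfer.
    rclone copy "$SRC" "$DEST" --log-file "$LOG" -v \
        || echo "rclone copy failed; see $LOG"
else
    echo "rclone is not on PATH"
fi
```

Writing to a log file with -v keeps a record of what was transferred, which is useful because a batch job has no terminal to watch.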

Listing Contents

To view the contents of a remote in the most human-readable format, use rclone lsf <Name>:/path/to/directory. Directories are listed with a trailing slash.

rclone lsf
[sarawillis@cpu7 rclone_hpc]$ rclone lsf GoogleDrive:
Colab Notebooks/
Consulting and Support Services/
HPC/
rclone_example
[sarawillis@cpu7 rclone_hpc]$ rclone lsf GoogleDrive:/rclone_example
hello.py
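
Rclone has a few other listing commands besides lsf. The sketch below tries them on a throwaway local directory, since rclone accepts local paths as well as remotes. The directory layout is made up for illustration; substitute <Name>:/path for the local path to run the same commands against a remote.

```shell
# Build a small throwaway directory tree (hypothetical contents).
DEMO_DIR="$(mktemp -d)"
mkdir -p "$DEMO_DIR/rclone_example"
echo 'print("hello")' > "$DEMO_DIR/rclone_example/hello.py"

if command -v rclone >/dev/null 2>&1; then
    rclone lsf "$DEMO_DIR"       # top level only; directories end in /
    rclone lsf -R "$DEMO_DIR"    # recurse into subdirectories
    rclone lsd "$DEMO_DIR"       # directories only
    rclone ls "$DEMO_DIR"        # all files, recursively, with sizes
else
    echo "rclone is not on PATH; skipping the listing demonstration"
fi
```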

Transferring a Single File

To transfer a single file, use the rclone copyto <source> <destination> command. For example, transferring a file from Google Drive to HPC:

rclone copyto (Google Drive to HPC)
[sarawillis@cpu7 rclone_hpc]$ ls
[sarawillis@cpu7 rclone_hpc]$ rclone copyto GoogleDrive:/rclone_example/hello.py ./hello.py
[sarawillis@cpu7 rclone_hpc]$ ls
hello.py

And transferring a file from HPC to Google Drive:

rclone copyto (HPC to Google Drive)
[sarawillis@cpu7 rclone_hpc]$ echo "This is a test" > test.txt
[sarawillis@cpu7 rclone_hpc]$ rclone copyto ./test.txt GoogleDrive:/rclone_example/test.txt
[sarawillis@cpu7 rclone_hpc]$ rclone lsf GoogleDrive:/rclone_example
hello.py
test.txt

Transferring a Directory

To transfer a directory, use the rclone copy <source> <destination> command. For example, transferring a directory from Google Drive to HPC:

rclone copy (Google Drive to HPC)
[sarawillis@cpu7 rclone_hpc]$ rclone lsf GoogleDrive:/rclone_example
hello.py
rclone_subdirectory/
test.txt
[sarawillis@cpu7 rclone_hpc]$ ls
hello.py  test.txt
[sarawillis@cpu7 rclone_hpc]$ rclone copy GoogleDrive:/rclone_example/rclone_subdirectory ./rclone_subdirectory
[sarawillis@cpu7 rclone_hpc]$ ls
rclone_subdirectory  hello.py  test.txt
[sarawillis@cpu7 rclone_hpc]$ ls rclone_subdirectory/
meep.png  snarf.png  hello_world.txt

And transferring a directory from HPC to Google Drive:

rclone copy (HPC to Google Drive)
[sarawillis@cpu7 rclone_hpc]$ mkdir hpc_directory
[sarawillis@cpu7 rclone_hpc]$ echo "foo" > hpc_directory/bar.txt
[sarawillis@cpu7 rclone_hpc]$ rclone copy ./hpc_directory/ GoogleDrive:/rclone_example/hpc_directory
[sarawillis@cpu7 rclone_hpc]$ rclone lsf GoogleDrive:/rclone_example
hello.py
hpc_directory/
rclone_subdirectory/
test.txt
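
The difference between copyto and copy can be rehearsed locally before touching a remote. This is a minimal sketch using a temporary directory; the names backup and renamed.txt are invented for the example. With copyto, the destination is the exact file name to write, so the file can be renamed in transit. With copy, the contents of the source directory are placed inside the destination directory.

```shell
# Throwaway working area with one source directory (hypothetical names).
WORK="$(mktemp -d)"
mkdir -p "$WORK/hpc_directory"
echo "foo" > "$WORK/hpc_directory/bar.txt"

if command -v rclone >/dev/null 2>&1; then
    # copyto: the destination is a file path, so the file can be renamed.
    rclone copyto "$WORK/hpc_directory/bar.txt" "$WORK/backup/renamed.txt"
    # copy: the contents of the source directory go into the destination.
    rclone copy "$WORK/hpc_directory" "$WORK/backup/hpc_directory"
    ls -R "$WORK/backup"
else
    echo "rclone is not on PATH; skipping the copy demonstration"
fi
```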

Syncing a Directory

Using rclone sync <source> <destination> makes <destination> identical to <source> by modifying <destination> only. This includes deleting files in <destination> that do not exist in <source>, so when starting out with rclone sync, run it first with the --dry-run flag to avoid unintentional data loss. As an example:

rclone sync
[sarawillis@cpu7 rclone_hpc]$ tree
.
├── hello.py
├── hpc_directory
│   └── bar.txt
├── rclone_subdirectory
│   ├── hello_world.txt
│   ├── meep.png
│   └── snarf.png
└── test.txt

2 directories, 6 files
[sarawillis@cpu7 rclone_hpc]$ rclone tree GoogleDrive:/rclone_example
/
├── hello.py
├── hpc_directory
│   └── bar.txt
├── rclone_subdirectory
│   ├── hello_world.txt
│   ├── meep.png
│   └── snarf.png
└── test.txt

2 directories, 6 files
[sarawillis@cpu7 rclone_hpc]$ echo "changing a file" > hpc_directory/bar.txt 
[sarawillis@cpu7 rclone_hpc]$ rm hello.py 
[sarawillis@cpu7 rclone_hpc]$ rclone sync ./ GoogleDrive:/rclone_example --dry-run
2023/10/06 14:26:51 NOTICE: hpc_directory/bar.txt: Skipped copy as --dry-run is set (size 16)
2023/10/06 14:26:51 NOTICE: hello.py: Skipped delete as --dry-run is set (size 52)
2023/10/06 14:26:51 NOTICE: 
Transferred:   	         16 B / 16 B, 100%, 0 B/s, ETA -
Checks:                 6 / 6, 100%
Deleted:                1 (files), 0 (dirs)
Transferred:            1 / 1, 100%
Elapsed time:         1.0s
[sarawillis@cpu7 rclone_hpc]$ rclone sync ./ GoogleDrive:/rclone_example --progress # Now to actually execute the sync
Transferred:   	         16 B / 16 B, 100%, 7 B/s, ETA 0s
Checks:                 6 / 6, 100%
Deleted:                1 (files), 0 (dirs)
Transferred:            1 / 1, 100%
Elapsed time:         3.6
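
The same dry-run-then-sync pattern can be practiced without a cloud remote. The sketch below uses two temporary local directories as stand-ins for the HPC directory and the remote. The directory names and file contents are assumptions made for the demonstration.

```shell
# Two throwaway directories: SRC plays the HPC side, DST plays the remote.
WORK="$(mktemp -d)"
SRC="$WORK/source"
DST="$WORK/destination"
mkdir -p "$SRC/hpc_directory" "$DST/hpc_directory"
echo "changing a file" > "$SRC/hpc_directory/bar.txt"
echo "foo" > "$DST/hpc_directory/bar.txt"    # older copy of bar.txt
echo 'print("hello")' > "$DST/hello.py"      # exists only in the destination

if command -v rclone >/dev/null 2>&1; then
    # Report what sync would copy and delete, without changing anything.
    rclone sync "$SRC" "$DST" --dry-run
    # Run it for real: bar.txt is updated and hello.py is removed from DST.
    rclone sync "$SRC" "$DST"
else
    echo "rclone is not on PATH; skipping the sync demonstration"
fi
```

Only the destination is modified. The source directory is left exactly as it was.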