Chapter 6 RStudio in Alliance Canada

Note 1: This refers to an experimental use of RStudio for HPC. This approach has no official support from the Alliance, and the responsibility lies solely with the user. An official R environment, with support, is provided by the Alliance at this website.

Note 2: This approach is aimed at single-node processes. If you want to explore (or need) distributed processing, we recommend using the supported version and speaking with user groups and Alliance support for help designing your routines.

6.1 Introduction

In this approach, we use a pre-established Rocker Docker container, which we extend to our needs. Binary versions of the container are available on Docker Hub, with the releases and their code available on GitHub.

6.2 Running an Apptainer HPC Container as an interactive session

Before you can run RStudio, you need a suitable container, either one already built as an Apptainer (Singularity) image or one you build yourself.

From the cluster login node you can build it with the following steps (it is assumed that you are using bash as a regular user; no sudo is needed):

1. Load Apptainer:

module load apptainer

Pick the most recent version of it (please note that Docker Hub may have different tags, each with its own versioning), and accept.
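If you are unsure which Apptainer versions are installed on the cluster, the module system can list them; a minimal check (assuming the standard Lmod setup used on Alliance clusters):

# list the apptainer module versions available on this cluster
module spider apptainer

# then load a specific one (the version number here is only an example)
module load apptainer/1.1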

2. Then run the build (run this command in your home, project, or scratch folder):

apptainer build rstudio_container_v1_0.sif docker://ricardobarroslourenco/rstudio_container:v1_0

At the end, you should have a file named rstudio_container_v1_0.sif in the directory where you ran the command. This is your Apptainer container.
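As an optional sanity check (assuming R is on the PATH inside the image, as it is in the Rocker images), you can inspect the container and run a quick command in it:

# show the metadata recorded in the image
apptainer inspect rstudio_container_v1_0.sif

# run a simple command inside the container to confirm it starts
apptainer exec rstudio_container_v1_0.sif R --version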

3. Create the transient folders and files needed to run your container (these should be hosted in the same location as your container). Create the directories:

mkdir -p run var-lib-rstudio-server .rstudio

Then create the SQLite configuration file:

printf 'provider=sqlite\ndirectory=/var/lib/rstudio-server\n' > database.conf
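Before moving on, it may help to confirm that everything sits next to the image; a quick check:

# the directories and the config file should be alongside the .sif image
ls -d run var-lib-rstudio-server .rstudio database.conf rstudio_container_v1_0.sif

# confirm the contents of the database configuration
cat database.conf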

4. Start an interactive job on the supercomputer

salloc --time=0:30:0 --account=rrg-ggalex --cpus-per-task=8 --mem=0 --mail-user=macid@mcmaster.ca --mail-type=ALL

The flags are defined as follows (more details are available in the Alliance Documentation, as well as in the Slurm manual); an illustrative variation is shown after the list:

  • time: duration of the interactive session in HH:MM:SS;
  • account: the PI's account for the Alliance Canada project;
  • cpus-per-task: number of CPUs for this interactive session, capped at the node's limit;
  • mem: amount of RAM to allocate (zero means all the memory of the node);
  • mail-user: e-mail address to which updates on the job allocation are sent;
  • mail-type: verbosity of the e-mail notifications.
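As an illustration of how these flags combine, here is a hypothetical variation (the account name, e-mail address, and resource values are placeholders that you would replace with your own):

# request 3 hours, 4 CPUs, and 16 GB of RAM on a hypothetical allocation
salloc --time=3:00:00 --account=def-someuser --cpus-per-task=4 --mem=16G --mail-user=you@example.com --mail-type=ALL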

After running salloc, you will probably see your bash prompt change to something like this (on the Graham cluster):

(base) [lourenco@gra-login1 scratch]$ salloc --time=0:30:0 --account=rrg-ggalex --cpus-per-task=8 --mem=0 --mail-user=barroslr@mcmaster.ca --mail-type=ALL
salloc: Pending job allocation 6806097
salloc: job 6806097 queued and waiting for resources
salloc: job 6806097 has been allocated resources
salloc: Granted job allocation 6806097
salloc: Waiting for resource configuration
salloc: Nodes gra189 are ready for job
(base) [lourenco@gra189 scratch]$

Please note that gra189 in this case is the name of the compute node assigned to you. Note it down; you will need it later for your SSH tunnel.
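If you forget the node name, you can recover it later; two options (assuming the standard Slurm tools available on the cluster):

# from inside the interactive session: print the compute node's short hostname
hostname -s

# or, from any login-node shell: list your jobs, including their node names
squeue -u $USER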

5. Start the container on the newly allocated compute node

apptainer exec \
  --bind run:/run,var-lib-rstudio-server:/var/lib/rstudio-server,database.conf:/etc/rstudio/database.conf,.rstudio:/home/rstudio/.rstudio/ \
  rstudio_container_v1_0.sif \
  /usr/lib/rstudio-server/bin/rserver \
    --auth-none=1 \
    --auth-pam-helper-path=pam-helper \
    --server-user=$(whoami) \
    --www-port=8787

This will launch an RStudio Server instance as an Apptainer service. If you need it, bash is available as a terminal tab inside that RStudio session. To access the instance, you need an SSH tunnel that forwards the port on the compute node to your local machine. For that, you must create a new, separate SSH connection and keep it open, without running any job on it (a variant that does this without keeping a remote shell open is sketched after the flag list below). You can do it like this (similar to what we described in the ssh section of CCPrimer, with a twist):

ssh -i ~/computecanada/computecanada-key -L 9102:graXXX:8787 username@graham.computecanada.ca

Some details:

  • -i ~/computecanada/computecanada-key: the -i flag points to your SSH key file;
  • -L 9102:graXXX:8787: the SSH tunnel itself. It forwards port 8787 on the compute node to port 9102 on your localhost. Note that you must replace graXXX with the name of the node assigned to you after running salloc;
  • username@graham.computecanada.ca: your username, followed by the cluster URL. Note that our lab research allocation is available on Graham for now.
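If you prefer not to keep an interactive shell tied up by the tunnel, ssh can forward the port without opening a remote shell. A minimal sketch, under the same assumptions as above (key path, node name, and username must still be replaced with your own):

# -N tells ssh not to run a remote command, only hold the port forward open
ssh -i ~/computecanada/computecanada-key -N -L 9102:graXXX:8787 username@graham.computecanada.ca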

After this, you should be able to access the RStudio Server by opening this URL in your browser:

localhost:9102
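If the browser cannot reach it, a quick way to test the tunnel from your local machine is to request the page on the command line (assuming curl is installed); any HTTP response means the forward is working:

# ask the forwarded port for its HTTP headers
curl -sI http://localhost:9102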

6.3 Finishing your running session

Do not forget to save your work to filesystems that persist (e.g. /home, /project, /scratch); anything saved to a folder that exists only inside the container will be lost after your session ends. Remember that /scratch is meant for temporary storage, being automatically wiped by the cluster systems after 30 days.
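For example, from a regular cluster shell (or from the RStudio terminal tab, if the cluster's default Apptainer bind configuration exposes /project inside the container), you can copy your results to project storage; the paths below are placeholders for your own project and output folder:

# copy an output folder from the session to persistent project storage
# (replace def-someuser and my_results with your own project and folder names)
cp -r ~/my_results /project/def-someuser/$USER/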

Once your work is saved, you can end your sessions as follows: press Ctrl + C in the terminal where apptainer is running, then call exit in bash until the salloc session ends, and once more to relinquish the SSH session (remember to do this both in the bash session that runs the job and in the SSH tunnel). At the end you should see only the bash prompt of your local machine. It is important to follow this procedure; otherwise the node may keep running and consuming the group's Alliance Canada credits. Once you finish the Slurm salloc session you will receive an e-mail (just as you did when the session started) telling you that the session has finished (or was relinquished, if your running time expired).
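For reference, the teardown typically looks like the sequence below (a sketch; prompts and node names will differ on your system):

# 1. in the compute-node terminal running the container: stop the server with Ctrl + C

# 2. still on the compute node: leave the salloc allocation
exit

# 3. back on the login node: close that SSH session
exit

# 4. in the separate tunnel terminal: close the tunnel with Ctrl + C or exit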