RCAC - Knowledge Base: Hammer User Guide: R: All topics

R

R, a GNU project, is a language and environment for data manipulation, statistics, and graphics. It is an open source version of the S programming language. R is quickly becoming the language of choice for data science due to the ease with which it can produce high quality plots and data visualizations. It is a versatile platform with a large, growing community and collection of packages.

For more general information on R visit The R Project for Statistical Computing.

Loading Data into R

R is an environment for manipulating data. In order to manipulate data, it must be brought into the R environment. R has a function to read any file that data is stored in. Some of the most common file types like comma-separated variable(CSV) files have functions that come in the basic R packages. Other less common file types require additional packages to be installed. To read data from a CSV file into the R environment, enter the following command in the R prompt:

> read.csv(file = "path/to/data.csv", header = TRUE)

When R reads the file it creates an object that can then become the target of other functions. By default the read.csv() function will give the object the name of the .csv file. To assign a different name to the object created by read.csv enter the following in the R prompt:

> my_variable <- read.csv(file = "path/to/data.csv", header = FALSE)

To display the properties (structure) of loaded data, enter the following:

> str(my_variable)

For more functions and tutorials:

Running R jobs

This section illustrates how to submit a small R job to a SLURM queue. The example job computes a Pythagorean triple.

Prepare an R input file with an appropriate filename, here named myjob.R:

# FILENAME:  myjob.R

# Compute a Pythagorean triple.
a = 3
b = 4
c = sqrt(a*a + b*b)
c     # display result

Prepare a job submission file with an appropriate filename, here named myjob.sub:

#!/bin/bash
# FILENAME:  myjob.sub

module load r

# --vanilla:
# --no-save: do not save datasets at the end of an R session
R --vanilla --no-save < myjob.R

submit the job

View job status

View results of the job

For other examples or R jobs:

Installing R packages

Link to section 'Challenges of Managing R Packages in the Cluster Environment' of 'Installing R packages' Challenges of Managing R Packages in the Cluster Environment

Different clusters have different hardware and softwares. So, if you have access to multiple clusters, you must install your R packages separately for each cluster.
Each cluster has multiple versions of R and packages installed with one version of R may not work with another version of R. So, libraries for each R version must be installed in a separate directory.
You can define the directory where your R packages will be installed using the environment variable R_LIBS_USER.
For your convenience, a sample ~/.Rprofile example file is provided that can be downloaded to your cluster account and renamed into ~/.Rprofile (or appended to one) to customize your installation preferences. Detailed instructions.

Link to section 'Installing Packages' of 'Installing R packages' Installing Packages

Step 0: Set up installation preferences.
Follow the steps for setting up your ~/.Rprofile preferences. This step needs to be done only once. If you have created a ~/.Rprofile file previously on Hammer, ignore this step.
Step 1: Check if the package is already installed.
As part of the R installations on community clusters, a lot of R libraries are pre-installed. You can check if your package is already installed by opening an R terminal and entering the command installed.packages(). For example,
```
module load r/4.4.1
R
```
```
installed.packages()["units",c("Package","Version")]
Package Version 
"units" "0.8-1"
quit()
```
If the package you are trying to use is already installed, simply load the library, e.g., library('units'). Otherwise, move to the next step to install the package.
Step 2: Load required dependencies. (if needed)
For simple packages you may not need this step. However, some R packages depend on other libraries. For example, the sf package depends on gdal and geos libraries. So, you will need to load the corresponding modules before installing sf. Read the documentation for the package to identify which modules should be loaded.
```
module load gdal
module load geos
```

Step 3: Install the package.
Now install the desired package using the command install.packages('package_name'). R will automatically download the package and all its dependencies from CRAN and install each one. Your terminal will show the build progress and eventually show whether the package was installed successfully or not.

install.packages('sf', repos="https://cran.case.edu/")
Installing package into ‘/home/myusername/R/x86_64-pc-linux-gnu-library/4.4.1’
(as ‘lib’ is unspecified)
trying URL 'https://cran.case.edu/src/contrib/sf_0.9-7.tar.gz'
Content type 'application/x-gzip' length 4203095 bytes (4.0 MB)
==================================================
downloaded 4.0 MB
...
...
more progress messages
...
...
** testing if installed package can be loaded from final location
** testing if installed package keeps a record of temporary installation path
* DONE (sf)

The downloaded source packages are in
    ‘/tmp/RtmpSVAGio/downloaded_packages’

Step 4: Troubleshooting. (if needed)
If Step 3 ended with an error, you need to investigate why the build failed. Most common reason for build failure is not loading the necessary modules.

Link to section 'Loading Libraries' of 'Installing R packages' Loading Libraries

Once you have packages installed you can load them with the library() function as shown below:

library('packagename')

The package is now installed and loaded and ready to be used in R.

Link to section 'Example: Installing dplyr' of 'Installing R packages' Example: Installing `dplyr`

The following demonstrates installing the dplyr package assuming the above-mentioned custom ~/.Rprofile is in place (note its effect in the "Installing package into" information message):

module load r
R

install.packages('dplyr', repos="http://ftp.ussg.iu.edu/CRAN/")
Installing package into ‘/home/myusername/R/hammer/4.4.1’
(as ‘lib’ is unspecified)
 ...
also installing the dependencies 'crayon', 'utf8', 'bindr', 'cli', 'pillar', 'assertthat', 'bindrcpp', 'glue', 'pkgconfig', 'rlang', 'Rcpp', 'tibble', 'BH', 'plogr'
 ...
 ...
 ...
The downloaded source packages are in 
    '/tmp/RtmpHMzm9z/downloaded_packages'

library(dplyr)

Attaching package: 'dplyr'

For more information about installing R packages:

RStudio

RStudio is a graphical integrated development environment (IDE) for R. RStudio is the most popular environment for developing both R scripts and packages. RStudio is provided on most Research systems.

There are two methods to launch RStudio on the cluster: command-line and application menu icon.

Link to section 'Launch RStudio by the command-line:' of 'RStudio' Launch RStudio by the command-line:

module load gcc
module load r
module load rstudio
rstudio

Note that RStudio is a graphical program and in order to run it you must have a local X11 server running or use Thinlinc Remote Desktop environment. See the ssh X11 forwarding section for more details.

Link to section 'Launch Rstudio by the application menu icon:' of 'RStudio' Launch Rstudio by the application menu icon:

Log into desktop.hammer.rcac.purdue.edu with web browser or ThinLinc client
Click on the Applications drop down menu on the top left corner
Choose Cluster Software and then RStudio

This shows where to find Rstudio under the 'Cluster Software' option in the list of Applications.

R and RStudio are free to download and run on your local machine. For more information about RStudio:

Link to section 'RStudio Server on Hammer' of 'Running RStudio Server on Hammer' RStudio Server on Hammer

A different version of RStudio is also installed on Hammer. RStudio Server allows you to run RStudio through your web browser.

Link to section 'Projects' of 'Running RStudio Server on Hammer' Projects

One benefit of RStudio is that your work can be separated into projects. You can give each project a working directory, workspace, history and source documents. When you are creating a new project, you can start it in a new empty directory, one with code and data already present or by cloning a repository.

RStudio Server allows easy collaboration and sharing of R projects. Just click on the project drop down menu in the top right corner and add the career account user names of those you wish to share with.

Link to section 'Sessions' of 'Running RStudio Server on Hammer' Sessions

Another feature is the ability to run multiple sessions at once. You can do multiple instances of the same project in parallel or work on different projects simultaneously. The sessions dropdown menu is in the upper right corner right above the project menu. Here you can kill or open sessions. Note that closing a window does not end a session, so please kill sessions when you are not using them.

You can view an overview of all your projects and active sessions by clicking on the blue RStudio Server Home logo in the top left corner of the window next to the file menu.

Link to section 'Packages' of 'Running RStudio Server on Hammer' Packages

You can install new packages with the install.packages() function in the console. You can also graphically select any packages you have previously installed on any cluster. Simply select packages from the tabs on the bottom right side of the window and select the package you wish to load.

For more information about RStudio:

Setting Up R Preferences with .Rprofile

For your convenience, a sample ~/.Rprofile example file is provided that can be downloaded to your cluster account and renamed into ~/.Rprofile (or appended to one). Follow these steps to download our recommended ~/.Rprofile example and copy it into place:

curl -#LO https://www.rcac.purdue.edu/files/knowledge/run/examples/apps/r/Rprofile_example
mv -ib Rprofile_example ~/.Rprofile

The above installation step needs to be done only once on Hammer. Now load the R module and run R:

module load r/4.4.1
R

.libPaths()
[1] "/home/myusername/R/hammer/4.1.2-gcc-6.3.0-ymdumss"
[2] "/apps/spack/hammer/apps/r/4.1.2-gcc-6.3.0-ymdumss/rlib/R/library"

.libPaths() should output something similar to above if it is set up correctly.

You are now ready to install R packages into the dedicated directory /home/myusername/R/hammer/4.1.2-gcc-6.3.0-ymdumss.