
r

Description

Linear and nonlinear modelling, statistical tests, time series analysis, classification, clustering, etc. Please consult the R project homepage for further information.

Versions

  • Bell: 3.6.3, 4.0.0, 4.1.2, 4.2.2
  • Brown: 3.6.1, 3.6.3, 4.0.0, 4.1.2, 4.2.2
  • Scholar: 3.6.1, 3.6.3, 4.0.0, 4.0.5, 4.1.2, 4.2.2
  • Gilbreth: 3.6.1, 3.6.3, 4.0.0, 4.1.2, 4.2.2
  • Negishi: 4.2.2
  • Anvil: 4.0.5, 4.1.0
  • Workbench: 3.6.1, 3.6.3, 4.0.0, 4.1.2, 4.2.2

Module

Load the module with:

module load r

Setting Up R Preferences with .Rprofile

Different clusters have different hardware and software, so if you have access to multiple clusters, you must install your R packages separately on each one. Each cluster also offers multiple versions of R, and packages installed with one version of R may not work with another. Libraries for each R version must therefore be installed in a separate directory. You can define the directory where your R packages will be installed using the environment variable R_LIBS_USER.

For your convenience, a sample .Rprofile file is provided. Download it to your cluster account and rename it to ~/.Rprofile (or append it to your existing one) to customize your installation preferences. Detailed instructions:

curl -#LO https://www.rcac.purdue.edu/files/knowledge/run/examples/apps/r/Rprofile_example
mv -ib Rprofile_example ~/.Rprofile

The above installation step needs to be done only once on each of the clusters you have access to. Now load the R module and run R to confirm the unique libPaths:

module load r/4.2.2
R
R> .libPaths()                  
[1] "/home/zhan4429/R/bell/4.2.2-gcc-9.3.0-xxbnk6s"                 
[2] "/apps/spack/bell/apps/r/4.2.2-gcc-9.3.0-xxbnk6s/rlib/R/library"
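The idea behind R_LIBS_USER can also be sketched directly in the shell. This is a minimal illustration, not the content of the provided .Rprofile example: the helper name r_lib_dir and the ~/R/<cluster>/<version> layout are assumptions chosen to resemble the .libPaths() output above.

```shell
#!/bin/bash
# Sketch: build a per-cluster, per-version R library directory.
# r_lib_dir is a hypothetical helper; the directory layout is an assumption.

r_lib_dir() {
    local cluster="$1"   # cluster name, e.g. bell
    local version="$2"   # R version, e.g. 4.2.2
    printf '%s/R/%s/%s\n' "$HOME" "$cluster" "$version"
}

# Example: point R at a dedicated library for R 4.2.2 on Bell.
R_LIBS_USER="$(r_lib_dir bell 4.2.2)"
export R_LIBS_USER
mkdir -p "$R_LIBS_USER"   # R only uses library directories that already exist
echo "$R_LIBS_USER"
```

Keeping one directory per cluster and per R version means a package built against one R installation is never picked up by another.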

Challenging packages

The packages below are known to be difficult to install.

nloptr

On Bell, the installation may fail because the default `cmake` version is too old. To fix this, load a newer version of cmake before installing:

module load cmake/3.20.6
module load r
Rscript -e 'install.packages("nloptr")'

On Brown and other older clusters, the system cmake and gcc compilers are too old to build the latest version of nloptr. The workaround is to install an older version of nloptr:

module load r
R
 > myrepos = c("https://cran.case.edu")
 > install.packages("devtools", repos = myrepos)
 > library(devtools)
 > install_version("nloptr", version = "> 1.2.2, < 2.0.0", repos = myrepos)

Error: C++17 standard requested but CXX17 is not defined

Installing some packages, such as colourvalues, may fail with the message "Error: C++17 standard requested but CXX17 is not defined". To fix it, load a recent gcc and define CXX17 in ~/.R/Makevars:

module load r
module spider gcc
module load gcc/xxx  ## the latest gcc is recommended
mkdir -p ~/.R
echo 'CXX17 = g++ -std=gnu++17 -fPIC' > ~/.R/Makevars
R
> install.packages("xxxx")
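Note that the echo command above uses `>`, which overwrites any existing ~/.R/Makevars. If you already keep other settings in that file, a more cautious approach is to append the line only when it is missing. The following is a minimal sketch; ensure_makevars_line is a hypothetical helper, and it is demonstrated on a scratch file rather than on your real Makevars.

```shell
#!/bin/bash
# Sketch: add a setting to a Makevars file only if it is not already present,
# so existing entries are preserved. ensure_makevars_line is a hypothetical helper.

ensure_makevars_line() {
    local file="$1"
    local line="$2"
    mkdir -p "$(dirname "$file")"
    touch "$file"
    # -x matches the whole line, -F treats the pattern as a fixed string
    if ! grep -qxF -- "$line" "$file"; then
        printf '%s\n' "$line" >> "$file"
    fi
}

# Demonstrated on a scratch file; on a cluster the target would be ~/.R/Makevars.
demo_makevars="$(mktemp -d)/Makevars"
ensure_makevars_line "$demo_makevars" 'CXX17 = g++ -std=gnu++17 -fPIC'
ensure_makevars_line "$demo_makevars" 'CXX17 = g++ -std=gnu++17 -fPIC'   # second call changes nothing
cat "$demo_makevars"
```

Running the helper repeatedly leaves a single CXX17 line in the file, so it is safe to call from a setup script.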

RCurl

Some R packages rely on curl. When you install such a package, for example RCurl, you may see this error:

checking for curl-config... no
Cannot find curl-config

To install these packages, load the curl module first:

module load curl
module load r
R
> install.packages("RCurl")

raster, stars and sf

These R packages depend on external libraries, so you need to load several modules before installing them. Each module has multiple versions; the latest version is recommended, but the default version may not be the latest. To find the latest version of each module, run module spider:

module spider gdal
module spider geos
module spider proj
module spider sqlite

module load gdal/XXX geos/XXX proj/XXX sqlite/XXX  ## XXX is the version to use. The latest version is recommended.  
module load r/XXX
R
> install.packages("raster")
> install.packages("stars")
> install.packages("sf")

Running R jobs

This section illustrates how to submit a small R job to a SLURM queue. The example job computes a Pythagorean triple.

Prepare an R input file with an appropriate filename, here named myjob.R:

# FILENAME:  myjob.R

# Compute a Pythagorean triple.
a = 3
b = 4
c = sqrt(a*a + b*b)
c     # display result

Prepare a job submission file with an appropriate filename, here named myjob.sub:

#!/bin/bash
# FILENAME:  myjob.sub

module load r

# --vanilla: combines --no-save, --no-restore, --no-site-file, --no-init-file and --no-environ
# --no-save: do not save datasets at the end of an R session
R --vanilla --no-save < myjob.R
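The submission file is handed to SLURM with sbatch. The sketch below wraps that call in a small function; submit_r_job is a hypothetical helper, "myaccount" is a placeholder for your own account or queue name, and the resource requests (one node, one task, ten minutes) are assumptions suited to this tiny example.

```shell
#!/bin/bash
# Sketch: submit a job script to SLURM if sbatch is available.
# submit_r_job and the account name are hypothetical; adjust the resources as needed.

submit_r_job() {
    local script="$1"    # job submission file, e.g. myjob.sub
    local account="$2"   # account or queue to charge (placeholder)
    if ! command -v sbatch >/dev/null 2>&1; then
        echo "sbatch not found; run this on a cluster login node" >&2
        return 1
    fi
    sbatch -A "$account" --nodes=1 --ntasks=1 --time=00:10:00 "$script"
}

# Usage on a cluster:
#   submit_r_job myjob.sub myaccount
```

Unless you request otherwise, SLURM writes the job's output, including the value of c printed by myjob.R, to a file named slurm-<jobid>.out in the directory you submitted from.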


Installing R packages

Challenges of Managing R Packages in the Cluster Environment

  • Different clusters have different hardware and software. So, if you have access to multiple clusters, you must install your R packages separately for each cluster.
  • Each cluster has multiple versions of R, and packages installed with one version of R may not work with another version of R. So, libraries for each R version must be installed in a separate directory.
  • You can define the directory where your R packages will be installed using the environment variable R_LIBS_USER.
  • For your convenience, a sample .Rprofile file is provided that can be downloaded to your cluster account and renamed to ~/.Rprofile (or appended to an existing one) to customize your installation preferences. See Setting Up R Preferences with .Rprofile for detailed instructions.

Installing Packages

  • Step 0: Set up installation preferences.
    Follow the steps for setting up your ~/.Rprofile preferences. This step needs to be done only once. If you have already created a ~/.Rprofile file on a resource, skip this step.

  • Step 1: Check if the package is already installed.
    As part of the R installations on community clusters, a lot of R libraries are pre-installed. You can check if your package is already installed by opening an R terminal and entering the command installed.packages(). For example,

    module load r/4.1.2
    R
    installed.packages()["units",c("Package","Version")]
    Package Version 
    "units" "0.6-3"
    quit()

    If the package you are trying to use is already installed, simply load the library, e.g., library('units'). Otherwise, move to the next step to install the package.

  • Step 2: Load required dependencies. (if needed)
    For simple packages you may not need this step. However, some R packages depend on other libraries. For example, the sf package depends on gdal and geos libraries. So, you will need to load the corresponding modules before installing sf. Read the documentation for the package to identify which modules should be loaded.

    module load gdal
    module load geos
  • Step 3: Install the package.
    Now install the desired package using the command install.packages('package_name'). R will automatically download the package and all its dependencies from CRAN and install each one. Your terminal will show the build progress and eventually show whether the package was installed successfully or not.

    R
    install.packages('sf', repos="https://cran.case.edu/")
    Installing package into ‘/home/myusername/R/the-resource/4.0.0’
    (as ‘lib’ is unspecified)
    trying URL 'https://cran.case.edu/src/contrib/sf_0.9-7.tar.gz'
    Content type 'application/x-gzip' length 4203095 bytes (4.0 MB)
    ==================================================
    downloaded 4.0 MB
    ...
    ...
    more progress messages
    ...
    ...
    ** testing if installed package can be loaded from final location
    ** testing if installed package keeps a record of temporary installation path
    * DONE (sf)
    
    The downloaded source packages are in
        ‘/tmp/RtmpSVAGio/downloaded_packages’
  • Step 4: Troubleshooting. (if needed)
    If Step 3 ended with an error, you need to investigate why the build failed. The most common reason for a build failure is that the necessary modules were not loaded.
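
One quick way to see whether the required modules are loaded is to check that the helper programs they provide are visible on your PATH. The following is a minimal sketch; check_build_tools is a hypothetical helper, and the tool names in the example (gdal-config and geos-config, which ship with GDAL and GEOS) apply to sf. Other packages will need other tools.

```shell
#!/bin/bash
# Sketch: report which build helper tools are visible on PATH.
# check_build_tools is a hypothetical helper; pass it the tools your package needs.

check_build_tools() {
    local missing=0
    local tool
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found:   $tool"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"   # non-zero if at least one tool is missing
}

# Example for sf: run after loading the gdal and geos modules.
check_build_tools gdal-config geos-config || echo "Load the missing modules, then retry the install."
```

If a tool is reported missing, load the corresponding module and run install.packages() again in a fresh R session.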

Loading Libraries

Once you have packages installed you can load them with the library() function as shown below:

library('packagename')

The package is now installed and loaded and ready to be used in R.

Example: Installing dplyr

The following demonstrates installing the dplyr package assuming the above-mentioned custom ~/.Rprofile is in place (note its effect in the "Installing package into" information message):

module load r
R
install.packages('dplyr', repos="http://ftp.ussg.iu.edu/CRAN/")
Installing package into ‘/home/myusername/R/the-resource/4.0.0’
(as ‘lib’ is unspecified)
 ...
also installing the dependencies 'crayon', 'utf8', 'bindr', 'cli', 'pillar', 'assertthat', 'bindrcpp', 'glue', 'pkgconfig', 'rlang', 'Rcpp', 'tibble', 'BH', 'plogr'
 ...
 ...
 ...
The downloaded source packages are in 
    '/tmp/RtmpHMzm9z/downloaded_packages'

library(dplyr)

Attaching package: 'dplyr'


Loading Data into R

R is an environment for manipulating data. Before data can be manipulated, it must be brought into the R environment. R provides functions for reading most data file formats. The most common file types, such as comma-separated values (CSV) files, are handled by functions in the base R packages; less common file types require additional packages to be installed. To read data from a CSV file into the R environment, enter the following command at the R prompt:

> read.csv(file = "path/to/data.csv", header = TRUE)

When R reads the file, read.csv() returns a data frame. Without an assignment, the data frame is only printed to the console and is not kept for later use. To store it in an object that can become the target of other functions, assign the result to a name of your choice at the R prompt:

> my_variable <- read.csv(file = "path/to/data.csv", header = FALSE)

To display the properties (structure) of loaded data, enter the following:

> str(my_variable)


Setting Up R Preferences with .Rprofile

For your convenience, a sample .Rprofile file is provided that can be downloaded to your cluster account and renamed to ~/.Rprofile (or appended to an existing one). Follow these steps to download the recommended example and move it into place:

curl -#LO https://www.rcac.purdue.edu/files/knowledge/run/examples/apps/r/Rprofile_example
mv -ib Rprofile_example ~/.Rprofile

The above installation step needs to be done only once on each cluster. Now load the R module and run R:

module load r/4.1.2
R
.libPaths()
[1] "/home/myusername/R/the-resource/4.1.2-gcc-6.3.0-ymdumss"
[2] "/apps/spack/the-resource/apps/r/4.1.2-gcc-6.3.0-ymdumss/rlib/R/library"

.libPaths() should output something similar to above if it is set up correctly.

You are now ready to install R packages into the dedicated directory /home/myusername/R/the-resource/4.1.2-gcc-6.3.0-ymdumss.
