R, a GNU project, is a language and environment for data manipulation, statistics, and graphics. It is an open source implementation of the S programming language. R is quickly becoming the language of choice for data science because of the ease with which it produces high-quality plots and data visualizations. It is a versatile platform with a large, growing community and collection of packages.
For more general information on R visit The R Project for Statistical Computing.
Loading Data into R
R is an environment for manipulating data, so data must first be brought into the R environment. R provides functions for reading most file formats in which data is stored. Functions for the most common formats, such as comma-separated values (CSV) files, come with the basic R packages. Less common file types require additional packages to be installed. To read data from a CSV file into the R environment, enter the following command in the R prompt:
> read.csv(file = "path/to/data.csv", header = TRUE)
When R reads the file it creates an object (a data frame) that can then become the target of other functions. Called on its own, read.csv() only prints that object; it is not stored under any name. To keep the object, assign the result of read.csv() to a variable by entering the following in the R prompt:
> my_variable <- read.csv(file = "path/to/data.csv", header = FALSE)
To display the properties (structure) of loaded data, enter the following:
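A minimal sketch of inspecting loaded data with base R's str(), assuming the data was read with read.csv() as above. The temporary file and its column names are made up so that the example runs on its own; they are not part of the original example.

```r
# Write a tiny CSV to a temporary file so the example is self-contained.
# The column names and values are hypothetical.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("name,height", "a,1.5", "b,2.0"), csv_path)

# Read the file and keep the resulting data frame under a chosen name.
my_variable <- read.csv(file = csv_path, header = TRUE)

# str() prints a compact summary: class, dimensions, and column types.
str(my_variable)

# Other base R helpers for inspecting a data frame.
dim(my_variable)    # number of rows and columns
names(my_variable)  # column names
head(my_variable)   # first rows
```

str() is usually the quickest first look, because it shows whether each column was read as numeric, character, or another type.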
Running R jobs
This section illustrates how to submit a small R job to a SLURM queue. The example job computes a Pythagorean triple.
Prepare an R input file with an appropriate filename, here named myjob.R:
# FILENAME: myjob.R
# Compute a Pythagorean triple.
a = 3
b = 4
c = sqrt(a*a + b*b)
c # display result
Prepare a job submission file with an appropriate filename, here named myjob.sub:
#!/bin/bash
# FILENAME: myjob.sub
module load r
# --vanilla: combine --no-save, --no-restore, --no-site-file, --no-init-file and --no-environ
# --no-save: do not save datasets at the end of an R session
R --vanilla --no-save < myjob.R
Installing R packages
Challenges of Managing R Packages in the Cluster Environment
- Different clusters have different hardware and software. So, if you have access to multiple clusters, you must install your R packages separately for each cluster.
- Each cluster has multiple versions of R, and packages installed with one version of R may not work with another version. So, libraries for each R version must be installed in a separate directory.
- You can define the directory where your R packages will be installed using the environment variable
- For your convenience, ITaP provides a sample ~/.Rprofile example file that can be downloaded to your cluster account and renamed into ~/.Rprofile (or appended to one) to customize your installation preferences. Detailed instructions are given under Setting Up R Preferences with .Rprofile.
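The idea of one library directory per cluster and per R version can be sketched in a few lines of R. This is a hypothetical illustration and not the ITaP-provided file: the cluster name, the directory layout, and the use of a temporary directory as the root are all assumptions chosen so that the snippet runs anywhere. (R itself can also pick up library locations from environment variables such as R_LIBS_USER; the sketch uses .libPaths() instead.)

```r
# Hypothetical sketch of a per-cluster, per-version library setup.
# NOT the ITaP-provided ~/.Rprofile; the layout is an assumption modeled
# on the paths shown elsewhere in this guide.
cluster <- "hammer"  # assumed cluster name
r_version <- paste(R.version$major, R.version$minor, sep = ".")  # e.g. "4.0.0"

# For this demo the tree is rooted in a temporary directory; a real setup
# would more likely use a directory under the home directory.
base_dir <- file.path(tempdir(), "R")
user_lib <- file.path(base_dir, cluster, r_version)

# Create the directory if needed and put it first on the library search path.
dir.create(user_lib, recursive = TRUE, showWarnings = FALSE)
.libPaths(c(user_lib, .libPaths()))

# The first entry is where install.packages() installs by default.
.libPaths()[1]
```

Keeping the R version in the path means that switching to another R module automatically switches to a separate, initially empty library, which avoids mixing packages built for different versions.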
Installing Packages
Step 0: Set up installation preferences.
Follow the steps for setting up your ~/.Rprofile preferences. This step needs to be done only once. If you have created a ~/.Rprofile file previously on Hammer, ignore this step.
Step 1: Check if the package is already installed.
As part of the R installations on ITaP community clusters, many R libraries are pre-installed. You can check whether your package is already installed by opening an R terminal and entering the command installed.packages(). For example,
module load r/4.0.0
R
installed.packages()["units",c("Package","Version")]
Package Version
"units" "0.6-3"
quit()
If the package you are trying to use is already installed, simply load the library, e.g., library('units'). Otherwise, move to the next step to install the package.
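The same check can be written as a small test that returns TRUE or FALSE, which is convenient in scripts. The sketch below is an illustration only; is_installed() is a hypothetical helper name, and the second package name is a placeholder assumed not to exist.

```r
# Return TRUE when a package is present in any library on .libPaths().
# is_installed() is a hypothetical helper, not part of R.
is_installed <- function(pkg) {
  pkg %in% rownames(installed.packages())
}

is_installed("stats")             # base package, always present
is_installed("notARealPackage0")  # placeholder name, assumed to be absent

# requireNamespace() is a lighter check: it tries to load the namespace
# and returns TRUE or FALSE instead of stopping with an error.
requireNamespace("stats", quietly = TRUE)
```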
Step 2: Load required dependencies (if needed).
For simple packages you may not need this step. However, some R packages depend on other libraries. For example, the sf package depends on the gdal and geos libraries. So, you will need to load the corresponding modules before installing sf. Read the documentation for the package to identify which modules should be loaded.
module load gdal
module load geos
Step 3: Install the package.
Now install the desired package using the command install.packages('package_name'). R will automatically download the package and all its dependencies from CRAN and install each one. Your terminal will show the build progress and eventually show whether the package was installed successfully or not.
install.packages('sf', repos="https://cran.case.edu/")
Installing package into ‘/home/myusername/R/hammer/4.0.0’
(as ‘lib’ is unspecified)
trying URL 'https://cran.case.edu/src/contrib/sf_0.9-7.tar.gz'
Content type 'application/x-gzip' length 4203095 bytes (4.0 MB)
==================================================
downloaded 4.0 MB
...
... more progress messages ...
...
** testing if installed package can be loaded from final location
** testing if installed package keeps a record of temporary installation path
* DONE (sf)
The downloaded source packages are in
‘/tmp/RtmpSVAGio/downloaded_packages’
Step 4: Troubleshooting (if needed).
If Step 3 ended with an error, you need to investigate why the build failed. The most common reason for a build failure is that the necessary modules were not loaded.
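install.packages() reports a failed build as a warning rather than an error, so a failure is easy to miss in a long log. A minimal sketch of a wrapper that installs only when needed and then verifies the result is shown here; install_if_missing() is a hypothetical helper, and the default mirror URL is an assumption that can be replaced with any CRAN mirror.

```r
# Hypothetical helper: install a package only when it is missing, then
# confirm that it can actually be loaded afterwards.
install_if_missing <- function(pkg, repos = "https://cloud.r-project.org") {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg, repos = repos)
  }
  # Check again: a failed build only raises a warning, not an error.
  ok <- requireNamespace(pkg, quietly = TRUE)
  if (!ok) {
    message("Package '", pkg, "' could not be loaded; ",
            "check that the required modules were loaded before starting R.")
  }
  invisible(ok)
}

# 'stats' ships with R, so this call returns TRUE without downloading anything.
install_if_missing("stats")
```

If the helper reports a failure, scroll back through the build output for the first line mentioning a missing header or library, quit R, load the matching module, and try again.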
Loading Libraries
Once you have packages installed you can load them with the library() function as shown below:
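For instance, a minimal sketch using the tools package, which ships with every R installation and stands in here for a package you installed yourself:

```r
# 'tools' ships with R and stands in for a package you installed yourself.
library(tools)

# Attached packages appear on the search path as "package:<name>".
"package:tools" %in% search()

# Functions exported by the package can now be called directly.
file_ext("path/to/data.csv")
```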
The package is now installed and loaded and ready to be used in R.
Example: Installing dplyr
The following demonstrates installing the dplyr package assuming the above-mentioned custom ~/.Rprofile is in place (note its effect in the "Installing package into" information message):
module load r
R
install.packages('dplyr', repos="http://ftp.ussg.iu.edu/CRAN/")
Installing package into ‘/home/myusername/R/hammer/4.0.0’
(as ‘lib’ is unspecified)
...
also installing the dependencies 'crayon', 'utf8', 'bindr', 'cli', 'pillar', 'assertthat', 'bindrcpp', 'glue', 'pkgconfig', 'rlang', 'Rcpp', 'tibble', 'BH', 'plogr'
...
...
...
The downloaded source packages are in
'/tmp/RtmpHMzm9z/downloaded_packages'
library(dplyr)
Attaching package: 'dplyr'
RStudio
RStudio is a graphical integrated development environment (IDE) for R. RStudio is the most popular environment for developing both R scripts and packages. RStudio is provided on most Research systems.
There are two methods to launch RStudio on the cluster: command-line and application menu icon.
Launch RStudio by the command-line:
module load gcc
module load r
module load rstudio
rstudio
Note that RStudio is a graphical program, so in order to run it you must have a local X11 server running or use the ThinLinc Remote Desktop environment. See the ssh X11 forwarding section for more details.
Launch RStudio by the application menu icon:
- Log into desktop.hammer.rcac.purdue.edu with a web browser or the ThinLinc client
- Click on the Applications drop-down menu in the top left corner
- Cluster Software and then
R and RStudio are free to download and run on your local machine.
RStudio Server on Hammer
A different version of RStudio is also installed on Hammer. RStudio Server allows you to run RStudio through your web browser.
One benefit of RStudio is that your work can be separated into projects. You can give each project a working directory, workspace, history and source documents. When you are creating a new project, you can start it in a new empty directory, one with code and data already present or by cloning a repository.
RStudio Server allows easy collaboration and sharing of R projects. Just click on the project drop down menu in the top right corner and add the career account user names of those you wish to share with.
Another feature is the ability to run multiple sessions at once. You can run multiple instances of the same project in parallel or work on different projects simultaneously. The sessions drop-down menu is in the upper right corner, just above the project menu. Here you can kill or open sessions. Note that closing a window does not end a session, so please kill sessions when you are not using them.
You can view an overview of all your projects and active sessions by clicking on the blue RStudio Server Home logo in the top left corner of the window next to the file menu.
You can install new packages with the install.packages() function in the console. You can also graphically select any packages you have previously installed on any cluster. Simply select packages from the tabs on the bottom right side of the window and select the package you wish to load.
Setting Up R Preferences with .Rprofile
For your convenience, ITaP provides a sample ~/.Rprofile example file that can be downloaded to your cluster account and renamed into ~/.Rprofile (or appended to one). Follow these steps to download our recommended ~/.Rprofile example and copy it into place:
curl -#LO https://www.rcac.purdue.edu/files/knowledge/run/examples/apps/r/Rprofile_example
mv -ib Rprofile_example ~/.Rprofile
The above installation step needs to be done only once on Hammer. Now load the R module and run R:
module load r/4.0.0
R
.libPaths()
[1] "/home/myusername/R/hammer/4.0.0"
[2] "/apps/spack/hammer/apps/r/4.0.0-gcc-6.3.0-righufz/rlib/R/library"
.libPaths() should output something similar to the above if it is set up correctly.
You are now ready to install R packages into the directory