Overview of Data Workbench

The Data Workbench is an interactive compute environment for non-batch big data analysis and simulation, and is part of Purdue's Community Cluster Program. It consists of Dell compute nodes, each with one 24-core AMD EPYC 7401P processor and 512 GB of memory. All nodes are interconnected with 10 Gigabit Ethernet. The Data Workbench entered production on October 1, 2017.

To purchase access to Data Workbench today, go to the Cluster Access Purchase page. Please subscribe to our Community Cluster Program Mailing List to stay informed on the latest purchasing developments, or contact us by email at rcac-cluster-purchase@lists.purdue.edu if you have any questions.

Data Workbench Specifications

All Data Workbench nodes have 24 processor cores, 512 GB of RAM, and 10 Gbps Ethernet.

Data Workbench Front-Ends
Number of Nodes | Processors per Node | Cores per Node | Memory per Node | Retires in
6 | One AMD EPYC 7401P CPU @ 2.0 GHz | 24 | 512 GB | 2024

Data Workbench nodes run CentOS 7 and are intended for interactive work via the ThinLinc remote desktop software, JupyterHub, or RStudio Server. Data Workbench provides no batch system.

Operating system patches are applied as security needs dictate. All nodes allow unlimited stack usage and unlimited core dump size (though disk space and server quotas may still be a limiting factor). All nodes use Linux cgroups to guarantee even access to CPU and memory resources.
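
To check these settings from a terminal session on a Data Workbench node, a minimal bash sketch such as the following can be used. It relies only on the standard bash ulimit builtin and the Linux file /proc/self/cgroup, and assumes nothing RCAC-specific. On Data Workbench both limits are expected to report "unlimited"; on other systems they may print a number instead.

```shell
#!/usr/bin/env bash
# Sketch: report the limits and cgroup membership described above for the
# current shell. On Data Workbench both limits are expected to print
# "unlimited"; on other systems they may print a number instead.
echo "stack size limit:     $(ulimit -s)"
echo "core file size limit: $(ulimit -c)"

# List the cgroups this process belongs to. The line format differs between
# cgroup v1 (one line per controller) and cgroup v2 (a single "0::" line).
cat /proc/self/cgroup
```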

On Data Workbench, ITaP recommends the following compiler and math libraries:

  • Intel 17.0.1.132
  • MKL

This compiler and these libraries are loaded by default. To load the recommended set again:

$ module load rcac

To verify what you loaded:

$ module list

Data Workbench Regular Maintenance

Regular planned maintenance on Data Workbench is scheduled for the first Thursday of every month, 8:00am to 5:00pm.
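
As a planning aid, the rule above can be turned into a small bash sketch that prints the first Thursday of a given month. The function name first_thursday is hypothetical and is not an RCAC tool; the sketch assumes GNU date (the -d option is not available in BSD or macOS date) and does not know about rescheduled or cancelled maintenance windows.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the first Thursday of a month given as YYYY-MM.
# Requires GNU date. TZ=UTC avoids daylight-saving edge cases.
first_thursday() {
  local ym="$1"
  local dow offset
  # %u gives the ISO weekday of the 1st: 1 = Monday ... 7 = Sunday.
  dow=$(TZ=UTC date -d "${ym}-01" +%u)
  # Days to advance from the 1st to reach Thursday (ISO weekday 4).
  offset=$(( (4 - dow + 7) % 7 ))
  TZ=UTC date -d "${ym}-01 ${offset} days" +%Y-%m-%d
}

first_thursday 2024-03   # prints 2024-03-07
```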