Data Science

RCAC offers physical infrastructure, expert staff, and education in support of the data science work of researchers, students, and the larger Integrative Data Science Initiative.

Data science is notoriously difficult to define. In an academic context, it is the common thread of data and computational facility shared among 21st-century scientists in every domain. More generally, and as it lives in an industrial context, data science is the intersection of applied computational math and statistics, basic software engineering, and business acumen.

Here are some of the major resources provided in support of data science at Purdue.

Infrastructure

RCAC provides access to leading-edge computational and data storage systems, as well as research software and platforms.

In addition to more traditional HPC systems, we offer interactive computing platforms for both education and research. These platforms provide JupyterHub for interactive computing with Jupyter notebooks, RStudio for data analysis in R, and a remote desktop interface for everything else, all connected to powerful compute resources.

We also have GPU-enabled systems designed to support machine learning or other GPU-accelerated applications.

Keep your research data in Data Depot and access it directly from any of our compute resources.

A major component of data science work is keeping your code under version control, especially when working collaboratively in a group. Purdue hosts its own instance of GitHub Enterprise through RCAC. This offers all the features of GitHub.com but for private research use, integrated with Purdue Career Accounts and at no cost to research groups.

Expertise

The RCAC team is staffed with experts across multiple scientific domains, with experience both in working with data and in using particular software applications and programming languages. Our team offers support and consultations to help with anything and everything to do with data science at Purdue.

Join us for a break any time at one of our regular coffee consultations, Tuesday-Thursday at 2pm at one of Purdue's coffee shops. No appointment necessary.

Education

Starting in Fall 2018, the RCAC group offers official Software Carpentry workshops to get novice researchers off the ground with modern scientific computing. These workshops teach the basics of the Unix shell, version control with Git, and programming in languages like Python and R.

We also offer a more advanced series of workshops on cluster computing. These run every semester and are also available as guest lectures upon request.