
Data Science and Machine Learning Training

RCAC regularly offers training on a variety of data science and machine learning topics. These trainings take an applied view of data science, helping researchers use these techniques and the clusters effectively to further their work.

Users who are just getting started on the clusters are encouraged to check out our general computing training resources as well.

Foundational Data Science and Machine Learning Topics

  • Data 101 for Machine Learning gives an introduction to how data is used for machine learning, covering data collection, pre-processing, and exploratory data analysis.
  • Time Series Forecasting 101 and 201 explore machine learning and deep learning techniques for analyzing and forecasting time series data in high-performance computing environments.
  • Natural Language Processing 101 provides an introduction to core concepts of modern NLP to help users get started with incorporating NLP into their work.
  • Purdue is also pleased to partner with the Nvidia Deep Learning Institute to regularly offer free full-day Fundamentals of Deep Learning workshops.

Large-Scale Data Science and Related Topics

  • Jupyter Kernels and HPC is an intermediate-level discussion of how the Jupyter system (JupyterHub, Jupyter Notebook, etc.) functions at an application level on a distributed computing cluster.
  • Python Package Installation covers how to install scientific Python applications on RCAC clusters.
  • Research Storage Overview covers RCAC storage resources and services, typical use cases, access methods, and data management strategies and workflows.
  • [Coming Soon!] Accelerated Machine Learning with GPUs