Current REU Projects

In each research project, students will work closely with two or more members of our staff. The projects will be in a wide variety of areas, including but not limited to various High Performance Computing (HPC) and Research Computing (RC) topics:

  • AI & machine learning
  • Data analytics
  • Dashboards & visualization
  • HPC software deployment & solutions
  • Interface design & user experience
  • HPC benchmarking & workflow scaling
  • Scientific & educational application solutions
  • Containerization
  • and more!

Summer 2025 Projects

Project #1: Smart Data Solutions: Building a Data Warehouse and Sensor-Driven System for Sustainable Data Centers

Be part of a pioneering project that merges data infrastructure with sustainability! This two-phase initiative begins with building a powerful data warehouse to store and manage logs from our data center and compute systems, integrating data and log sources and creating visual dashboards to display insights gained from the data. This foundation is key to future-proofing our environment and preparing it for advanced applications like AI, large language models (LLMs), and machine learning (ML). The data warehouse will become the central hub for proactive monitoring, helping us optimize performance and reduce downtime. In the next phase, you’ll integrate state-of-the-art sensors that capture real-time data from the data center, feeding critical information back into the warehouse. These sensors will alert us to important changes, empowering us to address issues before they escalate. Collaborate with researchers and faculty to drive data center sustainability, ensuring our systems are efficient and eco-friendly for the long term. If you're passionate about AI, sustainability, and creating smarter data solutions, this project offers hands-on experience and a unique chance to make a lasting, real-world impact on the future of data center management!
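As a rough illustration of the second phase, the sketch below shows one way sensor readings could be checked against limits to produce alerts. It is only a minimal sketch under stated assumptions: the `Reading` type, the metric names, and the threshold values are hypothetical and are not taken from the project or from any Anvil system.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Reading:
    """One sensor sample (hypothetical shape, for illustration only)."""
    sensor_id: str
    metric: str
    value: float


# Hypothetical upper limits per metric; real limits would come from the data center team.
THRESHOLDS: Dict[str, float] = {"inlet_temp_c": 27.0, "humidity_pct": 60.0}


def find_alerts(readings: Iterable[Reading],
                thresholds: Dict[str, float] = THRESHOLDS) -> List[str]:
    """Return one message for each reading that exceeds its metric's limit."""
    alerts = []
    for r in readings:
        limit = thresholds.get(r.metric)
        # Metrics without a configured limit are ignored rather than flagged.
        if limit is not None and r.value > limit:
            alerts.append(f"{r.sensor_id}: {r.metric}={r.value} exceeds {limit}")
    return alerts
```

In a real deployment, the readings would more likely be queried from the data warehouse and the alerts routed to a dashboard or notification channel rather than returned as strings.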

Requirements: Basic understanding of Linux systems (including CLI-based tooling and commands), basic understanding of or familiarity with at least one programming language (e.g., Python, C++, or Ruby), and basic understanding of, familiarity with, or experience in data presentation and interpretation.

Past experience with the following skills is a plus, but not strictly necessary: interactive application/widget development, time-series databases (e.g., InfluxDB, Elasticsearch, or ClickHouse), and writing SQL queries. Students must be able to complete the following trainings prior to the start of the REU:

  1. Codecademy - Learn Python at your own pace, on your own schedule
  2. Codecademy - Learn Git and GitHub
  3. Codecademy - Learn SQL
  4. Grafana tutorials - Learn the basics of Grafana and data presentation

Complete the following trainings, which are all accessible through the RCAC website:

  1. Unix 101 and Unix 202 trainings,
  2. Anvil 101 training, and
  3. (Optional) Anvil Open OnDemand 101 training

Project #2: AnvilOps: Streamlined Container Builder with GitOps Integration

Anvil is set to develop a dynamic, user-friendly web interface that will revolutionize the way container workloads are built and deployed onto its powerful Composable system. By harnessing the power of leading CI/CD tools such as GitHub Actions, CircleCI, or Jenkins, this project will explore and implement the best solution tailored to Anvil’s environment. The interface will follow best practices for GitOps, ensuring a seamless, automated workflow that integrates with version control for greater efficiency and reliability. Through this sleek interface, users can effortlessly upload a Dockerfile or Singularity definition file, link a Git repository, and define key details like project name, image name, and registry tag. The result? Streamlined container management and deployment at the click of a button. For added flexibility, stretch goals include Docker-to-Singularity image conversion, as well as a robust authentication system that verifies users' active allocations—ensuring security and access control. This project is an exciting opportunity to shape the future of container orchestration on Anvil, following cutting-edge GitOps principles!

Requirements: Familiarity with basic Linux commands and environments, as well as a basic understanding of, or the ability to read, programming languages like Bash or Python, is required. Additionally, experience with front-end technologies (e.g., HTML, CSS, JavaScript, or relevant frameworks) is highly preferred.

Past experience with the following skills is a plus but not strictly necessary if the candidate can demonstrate successful completion of a relevant tutorial prior to the start of the REU program.

Students must be able to complete the following trainings prior to the start of the REU program:

  1. Kubernetes - Basics
  2. Docker - Basics

Project #3: Empowering Research: Easy-to-Use Bioinformatics Workflow Templates

Want to contribute to a game-changing project that simplifies complex bioinformatics analyses? This initiative focuses on creating user-friendly workflow templates for essential genomics tasks like RNA-seq, variant calling, and genome assembly. You will create ready-to-use templates that come packed with optimized SLURM job scripts, pre-configured for memory, CPU, and runtime efficiency, plus organized file structures to keep data management stress-free. The templates are designed for command-line users who want to skip the hassle of learning complicated tools like Nextflow or Snakemake, so this project offers a rare opportunity to make cutting-edge research accessible to a wider audience. You’ll play a key role in boosting efficiency, improving resource utilization, and ensuring reproducibility in high-performance computing environments. If you're passionate about making impactful contributions to science and technology, this project is for you!
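To give a sense of what such a template might look like, here is a minimal sketch that fills a SLURM job script from a few resource settings. The function name, the default resource values, and the placeholder command are assumptions made for illustration, not the project's actual templates. The `#SBATCH` options shown are standard SLURM directives.

```python
from string import Template

# Skeleton job script; $-placeholders are filled in by render_job_script().
# %x and %j are SLURM filename patterns for the job name and job ID.
SLURM_TEMPLATE = Template("""\
#!/bin/bash
#SBATCH --job-name=$job_name
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=$cpus
#SBATCH --mem=$mem
#SBATCH --time=$walltime
#SBATCH --output=logs/%x_%j.out

set -euo pipefail
$command
""")


def render_job_script(job_name: str, command: str, cpus: int = 8,
                      mem: str = "16G", walltime: str = "04:00:00") -> str:
    """Return a SLURM batch script as text (default values are illustrative)."""
    return SLURM_TEMPLATE.substitute(
        job_name=job_name, cpus=cpus, mem=mem, walltime=walltime, command=command
    )


if __name__ == "__main__":
    # Placeholder command; a real template would call an aligner or variant caller.
    print(render_job_script("rnaseq_align", "echo 'run aligner here'", cpus=4))
```

The rendered text could be written to a file and submitted with `sbatch`. Note that this sketch assumes the `logs/` directory already exists in the submission directory.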

Requirements: Proficiency with basic Linux commands and environments. Familiarity with programming languages, such as Bash or Python, and the ability to read and modify scripts. Understanding of bioinformatics concepts, particularly in RNA-seq, variant calling, and genome assembly.

Past experience with the following skills is a plus, but not strictly necessary if the candidate can demonstrate successful completion of a relevant tutorial prior to the start of the REU program:

  1. Coursework/exposure in bioinformatics, computational biology or related fields
  2. Coursera’s “Genomic Data Science Specialization” courses
  3. Rosalind’s “Learning bioinformatics and programming through problems”
  4. Learn Genomics
  5. Bioinformatics Handbook
  6. Familiarity with workflow management tools (e.g. Nextflow, Snakemake)

Project #4: Revolutionizing Documentation with AI: NLP and Large Language Models for Automated Solutions

Join a cutting-edge project at the intersection of Natural Language Processing (NLP) and large language models (LLMs) to transform how technical documents are created. This project focuses on enhancing our custom tool, TicketHub, to automatically generate FAQs and standard responses by extracting key information from systems, past support requests, and other documentation/updates. You’ll not only work on improving AI models for document generation but also gain hands-on experience in software and data engineering. By streamlining documentation processes, you’ll help reduce manual work, solve common user issues, and contribute to an AI-driven future for technical support and user experience. This is a unique opportunity to apply your skills in AI and engineering to create practical, impactful solutions in the real world!

Requirements: Candidates must be familiar with foundational data analysis and AI methods and frameworks, including data cleaning, data engineering/pipeline development, and knowledge of ML and AI foundations.

Prior experience with and knowledge of LLMs are a plus, but not strictly necessary if the candidate can demonstrate successful completion of a relevant class like Coursera’s Applied Machine Learning in Python course prior to the start of the REU program.

Past REU Projects

Summer 2024

Project #1: Streamlined Software: Automating user-requested software deployment on Anvil using agile technologies

Project #2: Unlocking the Impact of Data: The Power of the Dashboard

Project #3: Gromacs Gateway: Creating User-friendly Molecular Simulations Online

Project #4: AI-Powered Operational Data Analytics: Enhancing User Experience on Anvil

Summer 2023

Projects focused on a wide variety of areas, including data analytics, high performance computing, DevOps, and containerization.

Project #1: Integrate the XALT job-level usage activity monitoring tool into XDMoD reporting for deeper analysis of workloads.

Project #2: Implement direct cloud burst from Anvil to Azure for HPC and accelerator workloads based on the work with Microsoft in 2022.

Project #3: Connect the Anvil composable subsystem’s Rancher management platform to Azure Kubernetes Service to support elastic scaling of workloads for science gateway applications.

Project #4: Develop deployment solutions for the Jupyter notebook interactive computing platform on Anvil’s composable subsystem for education and training activities.

Summer 2022

Projects focused on improving data collection and reporting of the Anvil cluster to ensure quality of service, effective system utilization, and performance.

Project #1

While benchmarking compute and storage performance, students:

  • Learned concepts surrounding benchmarking of HPC systems, including HPL, HPCG, STREAM, and IO500
  • Measured performance of Anvil's compute nodes, GPU nodes, scratch, project, and BeeOND filesystems
  • Established baselines for system performance for a continuous measurement framework

Project #2

To improve data collection and reporting, students:

  • Configured multiple modules on Anvil's XDMoD instance and improved data collection and reporting of system metrics:
      • PCP, SUPReMM
      • Open OnDemand data collection
      • Continuous measurement framework via Application Kernels

Project #3

For system environmental measurements, students:

  • Created monitoring of Anvil's power consumption at the node and rack levels using Prometheus and Grafana

Fall 2022

Projects focused on enhancing Anvil's cybersecurity posture, using an NSF CICI-funded intrusion detection system to monitor aspects of Anvil's network and enable visualization-driven insights into network traffic and cybersecurity alerts. Students helped with:

  • Visualization of network traffic trends on Anvil and Purdue networks
  • Dashboard development based on Zeek IDS protocol logs (TCP, UDP, SSH, HTTP)
  • Port scanning
  • Per-host traffic visualization

Nondiscrimination Policy Statement

Purdue University prohibits discrimination against any member of the University community on the basis of race, religion, color, sex, age, national origin or ancestry, genetic information, marital status, parental status, sexual orientation, gender identity and expression, disability, or status as a veteran. The University will conduct its programs, services and activities consistent with applicable federal, state and local laws, regulations and orders and in conformance with the procedures and limitations as set forth in Purdue’s Equal Opportunity, Equal Access and Affirmative Action policy which provides specific contractual rights and remedies. Additionally, the University promotes the full realization of equal employment opportunity for women, minorities, persons with disabilities and veterans through its affirmative action program. View a more complete statement of Purdue's policies of equal access and equal opportunity. If you have any questions or concerns regarding these policies, please contact the Office of the Vice President for Ethics and Compliance at vpec@purdue.edu or 765-494-5830.


Anvil is supported by the National Science Foundation under Grant No. 2005632.