New data storage options for cloud workflows now available with RCAC
The Rosen Center for Advanced Computing (RCAC) has recently upgraded the data storage options available to researchers. Thanks to a grant awarded through the National Science Foundation’s Campus Cyberinfrastructure (CC*) program, RCAC now offers storage capabilities to better cover the needs of those who utilize RCAC’s Kubernetes-based composable cloud platform, virtualization services on Community Clusters, or need programmatic access to data for workflows spanning multiple RCAC resources.
Traditionally, high-performance computing (HPC) resources on university campuses have offered multiple tiers of storage to accommodate various workflows, but these options are provided by either POSIX-based storage systems or robotic tape libraries. With the advent of cloud-native technologies like Kubernetes, these traditional storage options are not sufficient for researchers who compute via the cloud. Titled “CC* Data Storage: Software Defined Storage for Composable and HPC Workflows,” this CC* project aimed to close the gap between campus and cloud storage through the deployment of a central, shared, multi-petabyte Ceph distributed storage system. The project had three major goals: to enhance campus shared storage capabilities, to support science domains via innovative storage infrastructure, and to enable education and workforce development by engaging undergraduate students in the deployment and operation of the storage infrastructure.
“Five years ago,” says Erik Gough, a Senior Research Scientist for RCAC, “we saw the rise of Kubernetes and cloud computing in the campus environment, and some of the emerging scientific use cases didn’t fit that well onto our batch HPC systems. Our traditional storage resources didn’t work, so we needed to have software defined storage for that system. To address the problem, we deployed a small Ceph cluster inside Geddes that people could use. Through this grant, we can significantly expand our storage capacity and support multiple science domains across campus.”
The CC* grant was awarded in fall of 2022. Erik Gough is the Principal Investigator (PI) for the project, with Elizabett Hillery and Di Qi serving as Co-PIs. Wen-wen Tung previously served as a Co-PI. Together, the group not only expanded RCAC’s data storage options but also fostered education and workforce development. Undergraduate students enrolled in one of Tung’s courses were brought into the project, helping to build and deploy the new Ceph storage system with RCAC’s Geddes cluster and gaining real-world experience in the field of high-performance computing.
Thanks to the CC* award and the hard work of both the PI team and the students, RCAC now offers new storage options, including Data Depot Block and Data Depot Object Storage:
- "Depot Object" - S3-compatible object storage for programmatic access to data across RCAC resources and a cold storage tier
- "Depot Block" - Block storage for virtualization on HPC resources and composable systems
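As a rough illustration of what "programmatic access" to S3-compatible object storage can look like, the sketch below uses the third-party boto3 library to upload a file and list the result. It is not an RCAC-provided recipe: the environment variable names (DEPOT_S3_ENDPOINT, DEPOT_S3_ACCESS_KEY, DEPOT_S3_SECRET_KEY), the bucket name, and the file paths are hypothetical placeholders, and the actual endpoint and credentials would need to come from RCAC's storage documentation.

```python
import os


def depot_s3_settings(env=os.environ):
    """Read connection settings from (hypothetical) environment variables."""
    return {
        # The endpoint is required; the variable name is an assumption.
        "endpoint_url": env["DEPOT_S3_ENDPOINT"],
        # If these are unset (None), boto3 falls back to its default
        # credential lookup (shared config files, other env vars, etc.).
        "aws_access_key_id": env.get("DEPOT_S3_ACCESS_KEY"),
        "aws_secret_access_key": env.get("DEPOT_S3_SECRET_KEY"),
    }


def make_client(settings):
    """Create an S3 client pointed at a non-AWS, S3-compatible endpoint."""
    import boto3  # third-party: pip install boto3

    return boto3.client("s3", **settings)


def upload_and_list(client, bucket, local_path, key):
    """Upload one file, then return the object keys stored under that key prefix."""
    client.upload_file(local_path, bucket, key)
    response = client.list_objects_v2(Bucket=bucket, Prefix=key)
    return [obj["Key"] for obj in response.get("Contents", [])]


if __name__ == "__main__":
    client = make_client(depot_s3_settings())
    # "my-lab-bucket" and both paths are placeholders for illustration only.
    print(upload_and_list(client, "my-lab-bucket", "results.csv", "runs/results.csv"))
```

Because the same client code works from any machine that can reach the endpoint, a workflow running on one RCAC resource could write data that a job on another resource later reads back by key.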
If you would like to learn more about the available storage resources, please visit: https://www.rcac.purdue.edu/storage
This CC* project is funded under NSF award number 2232872. To learn more about high-performance computing, please visit our “Why HPC?” page. To stay up to date on all RCAC projects and updates, please visit our “News” page.
Written by: Jonathan Poole, poole43@purdue.edu