ITaP-led team awarded NSF grant to create composable campus cloud ecosystem for research
July 28, 2020
A team of technical experts from ITaP Research Computing and computational faculty from across campus has been awarded a two-year, $392,205 grant from the National Science Foundation (NSF award #2018926) to develop a campus cloud ecosystem. The ecosystem will knit together data, instruments and computing resources for researchers who need interactive computing capabilities beyond traditional batch high-performance computing.
PI Preston Smith, executive director of ITaP Research Computing, and Co-PI Erik Gough, a senior computational scientist for Research Computing, will build this new capability into a forward-looking service known as the Purdue Community Cloud, which will complement Purdue’s successful community cluster program. Norbert Neumeister, professor of physics and astronomy, Jennifer Wisecaver, assistant professor of biochemistry, and Thomas Hacker, professor of computer and information technology, are faculty co-PIs on the project.
“This composable system will help Purdue faculty meet their computing needs involving data science and reproducibility by allowing ITaP to deploy container technology at scale,” says Alex Younts, the ITaP research computing architect who designed the new system. It is the first system at Purdue to use Kubernetes, an open-source system for automating the deployment, scaling and management of containerized applications.
Applications that the Purdue Community Cloud is designed to serve include:
- Interactive, large-scale analysis of data hosted by Purdue’s CMS Tier-2 facility.
- Easily deploying web applications to connect data hosted within the Purdue Community Clusters to external systems such as the UCSC Genome Browser.
- Facilitating the sharing and use of visual data across distributed research groups in Civil Engineering, and automating the classification and organization of the hundreds of thousands of images of structures captured after a natural disaster, to inform building codes.
- Easily deploying SQL databases holding millions of records and multiple TB of sensor data, to allow for improved analytics.
- Deploying middleware and data engines to support research in plant phenotyping and remote sensing. The composable cloud system will support streaming data and IoT applications, and will bridge connectivity to Purdue’s HPC environments.
- Deploying Spark Streaming and machine learning capabilities to support data streams in Time Domain astronomy.
- Hosting the building blocks of science gateways, such as the NSF-funded GeoEDF or NanoHUB platforms, on the composable platform.
- Easily deploying and terminating large numbers of interactive computing containers for instruction in data science and machine learning. Both Python notebooks and web-based R environments are available.
“A prototype version of this capability is being taken to production with Bell, our 2020 community cluster, and we’ve already helped several research projects take advantage of it,” says Smith. “We’re excited for this award, which will allow us to make this capability a first-class citizen in our Community Cluster offerings.”
The project team will involve undergraduate students in the deployment and operation of the Purdue Community Cloud, providing opportunities for students to apply their conceptual knowledge to real-world problems.