Gilbreth cluster storage capacity doubled to meet the needs of AI researchers
Recognizing the need for faster and larger storage capacity in emerging areas of science, Purdue’s Rosen Center for Advanced Computing (RCAC) has recently doubled the storage capacity in Gilbreth, its community cluster optimized for communities running GPU-intensive applications such as AI and machine learning.
Gilbreth’s storage now offers twice the capacity of the previous system and features an improved design that results in faster storage transactions.
The new storage system uses DDN’s Exascaler 400NVX2-S appliance with a total capacity of 4.56 PB (4.3 PB usable). Like its predecessor, it takes a tiered approach, but it offers a much larger persistent, fast nonvolatile memory express (NVMe) tier to improve metadata handling and data caching. The new storage controller design and its significantly improved hardware bring the controller closer to the storage and speed up data storage processes.
These updates enhance the data pipeline to Gilbreth’s GPUs and processors, helping researchers host and access larger datasets and speeding up applications, which ultimately reduces researchers’ time to science.
“This improvement to Gilbreth’s storage is part of our many steps to support cutting-edge research in artificial intelligence and scientific domains,” says Arman Pazouki, RCAC’s director of scientific applications.
“As the number of GPUs and FLOPs (floating point operations per second) of each GPU grow, they will continue to work with larger quantities of data in each simulation. AI applications also rely on large amounts of data and files. The Purdue Computes initiative will have a defining role in the growth of GPU utilization on campus. All of this points to the significance of data, both the capacity and transfer speed, and that’s what this upgrade aimed to address,” adds Pazouki.
RCAC’s early benchmarks demonstrate significant read/write improvements in both data throughput (average 116% improvement) and input/output operations per second (average 25% improvement) with the new storage system. Higher throughput means faster data transfer, while more input/output operations per second correspond to better storage performance and responsiveness.
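To put those percentages in perspective, the short Python sketch below converts an improvement percentage into a speedup factor. It assumes "improvement" means the relative gain over the old system, and the function names and the 100-second baseline are hypothetical illustrations rather than RCAC benchmark data.

```python
def percent_improvement(old: float, new: float) -> float:
    """Relative gain of `new` over `old`, expressed in percent."""
    return (new - old) / old * 100.0


def speedup_factor(improvement_pct: float) -> float:
    """Convert a percent improvement into a multiplicative factor."""
    return 1.0 + improvement_pct / 100.0


# Figures quoted in the article (averages from RCAC's early benchmarks).
throughput_factor = speedup_factor(116)  # 116% improvement in throughput
iops_factor = speedup_factor(25)         # 25% improvement in IOPS

# Hypothetical baseline: a transfer that took 100 s on the old storage,
# assuming the transfer is limited purely by storage throughput.
old_seconds = 100.0
new_seconds = old_seconds / throughput_factor

print(f"Throughput factor: {throughput_factor:.2f}x")
print(f"IOPS factor: {iops_factor:.2f}x")
print(f"Hypothetical 100 s transfer now takes about {new_seconds:.1f} s")
```

Under that reading, a 116% improvement means throughput slightly more than doubled, while a 25% improvement means one and a quarter times as many operations per second. Real workloads will vary, since few applications are limited by storage alone.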
With the new storage system, “my students noticed a noticeably reduced time to load large libraries,” says Daniel Aliaga, associate professor of computer science, who used 100 Gilbreth GPUs to create urban representations for more than 330 US cities.
In addition to that work, Aliaga and his students use Gilbreth’s GPUs for several other projects, including:
- a project on computational archaeology where they’re trying to infer and reconstruct ancient archaeological sites in Peru, Greece and Turkey, in collaboration with archaeologists from Brown University, Vanderbilt University and Purdue;
- exploiting large language models (LLMs) to help with urban design and planning, combining urban geometric data with socio-economic information from census data in order to be able to propose urban layout improvements;
- exploiting generative AI methods to produce detailed photorealistic indoor urban spaces useful for a variety of applications; and
- developing methods to localize trees in satellite images for cities nationwide.
“This investment in resources, and pricing structures that are better than do-it-yourself options will ensure that Purdue faculty have access to cutting-edge resources to ensure their competitiveness, while also benefitting from the community cluster program’s professional maintenance and scientific support to ensure that their research is appropriately safeguarded, and they can focus on scientific outcomes instead of technology problems,” says Preston Smith, executive director of RCAC.
To learn more about Gilbreth or other Research Computing resources, contact rcac-help@purdue.edu.