Anvil used to train cancer researchers in big data analysis
Purdue’s powerful Anvil supercomputer is being used to train a group of cancer researchers – from graduate students to professors and practicing physicians – on big data management, analysis and visualization skills.
Min Zhang, professor of statistics, used Anvil in her 2022 Big Data Training for Cancer Research (“Big Care”) workshop, the latest in a series of biomedical big data analysis workshops she’s organized. Since 2020, the workshops have been supported by funding from the National Cancer Institute and have specifically focused on cancer research.
Anvil’s speed and processing power meant that this year the organizers were able to invite more participants than originally planned.
“There’s no way we could do this for so many people without Anvil,” says Zhang.
In the past, Zhang and her collaborators have used other Rosen Center for Advanced Computing (RCAC) resources such as Rice and Bell for the workshops. On those systems, each student and instructor was allocated one node of the cluster to use during the workshop. With Anvil, they instead received a shared allocation of core hours of computing time, which gave the instructors greater flexibility for large projects that process several terabytes of data.
“We gave the students the freedom to do real practice with real big data,” says Doug Crabill, senior academic IT specialist for the department of statistics.
The switch to an hours-based allocation also means that the learning doesn’t stop at the end of the workshop, since students can continue logging on from home for the rest of the year.
Many of the participants had little to no programming experience prior to the workshop, but with the help of RCAC staff and computing resources, they were able to quickly get up and running to do hands-on work with the data.
Zhang says that students who completed the course appreciated that they didn’t have to rely on their own computers and had access to the power and speed of Anvil. Students commented that “logging into Anvil was easy,” that “it’s hard to see how we would do this without Anvil,” and that Anvil was “very fast and easy to use.”
The Big Care workshop was co-led by Nadia Lanman, co-manager of the Collaborative Core for Cancer Bioinformatics at the Purdue Center for Cancer Research, and Dabao Zhang, professor of statistics.
Anvil, which was recently named the 143rd fastest supercomputer in the world, has now completed its early user testing phase and is available for the general public to use. Researchers may request access to Anvil via the ACCESS allocations process.
Writer: Adrienne Miller, science and technology writer, Rosen Center for Advanced Computing, firstname.lastname@example.org.