Professor teaches big data analysis using new, interactive version of ITaP’s Scholar cluster
You might not expect a 7:30 a.m. class to be lively and well-attended, but Mark Daniel Ward packed students into his classroom at that hour last semester. The subject? Techniques for analyzing big data through a series of projects using Purdue’s Scholar cluster.
The new data analysis seminar, Statistics 19000, is the result of a collaboration between Ward, an associate professor of statistics, and an ITaP team led by Stephen Harrell, an ITaP Research Computing scientific applications analyst. Ward teaches data analysis to sophomore students through the National Science Foundation-funded Statistics Living-Learning Community and was interested in offering a similar course to students who were not involved in that program.
With funding that Ward obtained through an Instructional Technology grant from Dennis Minchella, the College of Science’s associate dean for undergraduate education, ITaP purchased the necessary equipment and built a new version of Scholar with interactive capabilities. Scholar is a cluster supercomputer that ITaP makes available for classroom use.
Until the recent updates, Scholar supported projects only in batch mode, in which code is submitted and then processed at some later time without further human interaction. This works well for many purposes, but students who are just starting out find it more intuitive to use the cluster in interactive mode, where they can use command-line and menu-driven analysis tools.
Scholar can still be used for classical batch processing, and other Purdue classes continue to use it that way, but – thanks to the faster processor and additional memory that Ward’s grant purchased – it now supports long-running interactive jobs as well.
Ward’s class is a seminar for students of any background who want an introduction to data analysis. There are no prerequisites and students aren’t expected to have previous UNIX experience, making it accessible to everyone from freshmen to graduate students.
“The class isn’t too theoretical,” Ward says. “I’m really just giving them a chance to get their hands dirty with data.”
Students completed a variety of projects that introduced them to UNIX shell scripting and different data analysis tools, including R, SQL and XML. Ward used this seminar to gauge student interest in a new data science major that Statistics and Computer Science will be jointly introducing. If the course registration and student feedback are any indication, the new major will be a rousing success.
Holly Thieman, a senior double majoring in pharmaceutical sciences and statistics, hopes to combine her majors by working in quality control for the pharmaceutical industry. She took Ward’s class to learn more about R and other programs that are commonly used in that industry. “I really wanted to get a little bit more comfortable working with different computer programs to do statistical analysis or just to take a better look at data, and I feel like I definitely got that,” she says.
The changes to Scholar aren’t the only way in which ITaP supported the class. Harrell and Kevin O’Shea, ITaP’s manager of innovations in teaching and learning, worked to make sure that the classroom Ward uses has wireless network bandwidth that can handle 100 users logged on at once and running computing-intensive applications remotely.
Doug Crabill, the senior academic IT specialist for statistics, worked with Ward and ITaP to implement the technology and make sure the students had a smooth experience. He cited several additional benefits of the partnership: ITaP’s round-the-clock staff support, the Data Depot data storage resource, as well as functionality that automatically adds and drops students as cluster users in response to changes in registration.
“This would not have been nearly as nice without ITaP’s support,” says Ward.
It’s safe to say his students thought the class was well worth waking up early for. “What I learned in this seminar helped me land my dream internship this summer,” says Yahia Aly, a sophomore double majoring in aerospace engineering and statistics. “I was able to talk about the projects we did in the class during my interview, and it really impressed the hiring manager.”
Scholar is available to any faculty member at Purdue who is interested in teaching subjects that could benefit from high-performance computing. ITaP now has an automated add-a-class page for Scholar where an interested professor can quickly sign up a class without needing to submit a request and wait for a response.
To learn more about Scholar, contact Harrell of ITaP Research Computing at email@example.com. Ward is also happy to share his experience with anyone who is considering implementing something similar in their classroom. He can be reached at firstname.lastname@example.org.