Future athletic rivals, Nebraska and Purdue are teammates in research computing

February 28, 2011

Nebraska may not feel like an official member of the Big Ten Conference until football season begins this fall, but Purdue and its future conference rival already are models of good sportsmanship in the field of research computing.

The University of Nebraska is the latest partner in the Purdue-led DiaGrid distributed computing project, which makes almost 40,000 computer processors available for research. DiaGrid taps computers in offices, student computer labs, cluster supercomputers and more whenever they are not in use. Administered by ITaP, the pool provided nearly 20 million computing hours for research jobs over the past year.

Nebraska added 1,410 machines with 7,500 processors to the pool. The other DiaGrid partners are Indiana University, Indiana State, the University of Notre Dame, the University of Louisville, Wisconsin, Purdue's Calumet and North Central campuses and Indiana University-Purdue University Fort Wayne.

Researchers are using DiaGrid for a range of purposes, from examining the fabric of the universe, in the case of physics professors Ian Shipsey and John Peterson, to imaging the structure of viruses at near-atomic scale, in the case of biology professor Wen Jiang.

Shipsey and Peterson are preparing for the data deluge when the Large Synoptic Survey Telescope (LSST), the largest telescope of its kind ever, comes online. "We have to do simulations now to make sure we can even analyze this much data," Peterson says. "With DiaGrid you can for periods of time use thousands of machines."

Shipsey has run research jobs involving the LSST on Nebraska's systems. "We simulate light streaming across the universe from a myriad number of distant galaxies and arriving at the largest digital camera ever constructed for astronomy — 3 billion pixels," Shipsey says. "The camera is at the heart of the LSST, which will see more of the universe in its first week of operation than all previous telescopes built by humankind."

Jiang, whose research could help inform the development of new treatments for viral illnesses, already is working with a new instrument: a state-of-the-art cryoelectron microscope funded by the National Institutes of Health. The microscope produces voluminous amounts of data about virus molecular structure, allowing Jiang to image a virus down to the level of the protein molecules in its protective shell.

"For our project to succeed, processing power is one of several essential components," Jiang says.

Among other things, DiaGrid also has been used to study the Solar System's formation; project the reliability of Indiana's electrical supply; model the spread of water pollutants; and identify millions of potential zeolites, common catalysts in chemical reactions.

Preston Smith, a senior systems administrator at the Rosen Center for Advanced Computing, ITaP's research computing unit, and his peers at Nebraska began talking about partnering even before the Cornhuskers accepted an invitation to join the Big Ten last year. Nebraska, like Purdue, is part of the international collection of institutions involved in heavily computational physics experiments at the Large Hadron Collider, the giant particle accelerator on the French-Swiss border. Moreover, Smith knew his Nebraska counterparts from the community of technologists involved in campus grid computing projects and the Open Science Grid.

DiaGrid works by pooling computers over the Purdue campus network and off campus via the Internet and fast research networks. Whenever computers in the pool aren't in use (at night, when their owners are at lunch, and so on), the system sends them work. When a computer's owner returns, the jobs running on it are automatically shifted to other idle machines in the pool.
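The scheduling idea described above can be illustrated with a small Python sketch. It is a toy model under stated assumptions, not how Condor or DiaGrid is actually implemented: the `Pool` class, its method names and the machine and job names are all hypothetical, and a real system would also handle checkpointing, priorities and networking.

```python
from collections import deque


class Pool:
    """Toy model of an opportunistic pool: jobs run only on idle machines."""

    def __init__(self, machines):
        # Machine name -> job currently running there (None if no job).
        self.running = {name: None for name in machines}
        # Machines whose owners are currently using them.
        self.owner_present = set()
        # Jobs waiting for an idle machine.
        self.queue = deque()

    def submit(self, job):
        """Add a job to the queue and try to place it right away."""
        self.queue.append(job)
        self._dispatch()

    def owner_returns(self, machine):
        """Owner reclaims the machine; any job on it goes back to the queue."""
        self.owner_present.add(machine)
        job = self.running[machine]
        if job is not None:
            self.running[machine] = None
            self.queue.appendleft(job)  # evicted jobs go to the front
        self._dispatch()

    def owner_leaves(self, machine):
        """Owner steps away; the machine becomes available to the pool."""
        self.owner_present.discard(machine)
        self._dispatch()

    def _dispatch(self):
        """Send queued jobs to machines that are idle and have no job."""
        for machine, job in self.running.items():
            if not self.queue:
                break
            if job is None and machine not in self.owner_present:
                self.running[machine] = self.queue.popleft()


# Hypothetical example: two idle machines, one job.
pool = Pool(["office-pc", "lab-pc"])
pool.submit("job-A")               # lands on office-pc, the first idle machine
pool.owner_returns("office-pc")    # job-A is shifted to lab-pc
print(pool.running)
```

In this sketch an evicted job simply restarts on another idle machine or waits in the queue if none is free.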

Nebraska already was running a campus computing pool using the Condor system underlying DiaGrid, so linking Nebraska's pool to DiaGrid was straightforward, says David Swanson, director of the university's Holland Computing Center.

Swanson says DiaGrid gives Nebraska a "higher burst capacity" when the university's researchers need more computational resources than are available on campus.

When the campus computers aren't being used, DiaGrid puts them to work doing useful things instead of letting them sit idle. "We get a better return on our technology investment," Swanson says.

One of Nebraska's largest users of the pool is Derrick Stolee, a graduate student researching mathematical graph theory, which studies problems involving collections of nodes (individual points) and edges (pairs of connected nodes). The most famous type of graph is a social network graph, where people (the nodes) are connected by friendships (the edges) and the goal is to see how connected people are through their friends. Think of the small-world, or six degrees of separation, phenomenon.
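As a rough illustration of the social network example, the Python sketch below counts the fewest friendship links between two people using a breadth-first search. It is not drawn from Stolee's research code; the function name `degrees_of_separation` and the tiny friendship graph are made up for the example.

```python
from collections import deque


def degrees_of_separation(friends, start, goal):
    """Return the fewest friendship links from start to goal, or None."""
    if start == goal:
        return 0
    seen = {start}
    frontier = deque([(start, 0)])  # (person, links walked so far)
    while frontier:
        person, dist = frontier.popleft()
        for friend in friends.get(person, ()):
            if friend == goal:
                return dist + 1
            if friend not in seen:
                seen.add(friend)
                frontier.append((friend, dist + 1))
    return None  # no chain of friends connects the two people


# Hypothetical graph: each person (node) maps to a list of friends (edges).
friends = {
    "Ann": ["Bob"],
    "Bob": ["Ann", "Cara"],
    "Cara": ["Bob", "Dev"],
    "Dev": ["Cara"],
    "Eve": [],
}

print(degrees_of_separation(friends, "Ann", "Dev"))  # 3
print(degrees_of_separation(friends, "Ann", "Eve"))  # None
```

A search like this is quick on a single small graph. The computational cost in research of this kind comes from checking enormous numbers of graphs, which is where a large pool of processors helps.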

"A regular computer could theoretically run all of the code I write, but it would just take years of uninterrupted time," Stolee says. By accessing thousands of processors in the DiaGrid pool, he's able to run experiments that check tens of billions of graphs using up to 10 years of computation time within a few weeks of human time. In addition, he is making the software developed in the process available to other researchers, which should make it easier for them to take advantage of distributed computing pools like DiaGrid.

Brian Bockelman at Nebraska's Holland Computing Center says Stolee's research highlights another advantage of DiaGrid: the multi-campus pool's large size generally makes plenty of processors available for everyone.

"We can afford to give graduate students with bright ideas all the computers they need," Bockelman says.

Writer: Greg Kline, science and technology writer, Information Technology at Purdue (ITaP), 765-494-8167 (office), 765-426-8545 (mobile), gkline@purdue.edu

Originally posted: February 28, 2011