Purdue offers Condor resources to TeraGrid community

May 21, 2007

Purdue University’s Rosen Center for Advanced Computing has become the largest provider of high-throughput computing cycles on the National Science Foundation’s TeraGrid.

Carol X. Song, senior research scientist in the Rosen Center and principal investigator for TeraGrid at Purdue, says that more than 6,400 computers of all sizes—from desktop machines used by students to do homework and check e-mail, up to large, powerful research computers—are linked together using the open source application Condor.

"By using Condor and making resources available over the TeraGrid, we are leveraging our national and international science resources," Song says. "We will continue to expand our Condor pool to include additional machines as well as machines at other campuses through regional grids."

The latest addition of more than 2,200 machines to the Condor pool includes 1,000 computer lab machines on the West Lafayette campus, 300 computer lab machines on the Purdue Calumet campus, 300 workstations at the University of Notre Dame campus, and 664 processors in a cluster maintained for research computing at the Rosen Center in West Lafayette.

Miron Livny, professor of computer science at the University of Wisconsin, says that Purdue's Condor pool is the largest in the nation.

“Purdue is committed to a vision, and they are making that vision a reality. I am pleased to say that early on I worked closely with people at Purdue, and we shared this vision for research computing,” Livny says. “I think it's wonderful that Purdue has taken the leadership on this on the TeraGrid. And I don't pass out these kinds of compliments often.”

One researcher, Michael Deem, Rice University's John W. Cox Professor of Chemical Engineering, has used more than 2 million hours of computer cycles at Purdue to catalog the chemical structures of compounds called zeolites. His team’s zeolite database now contains more than 3 million structures, and it is still growing.

Deem aims to identify and categorize as many of these structures as possible so that chemical engineers can select the exact zeolite they need. This is just the kind of high-throughput job that works well on Purdue's distributed computing system.

“The throughput is much higher there than I can get locally because of the large size of the Condor pool at Purdue,” Deem says. “Purdue is doing a great service to the scientific community by providing this resource.”

The distributed computing resource is available over the TeraGrid, of which Purdue is one of nine resource provider sites. Charlie Catlett, the chief information officer at Argonne National Laboratory and chair of the TeraGrid Resource Provider Forum, says that it is important to provide a variety of computing resources to researchers.

“High-throughput, or capacity, computing is extremely important to the TeraGrid user community,” Catlett says. “Purdue and the Condor team have provided an excellent model for harnessing campus cyberinfrastructure in a way that benefits local users and also serves the national community.”

The computers in the Condor pool at Purdue spend roughly 45 percent of the time on their intended purpose and 45 percent on Condor jobs, and they sit idle the remaining 10 percent of the time.

“This shows that our site can provide significant computing power to the nation without requiring dedicated resources,” Song says.

At the TeraGrid allocations meeting in April, users were awarded more than 5 million hours on Purdue's Condor resources.

Preston Smith, a systems research engineer for Purdue’s Rosen Center, says that Purdue has refined its use of Condor by running it as a secondary scheduling system on the computers. This puts the computers to use whenever they are available, instead of only at certain times, such as at night. The primary scheduler for computing jobs at the Rosen Center is the Portable Batch System, or PBS. Purdue uses PBS Pro.

“The thing we do that is unique is that we use Condor in tandem with PBS Pro,” Smith says. PBS Pro was developed by Altair Engineering.

Condor and PBS Pro are connected so that they can “talk” to each other before a job is assigned to see what computers are available. This connection allows Condor to send a job to a computer whenever it's not being used instead of at set times, which allows many more unused computing cycles to be harvested, Smith says.
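The idea of a secondary scheduler that checks with the primary one before placing work can be illustrated with a small Python sketch. This is a toy model and not Purdue's configuration or the real Condor or PBS Pro interfaces: the class names, the `is_free` check, and the rule that a displaced job goes back on the queue are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Machine:
    name: str
    primary_job: Optional[str] = None    # work placed by the primary scheduler
    scavenger_job: Optional[str] = None  # work placed by the secondary scheduler


class PrimaryScheduler:
    """Hypothetical stand-in for the primary batch system; it has first claim."""

    def __init__(self, machines):
        self.machines = {m.name: m for m in machines}

    def is_free(self, name):
        return self.machines[name].primary_job is None

    def start(self, name, job):
        # Assumption: a primary job displaces any scavenger job on the machine.
        machine = self.machines[name]
        displaced = machine.scavenger_job
        machine.scavenger_job = None
        machine.primary_job = job
        return displaced

    def finish(self, name):
        self.machines[name].primary_job = None


class SecondaryScheduler:
    """Hypothetical cycle harvester that asks the primary scheduler first."""

    def __init__(self, primary):
        self.primary = primary
        self.queue = []

    def submit(self, job):
        self.queue.append(job)

    def dispatch(self):
        # Place queued jobs on every machine the primary scheduler reports free,
        # at any time of day, rather than only in a fixed window such as night.
        placed = []
        for name, machine in self.primary.machines.items():
            if not self.queue:
                break
            if self.primary.is_free(name) and machine.scavenger_job is None:
                job = self.queue.pop(0)
                machine.scavenger_job = job
                placed.append((job, name))
        return placed


def demo():
    primary = PrimaryScheduler([Machine("node1"), Machine("node2"), Machine("node3")])
    secondary = SecondaryScheduler(primary)
    primary.start("node1", "primary-job-A")   # node1 is busy with primary work
    for job in ["c1", "c2", "c3"]:
        secondary.submit(job)
    first = secondary.dispatch()              # only node2 and node3 are free
    primary.finish("node1")                   # node1 becomes available
    second = secondary.dispatch()             # the waiting job lands on node1
    return first, second
```

In this sketch, calling `dispatch()` again as soon as a machine frees up is what lets the waiting job run immediately instead of waiting for a scheduled window.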

Livny says that he hopes Condor usage increases at other universities and that the now-wasted cycles can be put to good use.

“Other campuses should follow Purdue's leadership,” Livny says. “I believe this is the right way for us to move forward, get organized and get resources together, and then go out on the national level and share resources with other institutions.”

Originally posted: May 21, 2007