BoilerGrid

Overview of BoilerGrid

BoilerGrid is a large, high-throughput, distributed computing system provided by RCAC and built on the Condor system developed by the Condor Project at the University of Wisconsin. BoilerGrid lets users run programs on large numbers of otherwise idle computers in various locations, including high-performance resources that are momentarily under-utilized and desktop lab machines that are not currently in use. Whenever a local user or a scheduled job needs a machine, the Condor job running on it is stopped and rescheduled on another Condor node as soon as possible. Because this model makes parallel processing and inter-process communication impractical, RCAC restricts BoilerGrid to smaller, serial jobs. Condor jobs can be submitted from most RCAC systems (Gray, Pete, Prospero, Radon, Rossmann, Steele, Venice). You may also install Condor on your own desktop machine and submit jobs from it.
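As a minimal sketch of what a submission can look like, the Python script below prints a Condor submit description file that queues many instances of one serial program in the vanilla universe. The program name `my_program`, the output file names, and the batch size are hypothetical placeholders. Transferring files instead of relying on a shared file system is an assumption, made because some BoilerGrid nodes are lab machines on other campuses; adjust it to your job.

```python
"""Print a minimal Condor submit description file for a batch of serial jobs.

Sketch only: "my_program" and the output file names are hypothetical.
"""


def make_submit_file(executable: str, num_jobs: int) -> str:
    """Return the text of a vanilla-universe submit description.

    Condor expands $(Process) to 0, 1, ..., num_jobs - 1, so each queued
    instance receives its own argument and writes its own output files.
    """
    if num_jobs < 1:
        raise ValueError("num_jobs must be at least 1")
    lines = [
        "universe = vanilla",  # plain serial program, no relinking needed
        f"executable = {executable}",
        "arguments = $(Process)",  # pass the instance number to the program
        "output = out.$(Process)",
        "error = err.$(Process)",
        "log = jobs.log",  # one user log shared by the whole batch
        "should_transfer_files = YES",  # assumption: do not rely on a shared file system
        "when_to_transfer_output = ON_EXIT",
        f"queue {num_jobs}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(make_submit_file("my_program", 100), end="")
```

Redirect the output to a file and hand it to `condor_submit`, for example `python make_submit.py > jobs.sub` followed by `condor_submit jobs.sub` (the script and file names are again hypothetical).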

Detailed Hardware Specification

BoilerGrid scavenges idle cycles from nearly all RCAC systems, including the community clusters, specialized systems, and the recycled cluster. BoilerGrid also uses the idle time of machines in student labs on the Purdue West Lafayette campus, the Purdue Calumet campus, and the University of Notre Dame. Whenever the normal scheduling system on one of these machines sends a job to a node, Condor preempts its own work on that node, checkpointing it first if possible, and immediately surrenders the node to the scheduled job.
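Condor's automatic checkpointing applies to programs relinked for the standard universe with `condor_compile`; a preempted vanilla-universe job otherwise starts over from the beginning on its next machine. One common workaround, sketched below under that assumption, is for the program to save its own progress periodically and resume from the saved state when it restarts. The Monte Carlo workload, the checkpoint file name `state.json`, and the chunk sizes are hypothetical, and the checkpoint only helps if it is stored somewhere the restarted job can read it (for example a shared area).

```python
"""Monte Carlo estimate of pi with application-level checkpointing.

Sketch only: the checkpoint file name and chunk sizes are hypothetical.
"""
import json
import os
import random

CHECKPOINT = "state.json"  # hypothetical; must survive an eviction to be useful


def load_state(path):
    """Return saved progress, or a fresh state if no checkpoint exists yet."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"done": 0, "hits": 0}


def save_state(state, path):
    """Write to a temporary file, then rename it over the old checkpoint.

    The rename is atomic on POSIX file systems, so a job killed mid-write
    leaves the previous checkpoint intact.
    """
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(state, f)
    os.replace(tmp, path)


def estimate_pi(total, every, path=CHECKPOINT, seed=12345):
    """Estimate pi from `total` samples, checkpointing every `every` samples.

    Chunk k draws from its own generator seeded with seed + k, so a run that
    resumes from a checkpoint computes the same result as an uninterrupted one.
    """
    if total < 1 or every < 1:
        raise ValueError("total and every must be positive")
    state = load_state(path)
    while state["done"] < total:
        chunk = state["done"] // every
        n = min(every, total - state["done"])
        rng = random.Random(seed + chunk)
        hits = 0
        for _ in range(n):
            x, y = rng.random(), rng.random()
            if x * x + y * y <= 1.0:
                hits += 1
        state["done"] += n
        state["hits"] += hits
        save_state(state, path)  # progress now survives a preemption
    return 4.0 * state["hits"] / state["done"]


if __name__ == "__main__":
    print(estimate_pi(total=200_000, every=20_000))
```

Seeding each chunk separately, rather than saving the generator's internal state, keeps the checkpoint a small JSON file. Delete the checkpoint once the final result is safely stored; otherwise a later run using the same path returns the old result immediately.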

BoilerGrid currently consists of over 20,000 processors. Of these, about 10,500 are x86_64/Linux, approximately 600 are Intel (ia32)/Linux, and approximately 11,000 are Intel/WinNT51; there are also small numbers of Itanium/Linux, Solaris, and Mac OS X nodes. Memory on compute nodes ranges from 512 MB to 32 GB, and most processors run at 3 GHz or faster. With over 60 TFLOPS available in total, BoilerGrid can deliver a large number of CPU cycles in a short time. All shared areas and software packages available on the RCAC systems are also available to Condor jobs. Condor is designed for high-throughput computing and is well suited to parameter sweeps, Monte Carlo simulations, and nearly any other serial application.
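A parameter sweep maps naturally onto Condor's `$(Process)` macro: queue one job per parameter combination and let each job select its combination from its process number. The Python sketch below shows that mapping. The parameter names `alpha` and `beta` and their values are hypothetical, and the script assumes the submit file passes `$(Process)` as the program's first argument.

```python
"""Map a Condor $(Process) number to one point of a parameter sweep.

Sketch only: the parameter names and values are hypothetical.
"""
import itertools
import sys

ALPHAS = [0.1, 0.2, 0.5]  # hypothetical first parameter
BETAS = [1, 2, 5, 10]  # hypothetical second parameter


def grid():
    """Every (alpha, beta) combination, in a fixed, reproducible order."""
    return list(itertools.product(ALPHAS, BETAS))


def params_for(process):
    """Return the grid point handled by job number `process` (0-based)."""
    points = grid()
    if not 0 <= process < len(points):
        raise IndexError(f"process {process} is outside 0..{len(points) - 1}")
    return points[process]


if __name__ == "__main__":
    # The submit file passes $(Process) as the first argument and queues
    # len(grid()) jobs (12 here), one per combination.
    process = int(sys.argv[1]) if len(sys.argv) > 1 else 0
    alpha, beta = params_for(process)
    print(f"job {process}: alpha={alpha}, beta={beta}")
```

Because `itertools.product` always enumerates combinations in the same order, job number 5 receives the same parameters no matter which machine it lands on or how often it is restarted.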

Owner                                    Arch/OS                                              Processors
ITaP - RCAC                              x86_64/Linux                                         ~10,500
ITaP - RCAC                              Intel/Linux                                          ~660
ITaP - Envision Center                   Intel/Linux                                          48
ITaP - Teaching & Learning               Intel/WinNTXX                                        ~9,300
Purdue Calumet                           Intel/WinNT51                                        ~250
Notre Dame CSE                           Intel/Linux, Sun4u/Solaris28, PPC/OSX, x86_64/Linux  ~230
Purdue Biology, Libraries, & other ITaP  Intel/Linux, Intel/WinNT51                           187
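Because the pool mixes architectures and operating systems, a program built for one platform should only be matched to machines of that platform. In Condor this is done with a `requirements` line in the submit file that tests the machine attributes `Arch` and `OpSys`. The sketch below builds such an expression from the platform labels in the table above. The mapping from those labels to Condor's attribute values is an assumption based on how Condor 7.x names these platforms, so confirm the values with `condor_status` before relying on them.

```python
"""Build a Condor Requirements expression restricting jobs to chosen platforms.

Sketch only: the Arch/OpSys strings are the values Condor 7.x machines
typically advertise; confirm them with condor_status before relying on them.
"""

# Table label -> (Arch, OpSys) as assumed to appear in machine ClassAds.
PLATFORMS = {
    "x86_64/Linux": ("X86_64", "LINUX"),
    "Intel/Linux": ("INTEL", "LINUX"),
    "Intel/WinNT51": ("INTEL", "WINNT51"),
    "Sun4u/Solaris28": ("SUN4u", "SOLARIS28"),
    "PPC/OSX": ("PPC", "OSX"),
}


def requirements(*platforms):
    """Return an expression that matches any of the named platforms."""
    if not platforms:
        raise ValueError("name at least one platform")
    clauses = []
    for name in platforms:
        arch, opsys = PLATFORMS[name]  # raises KeyError for an unknown label
        clauses.append(f'(Arch == "{arch}" && OpSys == "{opsys}")')
    return " || ".join(clauses)


if __name__ == "__main__":
    # Paste the result into a submit file as:  requirements = <expression>
    print(requirements("x86_64/Linux", "Intel/Linux"))
```

Run as shown, the script prints `(Arch == "X86_64" && OpSys == "LINUX") || (Arch == "INTEL" && OpSys == "LINUX")`, which lets a job built for 32-bit Linux run on either of the two largest Linux groups in the table, assuming the binary runs on x86_64 nodes.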

BoilerGrid currently runs the latest stable release of Condor: 7.0.1. BoilerGrid status may be monitored using CondorView.