BoilerGrid

Overview of BoilerGrid

BoilerGrid is a large, high-throughput, distributed computing system operated by ITaP. It uses HTCondor, developed by the HTCondor Project at the University of Wisconsin. BoilerGrid lets you run programs on large numbers of otherwise idle computers in various locations, including temporarily under-utilized high-performance cluster resources and desktop machines not currently in use. Whenever a local user or scheduled job needs a machine back, HTCondor stops the BoilerGrid job running there and moves it to another HTCondor node as soon as possible. Because a job can be interrupted and relocated at any time, this model limits parallel processing and communication between processes, so BoilerGrid is appropriate only for relatively short serial jobs.

Detailed Hardware Specification

BoilerGrid scavenges idle cycles from many ITaP research systems and from machines around the Purdue West Lafayette campus. Whenever the primary scheduling system on one of these machines needs a compute node back, or a user sits down and starts using a desktop computer, HTCondor stops its job on that machine and, if possible, checkpoints the job's work. HTCondor then immediately tries to restart the job on another available compute node in BoilerGrid, resuming from the checkpoint when one was saved.
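In HTCondor 7.8, automatic checkpointing is available only to some jobs (those relinked for the standard universe); other jobs start over from the beginning after an eviction unless they save their own progress. The following is a minimal, hypothetical sketch of such user-level checkpointing in Python. It is not a BoilerGrid-provided tool: the file name `checkpoint.json`, the functions `load_state`, `save_state` and `run`, and the placeholder work loop are all assumptions made for illustration.

```python
import json
import os
import sys

# Hypothetical checkpoint file name chosen for this sketch.
CHECKPOINT = "checkpoint.json"


def load_state(path):
    """Return the saved state, or a fresh state if no checkpoint exists yet."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"next_step": 0, "total": 0}


def save_state(state, path):
    """Write to a temporary file, then rename it over the checkpoint,
    so an eviction in the middle of a write cannot corrupt the old copy."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(state, f)
    os.replace(tmp, path)


def run(total_steps, path=CHECKPOINT, every=1000):
    """Resume from the last checkpoint (if any) and finish the remaining steps."""
    state = load_state(path)
    for step in range(state["next_step"], total_steps):
        state["total"] += step  # placeholder for one unit of real work
        state["next_step"] = step + 1
        if state["next_step"] % every == 0:
            save_state(state, path)  # periodic checkpoint
    save_state(state, path)  # final state, so a rerun does no extra work
    return state["total"]


if __name__ == "__main__":
    steps = int(sys.argv[1]) if len(sys.argv) > 1 else 100000
    print(run(steps))
```

Because the loop always resumes at `next_step`, an eviction costs at most `every` steps of repeated work. Whether the checkpoint file actually follows the job to its next node depends on the job's file-transfer settings, for example HTCondor's `when_to_transfer_output = ON_EXIT_OR_EVICT` submit command.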

A recent snapshot of BoilerGrid counted 36,524 processor cores in total. Memory on compute nodes ranges from 512 MB to 192 GB, and most processors run at 2 GHz or faster. With a total of over 60 TFLOPS available, BoilerGrid can deliver a large number of compute cycles in a short time. HTCondor is designed for high-throughput computing and is well suited to parameter sweeps, Monte Carlo simulations, and nearly any serial application that runs in one hour or less.
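Parameter sweeps and Monte Carlo runs fit this model because each job is serial and independent of the others. One common pattern, assumed here rather than prescribed by BoilerGrid, is to queue many copies of the same program (for example with `queue 100` in the submit description) and pass each copy its HTCondor `$(Process)` number through `arguments = $(Process)`, so that every job draws a different random stream. The hypothetical Python job below estimates pi that way; the function name, the sample count, and the choice of seeding directly from the process number are illustrative assumptions.

```python
import random
import sys


def estimate_pi(seed, samples):
    """Monte Carlo estimate of pi from `samples` random points in the unit square."""
    rng = random.Random(seed)  # independent, reproducible stream for this job
    inside = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:  # point falls inside the quarter circle
            inside += 1
    return 4.0 * inside / samples


if __name__ == "__main__":
    # Assumption: the submit file passes the job's $(Process) number as argv[1].
    process = int(sys.argv[1]) if len(sys.argv) > 1 else 0
    samples = 1_000_000  # keep each job well under the one-hour guideline
    print(process, estimate_pi(seed=process, samples=samples))
```

Each job prints its process number and its own estimate; averaging the outputs of all jobs afterward gives a more precise overall result. Keeping each job short also limits how much work is lost when a node is reclaimed.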

BoilerGrid currently uses HTCondor 7.8.7.