Condor Boot Camp at Purdue
Lecture Materials
- Using Condor (Powerpoint) (PDF)
- Administrating Condor (Powerpoint) (PDF) (Handout)
- Condor Tutorial
Other Materials
Implementing an Industrial-Strength Academic Cyberinfrastructure at Purdue University
Implementing a Central Quill Database in a Large Condor Installation (Condor Week 2008)
BoilerGrid for cryo-EM image processing (Condor Week 2008)
This site is currently under construction. Please check back frequently for updates.
BoilerGrid Condor FAQ
- Is this "the grid"?
- Who else is running machines in BoilerGrid? How many are out there, and what are they?
- What about my Department's firewalls?
- I have Linux systems with USB peripherals
- I have multi-core machines. Can I tie console activity to one or all of the cores?
- Who can run jobs on my machines?
- When can jobs run on my machine?
- Do my own users get preference on my machines?
Is this "the grid"?
BoilerGrid is a "campus grid", much like the NSF TeraGrid is a national-scale grid. BoilerGrid uses the Condor software developed at the University of Wisconsin to connect HPC cluster systems, computer labs, and desktops into a distributed computing grid. Ideal Condor applications are ones which are "pleasantly" parallel, such as parameter sweeps, Monte Carlo simulations, etc.
Who else is running machines in BoilerGrid? How many are out there, and what are they?
The Rosen Center for Advanced Computing offers 9000 Linux cores, on x86, x86_64, and PPC64 architectures. Purdue's IT Teaching and Learning Technologies offers 5000 cores of Windows XP systems, and various Linux, Windows, and Solaris systems are operated around Purdue by the Libraries, Physics, Biology, and the College of Technology, and around the state at Purdue Calumet and Notre Dame.
In all, there are 15000 cores available to BoilerGrid users.
What about my Department's firewalls?
Condor does require bidirectional communication between the "central manager" system and the submit or execution machines. If you run firewalls, you may need to open a couple of exceptions.
- Port 9816 needs to be able to reach egret.rcac.purdue.edu
- You need a range of ports for Condor to restrict its dynamic connections to. Add the following to your config files:
HIGHPORT = 50500
LOWPORT = 50000
- Add the port range you define in your Condor config files to your firewall configuration. If opening it to all Purdue netblocks is too broad for your comfort, note that the RCAC systems that need to communicate through the firewalls all reside on the following netblocks, reachable within campus:
128.211.128.0/19
172.18.0.0/16
Additionally, to be able to reach general campus machines, you can retrieve a list of other subnets that have Condor nodes with the following command, useful for feeding iptables:
condor_status -pool egret.rcac.purdue.edu -format '%s\n' StartdIpAddr | sort | uniq
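As a rough illustration of "feeding iptables", the sketch below turns StartdIpAddr lines into per-host ACCEPT rules for the port range configured above. It assumes each line looks like `<128.211.128.5:50123>`, that TCP-only rules on the INPUT chain are what you want, and that one rule per host is acceptable. The function names `startd_hosts` and `iptables_rules` are hypothetical helpers, not part of Condor. Review the generated rules before applying them.

```python
import ipaddress
import re

# Matches the leading "<ip:port" of a StartdIpAddr value,
# e.g. "<128.211.128.5:50123>" (format assumed, see lead-in).
_SINFUL = re.compile(r"<(\d{1,3}(?:\.\d{1,3}){3}):\d+")


def startd_hosts(lines):
    """Return the sorted, de-duplicated IPv4 addresses found in the lines."""
    hosts = set()
    for line in lines:
        match = _SINFUL.search(line)
        if not match:
            continue  # skip blank or unrecognized lines
        try:
            hosts.add(ipaddress.IPv4Address(match.group(1)))
        except ipaddress.AddressValueError:
            continue  # skip strings that are not valid IPv4 addresses
    return sorted(hosts)


def iptables_rules(lines, low_port=50000, high_port=50500, chain="INPUT"):
    """Build one iptables command string per host for the Condor port range.

    The defaults mirror the LOWPORT/HIGHPORT values from the FAQ; the chain
    name and the TCP-only choice are assumptions for this sketch.
    """
    return [
        f"iptables -A {chain} -p tcp -s {host} "
        f"--dport {low_port}:{high_port} -j ACCEPT"
        for host in startd_hosts(lines)
    ]
```

To use it, capture the condor_status output in a file or pipe, pass the lines to `iptables_rules`, and print the result for inspection. If your site also needs UDP, add matching `-p udp` rules.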
The Condor Windows installer can add exceptions for itself to Windows firewall.
I have Linux systems with USB peripherals
Condor on Linux has some difficulty detecting keyboard and mouse activity with USB peripherals. Steven Wilson (Structural Biology) came up with the following solution:
"I wrote a short program (condor_usb_fix) to monitor USB activity (/dev/input/mice, in my case) and then update the access time on another file (I use /dev/condor_mouse). I changed Condor's configuration to include this new "device file" (condor_mouse):
CONSOLE_DEVICES = mouse, console, condor_mouse
Then I added lines to start and stop my daemon from the Condor init script."
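The original condor_usb_fix program is not reproduced here. The following is only a minimal sketch of the same idea, assuming that a blocking read on the input device returns whenever there is mouse activity and that touching the marker file is enough for Condor to notice. The function name `watch_usb_activity` and the 3-byte packet size are assumptions made for illustration.

```python
import os


def watch_usb_activity(device_path, marker_path, packet_size=3):
    """Touch marker_path each time data arrives on device_path.

    On a real input device the read blocks until there is activity, so the
    loop runs until the process is stopped. On a regular file it ends at
    end-of-file and returns the number of reads that produced data.
    """
    # Make sure the marker file exists before trying to update its times.
    open(marker_path, "a").close()

    events = 0
    # Unbuffered binary mode, so each read maps onto one read from the device.
    with open(device_path, "rb", buffering=0) as device:
        while device.read(packet_size):
            # Set access and modification times of the marker to "now".
            os.utime(marker_path, None)
            events += 1
    return events
```

With the paths from the quote, a daemon would call `watch_usb_activity("/dev/input/mice", "/dev/condor_mouse")`. That normally requires root privileges, and the call does not return while the device stays open.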
I have multi-core machines. Can I tie console activity to one or all of the cores?
In your condor_config file, set VIRTUAL_MACHINES_CONNECTED_TO_CONSOLE to the number of cores you want to tie to console activity. For example, if you set it to 1 on a four-core machine, console activity affects only one core, and the other 3 cores can still run jobs. Likewise, you can set it to 4 to let console activity claim the whole machine. The default is the total number of cores.
Who can run jobs on my machines?
When can jobs run on my machine?
Do my own users get preference on my machines?
One of Condor's guiding principles is to leave the machine's owner in control of the execution policy.
The short answer is that you, the administrator, decide. By default, Condor allows jobs to run once the machine has been idle for 15 minutes, but you can choose to allow jobs only overnight. Configuring your systems to prefer jobs from your own users is also simple.
Documentation on how to configure the policy for the Condor "startd" can be found in the manual page at Wisconsin. Or, RCAC staff will be happy to help you customize a policy that meets your needs. Just mail rcac-help@purdue.edu if you have questions.