Other Materials
Implementing an Industrial-Strength Academic Cyberinfrastructure at Purdue University
Implementing a Central Quill Database in a Large Condor Installation (Condor Week 2008)
BoilerGrid for cyro-EM image processing (Condor Week 2008)
Vanilla Universe Example for Execution on Differing Architectures
A more complex example of a heterogeneous submission occurs when a job may be executed on many different architectures to gain full use of a diverse architecture and operating system pool. If the executables are available for the different architectures, then a modification to the submit description file will allow Condor to choose an executable after an available machine is chosen.A special-purpose Machine Ad substitution macro can be used in the executable, environment, and arguments attributes in the submit description file. The macro has the form
$$(MachineAdAttribute)Note that this macro is ignored in all other submit description attributes. The $$() informs Condor to substitute the requested MachineAdAttribute from the machine where the job will be executed.
An example of the heterogeneous job submission has executables available for three platforms: LINUX Intel, Solaris26 Intel, and Irix 6.5 SGI machines. This example uses povray to render images using a popular free rendering engine.
The substitution macro chooses a specific executable after a platform for running the job is chosen. These executables must therefore be named based on the machine attributes that describe a platform. The executables named
povray.LINUX.INTEL povray.SOLARIS26.INTEL povray.IRIX65.SGIwill work correctly for the macro
povray.$$(OpSys).$$(Arch)The executables or links to executables with this name are placed into the initial working directory so that they may be found by Condor. A submit description file that queues three jobs for this example:
####################
#
# Example of heterogeneous submission
#
####################
universe = vanilla
Executable = povray.$$(OpSys).$$(Arch)
Log = povray.log
Output = povray.out.$(Process)
Error = povray.err.$(Process)
Requirements = (Arch == "INTEL" && OpSys == "LINUX") || \
(Arch == "INTEL" && OpSys =="SOLARIS26") || \
(Arch == "SGI" && OpSys == "IRIX65")
Arguments = +W1024 +H768 +Iimage1.pov
Queue
Arguments = +W1024 +H768 +Iimage2.pov
Queue
Arguments = +W1024 +H768 +Iimage3.pov
Queue
These jobs are submitted to the vanilla universe to assure that once a job
is started on a specific platform, it will finish running on that platform.
Switching platforms in the middle of job execution cannot work correctly. There are two common errors made with the substitution macro. The first is the use of a non-existent MachineAdAttribute. If the specified MachineAdAttribute does not exist in the machine's ClassAd, then Condor will place the job in the machine state of hold until the problem is resolved.
The second common error occurs due to an incomplete job set up. For example, the submit description file given above specifies three available executables. If one is missing, Condor report back that an executable is missing when it happens to match the job with a resource that requires the missing binary.
