Other Materials
Implementing an Industrial-Strength Academic Cyberinfrastructure at Purdue University
Implementing a Central Quill Database in a Large Condor Installation (Condor Week 2008)
BoilerGrid for cryo-EM image processing (Condor Week 2008)
Standard Universe Example for Execution on Differing Architectures
Jobs submitted to the standard universe may produce checkpoints. A checkpoint can then be used to start up and continue execution of a partially completed job. For a partially completed job, the checkpoint and the job are specific to a platform. If the job is migrated to a different machine, correct execution requires that the platform remain the same.
In previous versions of Condor, the author of a heterogeneous submit description file had to write extra policy expressions in the requirements expression to force Condor to choose the same type of platform when continuing a checkpointed job. Because this policy is needed in the common case, it is now added to the requirements expression automatically, provided the user does not use CkptArch in the requirements expression. Condor remains backward compatible for users who have explicitly specified CkptRequirements, which implies use of CkptArch, in their requirements expression.
When the attribute CkptArch is not specified, the added expression defaults to:
# Added by Condor
CkptRequirements = ((CkptArch == Arch) || (CkptArch =?= UNDEFINED)) && \
((CkptOpSys == OpSys) || (CkptOpSys =?= UNDEFINED))
Requirements = (<user specified policy>) && $(CkptRequirements)
The CkptRequirements expression, once added to requirements, guarantees correct operation in the two possible cases for a job. In the first case, the job has not produced a checkpoint. The ClassAd attributes CkptArch and CkptOpSys are undefined, and therefore the meta operator (=?=) evaluates to true, so the job may start on any platform that the user's own policy allows. In the second case, the job has produced a checkpoint. The attributes CkptArch and CkptOpSys are defined, so further execution is restricted to a machine whose platform is the same as the one used just before the checkpoint. Note that this restriction applies even between platforms whose executables are binary compatible.
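The two cases can be traced with a small Python model. This is only an illustrative sketch, not Condor's ClassAd evaluator: the function name ckpt_requirements and the dictionary representation of job and machine ads are assumptions made for this example, Python's None stands in for UNDEFINED, and string comparison here is exact, whereas the ClassAd == operator compares strings without regard to case.

```python
# Illustrative model only; this is NOT Condor's ClassAd evaluator.
# Ads are modeled as plain dicts, and None stands in for UNDEFINED.
UNDEFINED = None


def ckpt_requirements(job, machine):
    """Mirror the CkptRequirements expression for one job/machine pair."""
    ckpt_arch = job.get("CkptArch", UNDEFINED)
    ckpt_opsys = job.get("CkptOpSys", UNDEFINED)
    # (CkptArch == Arch) || (CkptArch =?= UNDEFINED)
    arch_ok = ckpt_arch is UNDEFINED or ckpt_arch == machine["Arch"]
    # (CkptOpSys == OpSys) || (CkptOpSys =?= UNDEFINED)
    opsys_ok = ckpt_opsys is UNDEFINED or ckpt_opsys == machine["OpSys"]
    return arch_ok and opsys_ok


linux = {"Arch": "INTEL", "OpSys": "LINUX"}
irix = {"Arch": "SGI", "OpSys": "IRIX65"}

# Case 1: no checkpoint yet, so CkptArch and CkptOpSys are undefined.
fresh_job = {}
# Case 2: the job last checkpointed on an INTEL/LINUX machine.
resumed_job = {"CkptArch": "INTEL", "CkptOpSys": "LINUX"}

print(ckpt_requirements(fresh_job, linux), ckpt_requirements(fresh_job, irix))
print(ckpt_requirements(resumed_job, linux), ckpt_requirements(resumed_job, irix))
```

In this model the job without a checkpoint passes the added expression on both machines, while the checkpointed job passes only on the machine that matches its checkpoint platform. The user's own requirements policy still applies in addition to this check.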
The complete submit description file for this example:
####################
#
# Example of heterogeneous submission
#
####################
universe = standard
Executable = povray.$$(OpSys).$$(Arch)
Log = povray.log
Output = povray.out.$(Process)
Error = povray.err.$(Process)
# Condor automatically adds the correct expressions to ensure that the
# checkpointed jobs will restart on the correct platform types.
Requirements = ( (Arch == "INTEL" && OpSys == "LINUX") || \
                 (Arch == "INTEL" && OpSys == "SOLARIS26") || \
                 (Arch == "SGI" && OpSys == "IRIX65") )
Arguments = +W1024 +H768 +Iimage1.pov
Queue
Arguments = +W1024 +H768 +Iimage2.pov
Queue
Arguments = +W1024 +H768 +Iimage3.pov
Queue
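The Executable line relies on $$(OpSys) and $$(Arch) being replaced with values from the ClassAd of the machine the job is matched to, so a separate binary must exist for each platform named in Requirements. The Python sketch below lists the file names this example would need. The helper expand_executable is hypothetical and written only for illustration; it is not part of Condor, and the real substitution is performed by Condor itself at match time.

```python
import re


def expand_executable(template, machine):
    """Replace each $$(Attr) with that attribute's value from a machine ad.

    Hypothetical helper for illustration; the machine ad is a plain dict.
    """
    return re.sub(r"\$\$\((\w+)\)", lambda m: machine[m.group(1)], template)


# The three platforms allowed by the Requirements expression in the example.
platforms = [
    {"Arch": "INTEL", "OpSys": "LINUX"},
    {"Arch": "INTEL", "OpSys": "SOLARIS26"},
    {"Arch": "SGI", "OpSys": "IRIX65"},
]

names = [expand_executable("povray.$$(OpSys).$$(Arch)", p) for p in platforms]
for name in names:
    print(name)
```

Under these assumptions the submit directory would need the files povray.LINUX.INTEL, povray.SOLARIS26.INTEL, and povray.IRIX65.SGI.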
