Important news for Radon/Recycled cluster users

August 16, 2007

The Radon/Recycled Linux cluster was upgraded significantly during RCAC's August 15-17 maintenance window. Changes resulting from this upgrade include the following:

  • The operating system on radon.rcac.purdue.edu and the cluster's compute nodes was upgraded to the most recent version of Debian Linux -- Debian GNU/Linux 4.0 (a.k.a. etch).
  • The cluster's radon.rcac.purdue.edu front-end system was upgraded to newer hardware and its IP address was changed to 128.211.157.42.
  • The cluster's batch job scheduler was upgraded to PBSPro 8.0, which is the same job scheduler used on other RCAC Linux clusters.
  • Most compute nodes in the cluster were replaced with systems having faster processors and more memory. All nodes in the cluster now have at least 1GB of memory and processors running at 2.8, 3.0, or 3.2 GHz.
  • Due to the increased CPU speed on the cluster's compute nodes, the maximum walltime limit for PBS jobs has been set to 336 hours (i.e. 14 days).

Since the operating system upgrade resulted in changes to numerous run-time libraries, RCAC recommends that all Radon/Recycled cluster users re-compile their applications before running them in the cluster's new configuration.

Whenever possible, PBS jobs that were in the system prior to the upgrade were preserved, and the owners of jobs that could not be preserved have been contacted via e-mail. However, due to the run-time library changes that came with the operating system upgrade, jobs that were preserved have been "held" and will not be automatically scheduled for execution when the cluster is returned to production.

These held jobs can be identified by running the "qstat" command and looking for jobs with an "H" in the job-status column. These jobs will also have job ids containing the string "xenon". Jobs submitted after the cluster's upgrade will have job ids containing the string "argon".
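If you have many jobs, picking the held ones out of the "qstat" listing by eye can be tedious. The following is a minimal sketch of a filter for that listing, not an RCAC-supplied tool. The function name held_xenon_jobs is hypothetical, and the sketch assumes the default "qstat" layout, in which the job id is the first column, the owner is the third, and the job status is the fifth. Check the column positions against your own "qstat" output before relying on it.

```shell
#!/bin/sh
# Sketch: read a qstat listing on stdin and print the ids of held
# pre-upgrade jobs (status "H", job id containing "xenon").
# Assumption: default qstat columns are
#   Job id | Name | User | Time Use | S | Queue
# so the id is field 1, the owner field 3, and the status field 5.
# An optional first argument restricts the output to one user's jobs.
held_xenon_jobs() {
    awk -v user="$1" '
        $5 == "H" && $1 ~ /xenon/ && (user == "" || $3 == user) { print $1 }
    '
}

# Example usage on the cluster front-end (needs PBS installed):
#   qstat | held_xenon_jobs "$USER"
```

The header and separator lines of the listing are skipped automatically because their fifth field is never "H".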

If you have held jobs that would run programs compiled by you or members of your research group on radon.rcac, please re-compile your programs, re-submit your jobs, and use the "qdel" command to delete the held jobs. For example, if your job 12345.xenon is held, you could delete it by running the command

qdel 12345.xenon.rcac.purdue.edu@argon

If, on the other hand, you have held jobs you know are going to use RCAC-supplied software (e.g. Matlab or Gaussian), and you would like them to be scheduled for execution, you may use the "qrls" command to release their holds. For example, if you have a job 12345.xenon that is held, you could release it for execution by running the command

qrls -h u 12345.xenon.rcac.purdue.edu@argon

You may specify multiple job ids on the "qdel" and/or "qrls" command lines if you have several jobs you would like to either delete or release. Also note that you must specify the full job id including "xenon.rcac.purdue.edu" when deleting or releasing these jobs.
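When several jobs are involved, typing each full job id by hand is error-prone. The sketch below expands short ids into the full form used in the examples above. The helper names full_job_id and full_job_ids are hypothetical, and the sketch assumes that every pre-upgrade job follows the same pattern as the example id, 12345.xenon.rcac.purdue.edu@argon.

```shell
#!/bin/sh
# Sketch: expand a short pre-upgrade job id such as 12345.xenon into the
# full form shown in this announcement.
# Assumption: all held jobs follow the pattern
#   <number>.xenon.rcac.purdue.edu@argon
full_job_id() {
    case "$1" in
        *.xenon)                 printf '%s.rcac.purdue.edu@argon\n' "$1" ;;
        *.xenon.rcac.purdue.edu) printf '%s@argon\n' "$1" ;;
        *)                       printf '%s\n' "$1" ;;  # leave anything else unchanged
    esac
}

# Expand every id given on the command line, one full id per line.
full_job_ids() {
    for id in "$@"; do
        full_job_id "$id"
    done
}

# Example usage on the cluster front-end (needs PBS installed):
#   qdel $(full_job_ids 12345.xenon 12346.xenon)
#   qrls -h u $(full_job_ids 12347.xenon 12348.xenon)
```

The command substitution is left unquoted on purpose so that the shell splits the output into one argument per job id.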

Any jobs that are still held on Friday, August 31, will be deleted by RCAC systems staff.

If you have questions about the Radon/Recycled cluster's upgrade, need help releasing held jobs or deleting jobs, or would like assistance re-compiling your programs, please contact the RCAC user support staff by sending e-mail to rcac-help@purdue.edu.

Originally posted: August 16, 2007