Rice Transition to SLURM
February 18, 2020 8:00am – February 19, 2020 2:00pm
Work to transition Rice to SLURM has been completed. The cluster has been returned to production at this time.
UPDATE: February 18, 2020 5:59pm
Work is progressing well on transitioning Rice to SLURM! This will continue into tomorrow as scheduled.
ORIGINAL: January 28, 2020 10:39am
Rice will be transitioning to a new batch scheduler on Tuesday, February 18th, 2020! This is a necessary upgrade and will require faculty and students to modify how they interact with the batch system. Please review the information below prior to this transition to ensure your work is not interrupted.
Rice will be switching from the PBS-based Torque/Moab scheduler to the newer SLURM scheduler. SLURM offers additional features, will reduce operating costs in the long run, and is the leading scheduler among peer institutions.
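As a rough illustration of what modifying a job script for the new scheduler can involve, the sketch below rewrites a few common Torque/PBS directives into their usual SLURM equivalents. It is not an official conversion tool: the function name translate_pbs_line is hypothetical, only three widely used directives are covered, and site-specific options such as queue selection are deliberately flagged for manual review rather than guessed.

```python
import re


def translate_pbs_line(line: str) -> str:
    """Rewrite one '#PBS' directive as '#SBATCH' when the mapping is well known.

    Hypothetical helper for illustration only; anything it does not
    recognize is flagged for manual review instead of being guessed.
    """
    stripped = line.strip()
    if not stripped.startswith("#PBS"):
        return line  # not a PBS directive; leave the line unchanged

    # Job name: -N name  ->  --job-name=name
    m = re.fullmatch(r"#PBS\s+-N\s+(\S+)", stripped)
    if m:
        return f"#SBATCH --job-name={m.group(1)}"

    # Walltime given as HH:MM:SS  ->  --time=HH:MM:SS
    m = re.fullmatch(r"#PBS\s+-l\s+walltime=(\d+:\d{2}:\d{2})", stripped)
    if m:
        return f"#SBATCH --time={m.group(1)}"

    # Node and per-node core counts  ->  --nodes / --ntasks-per-node
    m = re.fullmatch(r"#PBS\s+-l\s+nodes=(\d+):ppn=(\d+)", stripped)
    if m:
        return (f"#SBATCH --nodes={m.group(1)}\n"
                f"#SBATCH --ntasks-per-node={m.group(2)}")

    # Queues and other site-specific options need a human decision.
    return f"# TODO(manual): {stripped}"


if __name__ == "__main__":
    for pbs in ["#PBS -N myjob", "#PBS -l walltime=04:00:00",
                "#PBS -l nodes=2:ppn=20", "#PBS -q standby"]:
        print(translate_pbs_line(pbs))
```

Any script converted this way should still be checked on the testing cluster before the transition.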
Don’t wait, though! You can test your scripts and try out SLURM today. A testing environment is already available for you to explore SLURM and the impact it will have on your work. Access this testing cluster through ThinLinc at https://desktop.mack.rcac.purdue.edu or by SSH to mack.rcac.purdue.edu. Please be aware that this environment is small; so that everyone can test their scripts and try out SLURM, do not run any serious work on the testing cluster.
We will also be hosting in-person SLURM transition training and help sessions every Friday during the transition, from 2:00-3:30pm in the Envision Center. On the other four days of the week, we are available for drop-ins from 2:00-3:00pm at our Coffee Hour Consultations, held at various locations around campus.
To complete the transition, the Rice cluster will be down for a longer maintenance than normal, beginning on Tuesday, February 18th, 2020 at 8:00am. The cluster will not return to full production until Wednesday, February 19th, 2020 at 2:00pm. Any PBS job whose requested walltime would carry it past the start of the maintenance on Tuesday morning will not begin. Because the batch scheduler is changing, any jobs still in the queue that have not run before the maintenance must be deleted as part of the change.
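The walltime rule above comes down to simple date arithmetic, sketched below. The maintenance start time is taken from this announcement; the function names, the HH:MM:SS walltime format, and the choice to treat a job ending exactly at 8:00am as acceptable are assumptions made for illustration, and the scheduler's actual boundary behavior may differ.

```python
from datetime import datetime, timedelta

# Start of the maintenance window, as stated in the announcement.
MAINTENANCE_START = datetime(2020, 2, 18, 8, 0)


def parse_walltime(walltime: str) -> timedelta:
    """Convert an 'HH:MM:SS' walltime request into a timedelta."""
    hours, minutes, seconds = (int(part) for part in walltime.split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


def can_start(now: datetime, walltime: str) -> bool:
    """Return True if a job started at `now` would end by the maintenance start.

    Assumption: a job ending exactly at the start time is allowed.
    """
    return now + parse_walltime(walltime) <= MAINTENANCE_START


if __name__ == "__main__":
    monday_evening = datetime(2020, 2, 17, 20, 0)
    print(can_start(monday_evening, "04:00:00"))  # ends at midnight
    print(can_start(monday_evening, "14:00:00"))  # would end at 10:00am Tuesday
```

Requesting a shorter walltime in the days before the maintenance is one way to keep a job eligible to start.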