Scheduling Paused on Multiple Clusters
UPDATE: July 21, 2021 5:38pm
As of about 5:30 pm, this issue has been resolved. The schedulers are again starting jobs, and queue access lists have been re-synced.
ORIGINAL: July 21, 2021 4:00pm - 6:00pm EDT
At about 4:00 pm today (Wednesday, 21 July 2021), system engineers found an issue with the schedulers on the Bell, Brown, Gilbreth, Halstead, and Scholar clusters.
Job scheduling has been paused while this is being investigated. Symptoms include error messages from common Slurm commands such as 'sinteractive' and 'slist', and users losing access to specific queues on the clusters.
Currently running jobs are not affected.
We are troubleshooting now and will have an update by 6:00 pm.