Scheduling Paused on Multiple Clusters

July 21, 2021 4:00pm - 6:00pm EDT
Outages
Bell, Brown, Gilbreth, Halstead, Scholar

UPDATE: July 21, 2021 5:38pm EDT

As of about 5:30 pm, this issue has been resolved. The schedulers are again starting jobs, and queue access lists have been re-synced.

ORIGINAL: July 21, 2021 4:00pm EDT

At about 4:00 pm today (Wednesday, July 21, 2021), system engineers found an issue with the schedulers on the Bell, Brown, Gilbreth, Halstead, and Scholar clusters.

Job scheduling has been paused while this is being investigated. Symptoms include error messages from common Slurm commands such as 'sinteractive' and 'slist', and users losing access to specific queues on the clusters.

Currently running jobs are not affected.

We are troubleshooting now and will have an update by 6:00 pm.

Originally posted: July 21, 2021 4:28pm EDT
