Unscheduled Cluster Outage
September 10, 2023 7:58pm EDT UPDATE:
At about 7:45pm EDT, engineers completed their work on Negishi. The cluster has been returned to full production, and the scheduler has been restarted.
September 10, 2023 5:06pm EDT UPDATE:
As of 5:00pm, work is continuing on the Negishi cluster, and scheduling is still paused. We will provide a further update by 8:00pm.
September 10, 2023 4:03pm EDT UPDATE:
As of 3:45pm, the Bell cluster has returned to production status. Scheduling is still paused on the Negishi cluster, and we will have an update by 5:00pm EDT.
ORIGINAL:
The Bell and Negishi clusters began experiencing power issues around 1:00pm EDT. Engineers are at the data center working to identify a fix. Job scheduling has been paused while the issue is being addressed.
We will provide an update by 5:00pm EDT.