Unscheduled outages on portions of clusters
Conte, Halstead, HalsteadGPU, and Hammer are back in full production. Job scheduling has resumed on all clusters. If you see any lingering issues, please let us know at email@example.com.
UPDATE July 20, 2017, 2:54 pm
Power has been restored to the affected portions of the MATH datacenter. Engineers are bringing compute nodes back online and verifying the stability of the clusters.
We will update again by 6 pm this evening.
HalsteadGPU (including front-ends), Hammer-B compute nodes, most Halstead nodes, and approximately half of Conte compute nodes lost power around 12:30 pm. Engineers and datacenter personnel are diagnosing the issue and working to restore power. Any jobs that were running on the affected compute nodes have been lost. Job scheduling has been paused while this issue is being addressed.
We will provide an update no later than 3 pm.