Unscheduled outages on portions of clusters

July 20, 2017 12:30pm - 5:15pm
Outages and Maintenance
Hammer, HalsteadGPU, Conte, Halstead

Conte, Halstead, HalsteadGPU, and Hammer are back in full production. Job scheduling has been resumed on all clusters. If you see any lingering issues, please let us know at rcac-help@purdue.edu.

UPDATE July 20, 2017 2:54pm

Power has been restored to the affected portions of the MATH datacenter. Engineers are bringing compute nodes back online and verifying the stability of the clusters.

We will update again by 6 pm this evening.

ORIGINAL MESSAGE

HalsteadGPU (including front-ends), Hammer-B compute nodes, most Halstead nodes, and approximately half of the Conte compute nodes lost power around 12:30pm. Engineers and datacenter personnel are diagnosing the issue and working to restore power. Any jobs that were running on the affected compute nodes have been lost. Job scheduling has been paused while this issue is being addressed.

We will provide an update no later than 3 pm.

Originally posted: July 20, 2017 12:52pm