Unscheduled outages on portions of clusters

July 20, 2017, 12:30pm – 5:15pm
Conte, Halstead, HalsteadGPU, Hammer

Conte, Halstead, HalsteadGPU, and Hammer are back in full production. Job scheduling has been resumed on all clusters. Please let us know if you see any lingering issues at rcac-help@purdue.edu.

UPDATE July 20, 2017 2:54pm

Power has been restored to the affected portions of the MATH datacenter. Engineers are now bringing compute nodes back online and verifying the stability of the clusters.

We will provide another update by 6pm this evening.

ORIGINAL MESSAGE

HalsteadGPU (including its front-ends), the Hammer-B compute nodes, most Halstead nodes, and approximately half of the Conte compute nodes lost power at around 12:30pm. Engineers and datacenter personnel are diagnosing the issue and working to restore power. Any jobs that were running on the affected compute nodes have been lost. Job scheduling has been paused while the issue is being addressed.

We will provide an update no later than 3pm.

Originally posted: July 20, 2017, 12:52pm

Purdue University, 610 Purdue Mall, West Lafayette, IN 47907, (765) 494-4600

© 2017 Purdue University | An equal access/equal opportunity university | Copyright Complaints | Maintained by ITaP Research Computing