Halstead and Brown unscheduled outage
UPDATE: February 12, 2019 4:23pm
As of 4:00 pm, the Halstead and HalsteadGPU scratch system and cluster have been returned to normal service. Job queues have been enabled and job scheduling has resumed. Please report any issues to email@example.com.
UPDATE: February 12, 2019 10:51am
Brown and BrownGPU scratch has been returned to normal service. Job scheduling has been restarted, so Brown and BrownGPU are back to full production. Please let us know if you see any lingering issues at firstname.lastname@example.org.
Storage engineers and the vendor continue to work on bringing Halstead/HalsteadGPU scratch back to service. We will provide another update on Halstead by 2 pm today.
UPDATE: February 11, 2019 4:44pm
Both the Halstead and Brown scratch filesystems (each shared with its respective GPU system) were damaged by a power spike during the power outage earlier today. Storage engineers and engineers from the vendor are continuing repair work into this evening.
Job scheduling remains paused. This week's scratch purges are also canceled for the Brown and Halstead scratch filesystems.
We will provide another update by 10:00 am tomorrow.
ORIGINAL: February 11, 2019 1:24pm
Halstead, HalsteadGPU, Brown, and BrownGPU went offline during a campus power event around 8:40 am this morning. Engineers are working to bring the compute nodes and the scratch system back online. Other systems are back online at this time. Job scheduling is paused at the moment.
We will provide an update by 5 pm this afternoon.