Unscheduled Halstead outage

October 8, 2020  9:00pm – October 10, 2020  8:00am
Halstead

UPDATE: October 10, 2020  8:17am

As of 8:00am, Halstead's scratch system has been returned to service and scheduling of new jobs has been enabled.

Thank you for your patience.


UPDATE: October 9, 2020  3:59pm

Work continues on bringing Halstead scratch back to normal operation. Engineers have identified the source of the problem. The filesystem is up and is undergoing internal checks and disk pool verification. Job scheduling on Halstead remains paused.

We will provide another update by 10am tomorrow.


UPDATE: October 9, 2020  9:55am

Work continues on bringing Halstead scratch back to normal operation. Engineers are working with the vendor to identify the source of the problem. We will provide another update by 4pm today.


ORIGINAL: October 8, 2020  10:57pm

The Halstead cluster began experiencing issues with its scratch filesystem around 9:00pm. Engineers are currently diagnosing the issue and are working to identify a fix. Job scheduling has been paused while this issue is being addressed.

We will provide an update by 10am tomorrow.

Originally posted: October 8, 2020  10:57pm