Unscheduled Halstead outage

June 23, 2022 8:00am - June 24, 2022 8:15am EDT

UPDATE: June 24, 2022 8:44am EDT

As of 8:15am, the Halstead cluster has been returned to normal service. The scratch file system is fully operational, job queues have been enabled, and job scheduling has resumed. We apologize for the disruption of service. Please report any issues to rcac-help@purdue.edu.

UPDATE: June 23, 2022 5:12pm EDT

Work continues on bringing Halstead back to normal operation. The scratch file system has been brought back online and is currently busy performing integrity verification and balancing/replication on the back end. These tasks run at elevated priority, so response times for user operations may be significantly longer than usual.

Scheduling remains paused on Halstead. We appreciate your patience and will provide another update by noon tomorrow, June 24, or sooner based on the file system progress.

ORIGINAL: June 23, 2022 8:00am EDT

The Halstead cluster began experiencing issues with its scratch file system around 8:00am EDT. The problem manifests as I/O errors or hangs when reading, writing, or listing scratch directories.

Engineers are currently diagnosing the issue and are working to identify a fix. Job scheduling has been paused while this issue is being addressed.

We will provide an update by 6pm tonight.

Originally posted: June 23, 2022 12:44pm EDT
