Unscheduled Halstead outage

UPDATE:

As of 8:15am, the Halstead cluster has been returned to normal service. The scratch file system is fully operational, job queues have been enabled, and job scheduling has resumed. We apologize for the disruption of service. Please report any issues to rcac-help@purdue.edu.

UPDATE:

Work continues on bringing Halstead back to normal operation. The scratch file system has been brought back online and is currently performing integrity verification and balancing/replication on the back end. These tasks run at elevated priority, so user operations on scratch may respond significantly more slowly than usual.

Scheduling remains paused on Halstead. We appreciate your patience and will provide another update by noon tomorrow, June 24, or sooner based on the file system progress.

ORIGINAL:

The Halstead cluster began experiencing issues with its scratch file system around 8:00am EDT. The problem manifests as various I/O errors or hangs when reading, writing or listing scratch directories.

Engineers are currently diagnosing the issue and are working to identify a fix. Job scheduling has been paused while this issue is being addressed.

We will provide an update by 6pm tonight.
