Unscheduled Brown outage

August 11, 2022 1:21pm EDT UPDATE:

As of 1:21pm EDT, the Brown cluster has been returned to normal service. Job queues have been enabled and job scheduling has been resumed. We apologize for the disruption of service and greatly appreciate your patience during this time. Please report any issues to rcac-help@purdue.edu.

August 11, 2022 11:44am EDT UPDATE:

The storage pool verification process completed successfully. Engineers are working on starting up the Lustre file system and bringing Brown back to normal operation. We will provide another update by 6pm or as soon as we have new information.

August 10, 2022 4:54pm EDT UPDATE:

The replacement storage controller hardware for Brown scratch is fully operational. As of last night, the system was undergoing an automated storage pool verification procedure (a necessary safety precaution after this type of controller hardware failure). At the time of this writing, the process is approximately 75% complete.

Once the underlying storage pools are successfully verified, engineers will proceed with bringing the filesystem back online and (barring any negative diagnostics) returning the cluster to normal service. We will provide another update by noon tomorrow (Thursday, August 11th), or as soon as we have new information.

August 9, 2022 7:31pm EDT UPDATE:

Replacement hardware for Brown scratch has arrived and has been deployed. Engineers are performing validation and configuration to ensure the replacement and the filesystem are functioning as expected.

We do not have an ETA at the moment and will provide the next update by 6pm tomorrow (Wednesday, August 10th) or as soon as we have any new information. We greatly appreciate your patience.

August 8, 2022 7:02pm EDT UPDATE:

Replacement hardware for the Brown scratch is expected to arrive on Tuesday, August 9.

We will provide an update by 6pm on Tuesday, August 9, or sooner if we have more information.

August 7, 2022 8:27pm EDT UPDATE:

We currently expect vendor-provided replacement hardware for the Brown scratch to be delivered on Tuesday. The scratch filesystem remains down, and scheduling remains paused while engineers evaluate available options.

We appreciate your patience. We will provide an update by 6pm on Monday, August 8, or sooner if we have more information.

August 7, 2022 8:41am EDT UPDATE:

The vendor has diagnosed defects in the filesystem controllers and is shipping replacement hardware.

The Brown scratch filesystem remains unavailable. We will provide the next update by 9pm.

August 6, 2022 5:59pm EDT UPDATE:

Work continues on troubleshooting the issue and bringing Brown back to normal operation. Engineers are working with the vendor to identify the source of the problem with the scratch file system.
Brown scratch remains unavailable, and any job that relies on scratch will not be able to perform I/O operations there.

We will provide another update by noon tomorrow, August 7.

ORIGINAL:

The Brown cluster began experiencing issues with its scratch filesystem around 11:20am EDT. Engineers are currently diagnosing the issue and are working to identify a fix. Job scheduling has been paused while this issue is being addressed.

We will provide an update by 6pm.
