Scratch unavailable on Snyder cluster

August 3, 2018  2:30pm – August 5, 2018  9:00am
Snyder

UPDATE: August 5, 2018  9:10am

As of 9:00am Sunday, the Snyder cluster has been returned to normal service. The scratch filesystem is stable and suffered no data loss. Job queues have been enabled and job scheduling has resumed. Please report any issues to rcac-help@purdue.edu.


UPDATE: August 4, 2018  5:55pm

Work continues on bringing Snyder back to normal operation. Engineers continue to work with the vendor to restore the filesystem after several hardware failures. The current estimate is that this process will take at least another 14 hours. We will provide another update by 9:00am Sunday.


UPDATE: August 4, 2018  2:16pm

Snyder remains unavailable as engineers continue to check the scratch filesystem’s consistency following the repair of the underlying hardware issue.

We will provide another update by 7:00pm.


UPDATE: August 4, 2018  11:35am

Scratch remains unavailable as engineers continue to work with the storage vendor to check the scratch filesystem’s consistency following the repair of the underlying hardware issue.

We will provide another update by 2:00pm.


UPDATE: August 4, 2018  9:09am

The hardware issue on Snyder's scratch filesystem has been repaired and the filesystem is being checked. We will post another update around noon.


UPDATE: August 3, 2018  8:37pm

Snyder's scratch filesystem continues to experience issues, and engineers are still working on the problem. We will provide another update when the status changes or tomorrow morning around 9:00am, whichever comes first.


UPDATE: August 3, 2018  6:16pm

Work continues on bringing Snyder's scratch system back to normal operation. We will provide another update by 8pm.


UPDATE: August 3, 2018  4:03pm

Work continues on bringing Snyder back to normal operation. Engineers are working with the storage vendor to restore failing hardware in the storage array. We will provide another update by 6:00pm.


ORIGINAL: August 3, 2018  3:07pm

Since approximately 2:30pm, the Snyder cluster has been experiencing issues with its scratch filesystem. Engineers are diagnosing the issue and working to identify a fix.

Attempts to access scratch will likely fail. Job scheduling has been paused while this issue is being addressed.

We will provide an update by 4:00pm.
