Unscheduled Rice scratch outage

September 4, 2019  4:40pm – September 5, 2019  11:20am

UPDATE: September 5, 2019  11:20am

As of 11:20am, the Rice cluster has been returned to normal service. Job queues have been enabled and job scheduling has been resumed. We apologize for the disruption of service. Please report any issues to rcac-help@purdue.edu.

UPDATE: September 5, 2019  10:02am

The Rice scratch filesystem has been repaired, and engineers are working to bring the cluster back into service. We will provide an update by 12pm.

UPDATE: September 4, 2019  11:00pm

Engineers continue working with the vendor to rebuild the array and bring the Rice scratch filesystem back to normal operation. We will provide another update by 10am tomorrow.

UPDATE: September 4, 2019  8:12pm

Work continues on bringing Rice back to normal operation. Engineers have identified the source of the problem (a failed drive in a redundant array) and are currently applying the fix. We will provide another update by 11pm.

ORIGINAL: September 4, 2019  5:24pm

The Rice cluster began experiencing issues with the scratch filesystem around 4:40pm. Engineers are currently diagnosing the issue and are working to identify a fix. Job scheduling has been paused while this issue is being addressed.

We will provide an update by 8pm.

Originally posted: September 4, 2019  5:24pm