Unscheduled Rice scratch outage

September 4, 2019 4:40pm - September 5, 2019 11:20am EDT
Outages
Rice

UPDATE: September 5, 2019 11:20am EDT

As of 11:20am EDT, the Rice cluster has been returned to normal service. Job queues have been enabled and job scheduling has been resumed. We apologize for the disruption of service. Please report any issues to rcac-help@purdue.edu.

UPDATE: September 5, 2019 10:02am EDT

The Rice scratch filesystem has been repaired, and engineers are working on returning the cluster to service. We will provide an update by 12pm.

UPDATE: September 4, 2019 11:00pm EDT

Engineers continue working with the vendor to rebuild the array and return the Rice scratch filesystem to normal operation. We will provide another update by 10am tomorrow.

UPDATE: September 4, 2019 8:12pm EDT

Work continues on returning Rice to normal operation. Engineers have identified the source of the problem (a failed drive in a redundant array) and are currently applying a fix. We will provide another update by 11pm.

ORIGINAL: September 4, 2019 4:40pm EDT

The Rice cluster began experiencing issues with the scratch filesystem around 4:40pm EDT. Engineers are currently diagnosing the issue and are working to identify a fix. Job scheduling has been paused while this issue is being addressed.

We will provide an update by 8pm.

Originally posted: September 4, 2019 5:24pm EDT
Last updated: September 5, 2019 11:20am EDT