Unscheduled Rice scratch outage
September 4, 2019 4:40pm – September 5, 2019 11:20am
As of 11:20am, the Rice cluster has returned to normal service. Job queues have been re-enabled and job scheduling has resumed. We apologize for the disruption of service. Please report any issues to firstname.lastname@example.org.
UPDATE: September 5, 2019 10:02am
The Rice scratch filesystem has been repaired, and engineers are working on bringing the cluster back into service. We will provide an update by 12pm.
UPDATE: September 4, 2019 11:00pm
Engineers continue working with the vendor on rebuilding the array and bringing the Rice scratch filesystem back to normal operation. We will provide another update by 10am tomorrow.
UPDATE: September 4, 2019 8:12pm
Work continues on bringing Rice back to normal operation. Engineers have identified the source of the problem (a failed drive in a redundant array) and are currently applying the fix. We will provide another update by 11pm.
ORIGINAL: September 4, 2019 5:24pm
The Rice cluster began experiencing issues with the scratch filesystem around 4:40pm. Engineers are currently diagnosing the issue and are working to identify a fix. Job scheduling has been paused while this issue is being addressed.
We will provide an update by 8pm.