Unscheduled Gilbreth outage

January 3, 2020 11:30am - January 5, 2020 8:35pm EST
Outages
Gilbreth

UPDATE: January 5, 2020 8:35pm EST

As of 8:35pm EST, the Gilbreth cluster has been returned to normal service. Job queues have been enabled and job scheduling has been resumed. For a limited time, scratch performance may be somewhat degraded while the file system continues to recover from the failure.

We apologize for the disruption of service. Please report any issues to rcac-help@purdue.edu.

UPDATE: January 4, 2020 8:10pm EST

Storage engineers have replaced the malfunctioning scratch controller hardware, and work continues on system verification.

We will provide another update as verification progresses.

UPDATE: January 3, 2020 1:57pm EST

Engineers are engaged with the vendor, and work continues on troubleshooting the unresponsive scratch file system on Gilbreth. The problem appears to be a hardware failure, and the outage is likely to extend into the weekend. We do not expect any data loss at this time and will continue to monitor the safety of your data, as always.

We will provide another update as more information becomes available.

ORIGINAL: January 3, 2020 11:30am EST

The Gilbreth cluster began experiencing issues with its scratch file system around 11:30am EST. Engineers are diagnosing the issue and working to identify a fix. Job scheduling has been paused while the issue is being addressed.

We will provide an update by 2pm.

Originally posted: January 3, 2020 11:42am EST
Last updated: January 5, 2020 8:35pm EST
