Unscheduled Home Directory Outage

May 8, 2020  2:30pm – May 9, 2020  9:00pm
Brown, Gilbreth, Halstead, Rice, Scholar, Snyder, Workbench

UPDATE: May 9, 2020  9:03pm

As of 9:00pm, the Brown, Gilbreth, Halstead, Rice, Scholar, Snyder, and Workbench clusters have been returned to normal service. Job queues have been enabled and job scheduling has been resumed. We apologize for the disruption of service. Please report any issues to rcac-help@purdue.edu.


UPDATE: May 9, 2020  3:05pm

As the impact of this outage grows with the length of the scheduling pause, we are now emailing all users directly. Job scheduling on all clusters remains paused to reduce load and help engineers locate the issue. Work continues on bringing the /home filesystem back to normal operation, and we will provide another update later tonight or as soon as the situation changes.


UPDATE: May 8, 2020  8:50pm

Work continues on troubleshooting the source of the high load on the home directories of the Brown, Gilbreth, Halstead, Rice, Scholar, Snyder, and Workbench clusters. Scheduling of new jobs has been temporarily paused.


ORIGINAL: May 8, 2020  3:46pm

The Brown, Gilbreth, Halstead, Rice, Scholar, Snyder, and Workbench clusters began experiencing intermittently slow home directory access around 2:30pm. The issue has been traced to high load on one of the filesystem's back-end servers. Engineers are currently diagnosing the issue and working to identify a fix.

Originally posted: May 8, 2020  3:46pm