Unscheduled Home Directory Outage

UPDATE: May 9, 2020 9:03pm

As of 9:00pm EDT, the Gilbreth, Brown, Snyder, Halstead, Rice, Scholar, and Workbench clusters have been returned to normal service. Job queues have been enabled and job scheduling has resumed. We apologize for the disruption of service. Please report any issues to rcac-help@purdue.edu.

UPDATE: May 9, 2020 3:05pm

Because the impact of this outage grows the longer job scheduling remains paused, we are now notifying all users directly by email. Job scheduling on all clusters remains paused to reduce load and help engineers locate the issue. Work continues on returning the /home filesystem to normal operation, and we will provide another update later tonight or as soon as the situation changes.

UPDATE: May 8, 2020 8:50pm

Work continues on troubleshooting the source of the high load on the home directories of the Gilbreth, Brown, Snyder, Halstead, Rice, Scholar, and Workbench clusters. Scheduling of new jobs has been temporarily paused.

ORIGINAL: May 8, 2020 2:30pm - May 9, 2020 9:00pm EDT

The Gilbreth, Brown, Snyder, Halstead, Rice, Scholar, and Workbench clusters began experiencing intermittently slow access to home directories at around 2:30pm. The issue has been traced to high load on one of the filesystem's back-end servers. Engineers are currently diagnosing the issue and working to identify a fix.

Originally posted: May 8, 2020 3:46pm EDT