Unscheduled Home Directory Outage
UPDATE: May 9, 2020 9:03pm
As of 9:00pm EDT on May 9, 2020, the Gilbreth, Brown, Snyder, Halstead, Rice, Scholar, and Workbench clusters have been returned to normal service. Job queues have been enabled and job scheduling has resumed. We apologize for the disruption of service. Please report any issues to firstname.lastname@example.org.
UPDATE: May 9, 2020 3:05pm
As the impact of this outage grows with the length of the scheduling pause, we are now emailing all users directly. Job scheduling on all clusters remains paused to reduce load and help engineers locate the issue. Work continues on bringing the /home filesystem back to normal operation, and we will provide another update later tonight or as soon as the situation changes.
UPDATE: May 8, 2020 8:50pm
Work continues on troubleshooting the source of the high load on the home directories of the Gilbreth, Brown, Snyder, Halstead, Rice, Scholar, and Workbench clusters. Scheduling of new jobs has been temporarily paused.
ORIGINAL: May 8, 2020 2:30pm - May 9, 2020 9:00pm EDT
The Gilbreth, Brown, Snyder, Halstead, Rice, Scholar, and Workbench clusters began experiencing intermittently slow home directory access around 2:30pm. The issue has been traced to high load on one of the filesystem's back-end servers. Engineers are currently diagnosing the issue and working to identify a fix.