Unscheduled Home Directory Outage

May 8, 2020 2:30pm - May 9, 2020 9:00pm EDT
Outages
Brown, Gilbreth, Halstead, Rice, Scholar, Snyder, Workbench

UPDATE: May 9, 2020 9:03pm EDT

As of 9:00pm EDT on May 9, 2020, the Brown, Gilbreth, Halstead, Rice, Scholar, Snyder, and Workbench clusters have been returned to normal service. Job queues have been enabled and job scheduling has resumed. We apologize for the disruption of service. Please report any issues to rcac-help@purdue.edu.

UPDATE: May 9, 2020 3:05pm EDT

Because the impact of this outage grows the longer scheduling stays paused, we are now emailing all users directly. Job scheduling on all clusters remains paused to reduce load and help engineers locate the issue. Work continues on bringing the /home filesystem back to normal operation, and we will provide another update later tonight or as soon as the situation changes.

UPDATE: May 8, 2020 8:50pm EDT

Work continues on troubleshooting the source of the high load on the home directories of the Brown, Gilbreth, Halstead, Rice, Scholar, Snyder, and Workbench clusters. Scheduling of new jobs has been temporarily paused.

ORIGINAL: May 8, 2020 2:30pm EDT

The Brown, Gilbreth, Halstead, Rice, Scholar, Snyder, and Workbench clusters began experiencing intermittently slow home directory access around 2:30pm EDT. The issue has been traced to high load on one of the filesystem's back-end servers. Engineers are diagnosing the issue and working to identify a fix.

Originally posted: May 8, 2020 3:46pm EDT
Last updated: May 9, 2020 9:03pm EDT
