Article #1080: Unscheduled WSC Outage
The WSC Hadoop cluster began experiencing issues with login access around 10:30am EST. Engineers have identified the problem and are addressing it now...
Update: As of 5:00 PM, the cluster is back in production. The Hathi cluster began experiencing various issues stemming from a recent kernel upgrade arou...
The scratch filesystem serving Rice, Scholar, and Snyder is currently unavailable. Both currently running jobs and attempts to access files in scratch...
Access to Data Depot from the Halstead, HalsteadGPU, Hathi, Rice, Scholar, and Snyder clusters has been hanging since around Thursday, September 7th, 2017...
At approximately 2:00pm EDT on Tuesday, September 5th, 2017, the Math building data center lost some power feeds which supply the Conte, Halstead, Hal...
A failure has occurred in the systems which serve Data Depot to the various research clusters. Engineers are currently diagnosing the issue and are wo...
Conte, Halstead, HalsteadGPU, and Hammer are back in full production. Job scheduling has been resumed on all clusters. Please let us know if you see a...
As of 3:45pm Friday, the rcac-help@purdue.edu address is working normally again. Original message: Beginning 5:00pm Thursday, the rcac-help@purdue.edu...
Email notifications are up and running again as usual. Original message: As of 5pm Thursday evening, email notifications from the Research Computing we...
Nodes have continued to gradually reboot into the new image as jobs complete. At this point, more than 80% of Halstead has completed this process, an...
Engineers have restored the failed core servers to a functional state. Data Depot is up and running as normal, and job scheduling has resumed. Should you...
As of 8:48pm, the issue has been resolved. Original message: The Research Data Depot is experiencing a system-wide slow down. Engineers have isolated t...
*** Update *** As of 7:00 pm, the problem on the scratch system has been corrected, and scheduling has resumed on all three affected clusters - Rice,...
As of 2:35 pm, the Conte cluster has returned to service. Scheduling has resumed in all queues. Update: The source of the problem has been identified and the...
The Data Depot file system was sporadically available for 2 hours today. Some jobs running on the Community Clusters paused during the instability but...
Halstead nodes continue to come back online. While the cluster is operating normally, the total number of available nodes is not yet at full capacity...
The Fortress archival storage system is currently experiencing intermittent connectivity. We expect the situation to be resolved by approximately 1pm....
The scratch filesystems serving Carter, Hammer, Rice, Scholar, and Snyder started behaving abnormally this morning. This may have affected some jobs,...
The Research Data Depot has been restored to service. A portion of the systems serving the Research Data Depot have suffered a failure. Some systems u...