Article #959: Scheduler Issue on Halstead
Halstead nodes continue to come back online. While the cluster is operating normally, the total number of available nodes is not yet at full capacity...
The Data Depot file system was sporadically available for 2 hours today. Some jobs running on the Community Clusters paused during the instability but...
As of 2:35 pm, the Conte cluster has returned to service. Scheduling has resumed in all queues. Update: The source of the problem has been identified and the...
*** Update *** As of 7:00 pm, the problem on the scratch system has been corrected, and scheduling has resumed on all three affected clusters - Rice,...
As of 8:48 pm, the issue has been resolved. Original message: The Research Data Depot is experiencing a system-wide slowdown. Engineers have isolated t...
Engineers have restored the failed core servers to a functional state. Data Depot is up and running as normal, and job scheduling has resumed. Should you...
Nodes have continued to gradually reboot into the new image as jobs complete. At this point, more than 80% of Halstead has completed this process, an...
Email notifications are up and running again as usual. Original Message: As of 5pm Thursday evening, email notifications from the Research Computing we...
As of 3:45pm Friday, the rcac-help@purdue.edu address is working normally again. Original Message: Beginning 5:00pm Thursday, the rcac-help@purdue.edu...
Conte, Halstead, HalsteadGPU, and Hammer are back in full production. Job scheduling has been resumed on all clusters. Please let us know if you see a...
A failure has occurred in the systems which serve Data Depot to the various research clusters. Engineers are currently diagnosing the issue and are wo...
At approximately 2:00pm EDT on Tuesday, September 5th, 2017, the Math building data center lost some power feeds which supply the Conte, Halstead, Hal...
Access to Data Depot from the Halstead, HalsteadGPU, Hathi, Rice, Scholar, and Snyder clusters has hung starting around Thursday, September 7th, 2017...
The scratch filesystem serving Rice, Scholar, and Snyder is currently unavailable. Both currently running jobs and attempts to access files in scratch...
Update: As of 5:00 PM, the cluster is back in production. The Hathi cluster began experiencing various issues stemming from a recent kernel upgrade arou...
The WSC Hadoop cluster began experiencing issues with login access around 10:30am EST. Engineers have identified the problem and are addressing it now...
The Fortress archive is unavailable due to a datacenter power issue. Datacenter facilities staff are currently investigating; however, at this time th...
The servers providing access to Data Depot from Brown, Conte, Halstead, HalsteadGPU, Radon, Rice, Scholar, and Snyder suffered a partial failure. Many...
The scratch filesystem serving Rice and Scholar is currently unavailable. Both currently running jobs and attempts to access files in scratch will blo...