Article #1055: Unscheduled Depot Outage
Access to Data Depot from the Halstead, HalsteadGPU, Hathi, Rice, Scholar, and Snyder clusters has been hanging since around Thursday, September 7th, 2017...
At approximately 2:00pm EDT on Tuesday, September 5th, 2017, the Math building data center lost some power feeds which supply the Conte, Halstead, Hal...
A failure has occurred in the systems which serve Data Depot to the various research clusters. Engineers are currently diagnosing the issue and are wo...
Conte, Halstead, HalsteadGPU, and Hammer are back in full production. Job scheduling has been resumed on all clusters. Please let us know if you see a...
As of 3:45pm Friday, the rcac-help@purdue.edu address is working normally again. Original message: Beginning 5:00pm Thursday, the rcac-help@purdue.edu...
Email notifications are up and running again as usual. Original message: As of 5pm Thursday evening, email notifications from the Research Computing we...
Nodes have continued to gradually reboot into the new image as jobs complete. At this point, more than 80% of Halstead has completed this process, an...
Engineers have restored the failed core servers to a functional state. Data Depot is up and running as normal, and job scheduling has resumed. Should you...
As of 8:48pm, the issue has been resolved. Original message: The Research Data Depot is experiencing a system-wide slowdown. Engineers have isolated t...
*** Update *** As of 7:00 pm, the problem on the scratch system has been corrected, and scheduling has resumed on all three affected clusters - Rice,...
As of 2:35 pm, the Conte cluster has returned to service. Scheduling has resumed in all queues. Update: The source of the problem has been identified and the...
The Data Depot file system was sporadically available for 2 hours today. Some jobs running on the Community Clusters paused during the instability but...
Halstead nodes continue to come back online. While the cluster is operating normally, the total number of available nodes is not yet at full capacity...
The Fortress archival storage system is currently experiencing intermittent connectivity. We expect the situation to be resolved by approximately 1pm....
The scratch filesystems serving Carter, Hammer, Rice, Scholar, and Snyder started behaving abnormally this morning. This may have affected some jobs,...
The Research Data Depot has been restored to service. A portion of the systems serving the Research Data Depot have suffered a failure. Some systems u...
The scratch filesystem serving Hammer, Rice, and Snyder is currently unavailable. Both currently running jobs and attempts to access files in scratch...
Following the security updates on Halstead, an issue was discovered that prevented multi-node MPI jobs from running properly. Scheduling on Halstead h...
The scratch filesystem serving Conte is currently unavailable. Both currently running jobs and attempts to access files in scratch will block until th...
System monitoring has revealed intermittent issues connecting to the Research Data Depot on Thursday, January 19. When this issue occurs, users will ex...