Outages
-
Halstead nodes continue to come back online. While the cluster is operating normally, the total number of available nodes is not yet at full capacity. We will provide an update on the situation by 6:00pm. Update: Scheduling has been restarted and jobs are cur...
-
The Data Depot file system was sporadically available for 2 hours today. Some jobs running on the Community Clusters paused during the instability but have resumed. We expect no job loss to have occurred. This issue is now resolved.
-
As of 2:35 pm, the Conte cluster has been returned to service. Scheduling has resumed in all queues. Update: The source of the problem has been identified and the fix is underway. We anticipate returning Conte to service by 3pm today. Original message: The Conte...
-
Scratch system failure on Rice, Snyder, Hammer
*** Update *** As of 7:00 pm, the problem on the scratch system has been corrected, and scheduling has resumed on all three affected clusters: Rice, Snyder, and Hammer. Update: Storage engineers are working with the system vendor to evaluate a propos...
-
As of 8:48pm, the issue has been resolved. Original message: The Research Data Depot is experiencing a system-wide slowdown. Engineers have isolated the systems at the core of the slowdown and are taking steps to restore normal service....
-
Engineers have restored the failed core servers to a functional state. Data Depot is up and running as normal, and job scheduling has resumed. Should you encounter any lingering issues, please let us know at rcac-help@purdue.edu. Original message: Some core...
-
Nodes have continued to reboot gradually into the new image as jobs complete. At this point, more than 80% of Halstead has completed this process, and we have seen no issues with nodes doing so. This outage is closed. Update: May 25, 2017 5:00pm...
-
Email notifications from Research Computing website broken
Email notifications are up and running again as usual. Original message: As of 5pm Thursday evening, email notifications from the Research Computing website are not working. Some people are receiving no email and others are receiving damaged emails. T...
-
Email to "rcac-help@purdue.edu" not Working
As of 3:45pm Friday, the rcac-help@purdue.edu address is working normally again. Original message: Beginning 5:00pm Thursday, the rcac-help@purdue.edu email address stopped accepting email. Anything sent since then has not been received. We are workin...
-
Unscheduled outages on portions of clusters
Conte, Halstead, HalsteadGPU, and Hammer are back in full production. Job scheduling has been resumed on all clusters. If you see any lingering issues, please let us know at rcac-help@purdue.edu. UPDATE July 20, 2017 2:54pm: Power has been restored to...
-
A failure has occurred in the systems which serve Data Depot to the various research clusters. Engineers are currently diagnosing the issue and are working to identify a fix. Job scheduling has been paused on all systems while this issue is being add...
-
Unscheduled Outage in Math Data Center
At approximately 2:00pm EDT on Tuesday, September 5th, 2017, the Math building data center lost some power feeds which supply the Conte, Halstead, HalsteadGPU, Hathi, and Radon clusters. Scheduling on these has been paused until we can be sure power...
-
Access to Data Depot from the Halstead, HalsteadGPU, Hathi, Rice, Scholar, and Snyder clusters has been hung since around 1:30pm EDT on Thursday, September 7th, 2017. Engineers are currently working to restore service to these systems. Job scheduling h...
-
Unscheduled scratch outage on Rice, Scholar and Snyder
The scratch filesystem serving Rice, Scholar, and Snyder is currently unavailable. Both currently running jobs and attempts to access files in scratch will block until the filesystem is back online. Job scheduling on Rice, Scholar, and Snyder has be...
-
Unscheduled Scratch Outage on Rice, Snyder, Scholar
The scratch filesystem serving Rice, Scholar, and Snyder is currently unavailable. Both currently running jobs and attempts to access files in scratch will block until the filesystem is back online. Job scheduling on Rice, Scholar, and Snyder has bee...
-
Update: As of 5:00 PM, the cluster is back in production. The Hathi cluster began experiencing various issues stemming from a recent kernel upgrade around 7:00am EST. Engineers are currently diagnosing the issue and are working to identify a fix. We wi...
-
The WSC Hadoop cluster began experiencing issues with login access around 10:30am EST. Engineers have identified the problem and are addressing it now. We expect to have service restored soon and will issue an update then.
-
The Fortress archive is unavailable due to a datacenter power issue. Datacenter facilities staff are currently investigating; however, there is no estimate at this time for a return to service.
-
Unscheduled Depot Outage on Compute Clusters
The servers providing access to Data Depot from Brown, Conte, Halstead, HalsteadGPU, Radon, Rice, Scholar, and Snyder suffered a partial failure. Many nodes in these clusters temporarily lost access to Depot. Jobs accessing files on Depot may have pa...
-
Unscheduled scratch outage on Rice and Scholar clusters
The scratch filesystem serving Rice and Scholar is currently unavailable. Both currently running jobs and attempts to access files in scratch will block until the filesystem is back online. Job scheduling on Rice and Scholar has been paused while st...