Outages
-
Unscheduled Conte scratch outage
The Conte cluster began experiencing issues with the scratch filesystem around 10:45am EDT. Engineers are currently diagnosing the issue and are working to identify a fix. Job scheduling has been paused while this issue is being addressed. We will pr...
-
Job Scheduling Issue on Clusters
As of Monday, April 16th, 2018 at 10:00am EDT, Halstead, HalsteadGPU, and Hammer are not properly scheduling new jobs due to a problem with the Moab scheduler. Existing jobs are unaffected. We are working with the vendor to address this and expect...
-
All Research Computing systems suffered an unplanned outage Saturday, March 24th, 2018 at 8:15pm EDT due to a widespread power failure in the area. Thanks to diligent efforts all night and today by many teams across ITaP, all computational clusters h...
-
Depot Access Issues from ECN Systems
Update: Working closely with ECN, RCAC engineers have deployed a new CIFS server to mitigate any incompatibilities. ECN users affected by this issue should connect to their Depot space through \\[datadepot2.rcac.purdue.edu](http://datadepot2.rcac.purdu...
-
Unscheduled scratch outage on Rice and Scholar clusters
The scratch filesystem serving Rice and Scholar is currently unavailable. Both currently running jobs and attempts to access files in scratch will block until the filesystem is back online. Job scheduling on Rice and Scholar has been paused while st...
-
Unscheduled Depot Outage on Compute Clusters
The servers providing access to Data Depot from Brown, Conte, Halstead, HalsteadGPU, Radon, Rice, Scholar, and Snyder suffered a partial failure. Many nodes in these clusters temporarily lost access to Depot. Jobs accessing files on Depot may have pa...
-
The Fortress archive is unavailable due to a datacenter power issue. Datacenter facilities staff are currently investigating; however, there is no estimate at this time for a return to service.
-
The WSC Hadoop cluster began experiencing issues with login access around 10:30am EST. Engineers have identified the problem and are addressing it now. We expect to have service restored soon and will issue an update then.
-
Update: As of 5:00 PM, the cluster is back in production. The Hathi cluster began experiencing various issues stemming from a recent kernel upgrade around 7:00am EST. Engineers are currently diagnosing the issue and are working to identify a fix. We wi...
-
Unscheduled Scratch Outage on Rice, Snyder, Scholar
The scratch filesystem serving Rice, Scholar, and Snyder is currently unavailable. Both currently running jobs and attempts to access files in scratch will block until the filesystem is back online. Job scheduling on Rice, Scholar, and Snyder has bee...
-
Unscheduled scratch outage on Rice, Scholar and Snyder
The scratch filesystem serving Rice, Scholar, and Snyder is currently unavailable. Both currently running jobs and attempts to access files in scratch will block until the filesystem is back online. Job scheduling on Rice, Scholar, and Snyder has be...
-
Access to Data Depot from the Halstead, HalsteadGPU, Hathi, Rice, Scholar, and Snyder clusters has been hung since around 1:30pm EDT on Thursday, September 7th, 2017. Engineers are currently working to restore service to these systems. Job scheduling h...
-
Unscheduled Outage in Math Data Center
At approximately 2:00pm EDT on Tuesday, September 5th, 2017, the Math building data center lost some power feeds which supply the Conte, Halstead, HalsteadGPU, Hathi, and Radon clusters. Scheduling on these has been paused until we can be sure power...
-
A failure has occurred in the systems which serve Data Depot to the various research clusters. Engineers are currently diagnosing the issue and are working to identify a fix. Job scheduling has been paused on all systems while this issue is being add...
-
Unscheduled outages on portions of clusters
Conte, Halstead, HalsteadGPU, and Hammer are back in full production. Job scheduling has been resumed on all clusters. If you see any lingering issues, please let us know at rcac-help@purdue.edu. Update (July 20, 2017 2:54pm): Power has been restored to...
-
Email to "rcac-help@purdue.edu" not Working
As of 3:45pm Friday, the rcac-help@purdue.edu address is working normally again. Original Message: Beginning at 5:00pm Thursday, the rcac-help@purdue.edu email address stopped accepting email. Anything sent since then has not been received. We are workin...
-
Email notifications from Research Computing website broken
Email notifications are working normally again. Original Message: As of 5pm Thursday, email notifications from the Research Computing website are not working. Some people are receiving no email and others are receiving damaged emails. T...
-
Nodes have continued to gradually reboot into the new image as jobs complete. At this point, more than 80% of Halstead has completed this process, and we have not seen any issues as nodes do so. This outage is closed. Update: May 25, 2017 5:00pm...
-
Engineers have restored the failed core servers to a functional state. Data Depot is running normally and job scheduling has resumed. Should you encounter any lingering issues, please let us know at rcac-help@purdue.edu. Original Message: Some core...
-
As of 8:48pm, the issue has been resolved. Original Message: The Research Data Depot is experiencing a system-wide slowdown. Engineers have isolated the systems at the core of the problem and are taking steps to restore normal service....