Outages
-
Central Authentication Service (CAS) Outage
This morning, BoilerKey authentication for all community clusters and user-facing services (such as the RCAC website and RStudio Server) is unavailable due to a Central Authentication Service (CAS) outage. All the clusters are under normal operations an...
-
As of 12:30pm EDT all the clusters are back in production. If your job crashed during the outage, please resubmit it. We are currently experiencing an outage across the community clusters (Brown, Gilbreth, Halstead, Hammer, Rice, Scholar, Snyder, WC...
-
Unscheduled RCAC GitHub outage
The Research Computing GitHub service (github.rcac.purdue.edu) is down. Engineers are currently diagnosing the issue and are working to identify a fix. We will provide an update by 12:00pm.
-
Unscheduled Brown scratch outage
The Brown cluster began experiencing issues with its scratch filesystem around 12:00pm EDT. Engineers are currently diagnosing the issue and are working to identify a fix. Job scheduling has been paused while this issue is being addressed. We will pr...
-
Unscheduled Brown scratch outage
The Brown cluster began experiencing issues with its scratch filesystem around 12:30pm EDT. Engineers are currently diagnosing the issue and are working to identify a fix. Job scheduling has been paused while this issue is being addressed.
-
Unscheduled data.rcac Transfer Node Outage
The data.rcac.purdue.edu data transfer node began experiencing issues and was taken down at 3:00pm EDT. Engineers are currently diagnosing the issue. Data may be transferred to/from other clusters using those clusters' login nodes, and for Data Depot...
-
Unscheduled Home Directory Outage
The Brown, Gilbreth, Halstead, Rice, Scholar, Snyder, and Workbench clusters began experiencing intermittently slow home directory access around 2:30pm EDT. The issue has been traced to a high load on one of the filesystem's back-end se...
-
Unscheduled Brown scratch outage
The Brown cluster began experiencing issues with its scratch filesystem around 4:30pm EDT. Engineers are currently diagnosing the issue and are working to identify a fix. Job scheduling has been paused while this issue is being addressed. We will pro...
-
The Data Depot storage system began returning "No space left on device" error messages around 10:30am EDT. Engineers are currently diagnosing the issue and are working to identify a fix. Job scheduling on community clusters has been paus...
-
The Weber cluster began experiencing issues around 10:00am EDT. Engineers are currently diagnosing the issue and are working to identify a fix. We will provide an update by 2:00pm.
-
Engineering Computing Network (ECN) has reported an outage on the software license servers for ITaP Research Computing systems that are hosted by ECN. ITaP Research Computing cluster job scheduling is not affected by the outage, but licenses for soft...
-
Unscheduled Data Depot Windows network drive outage
Since Friday, April 17, the Research Data Depot filesystem has been unavailable on community cluster systems, but remained available through other means of access (such as Windows Network Drive). Around 9:00am EDT on Tuesday, April 21st, 2020, the D...
-
Running Jobs on Community Clusters While Data Depot is Unavailable
Since Friday, April 17, the Research Data Depot filesystem has been unavailable on community cluster systems due to an ongoing filesystem verification. While we don't believe there is any danger of data loss, the filesystem verification will continu...
-
Unscheduled Data Depot outage on the clusters
The Brown, Gilbreth, Halstead, Hammer, Rice, Scholar, Snyder, and Workbench clusters began experiencing issues connecting to the Data Depot filesystem around 5:00pm EDT on Friday, April 17th, 2020. Engineers are currently diagnosing the issue and ar...
-
Unscheduled Outage to Multiple Systems
Hammer, Scholar, Snyder, WCERES, WSC Hadoop, and Data Depot began experiencing issues with networking around 10:00am EST. Engineers are currently diagnosing the issue and are working to identify a fix. We will provide an update by 1:00pm if the issue...
-
The Gilbreth cluster began experiencing issues with its scratch filesystem around 11:30am EST. Engineers are currently diagnosing the issue and are working to identify a fix. Job scheduling has been paused while this issue is being addressed. We will...
-
The Fortress Archive began experiencing issues with an internal database around 4:30pm. Engineers are currently working to remove the affected database from the system to mitigate the issue. Fortress should return to normal function once this is remo...
-
Data Depot suffered a system failure around 5:15pm EDT. Engineers are currently diagnosing the issue and are working to identify a fix. Job scheduling has been paused while this issue is being addressed. We will provide an update by 9pm or earlier as...
-
Unscheduled Rice scratch outage
The Rice cluster began experiencing issues with the scratch filesystem around 4:40pm EDT. Engineers are currently diagnosing the issue and are working to identify a fix. Job scheduling has been paused while this issue is being addressed. We will prov...
-
Unscheduled power outage affecting Brown and Hammer
The Brown and Hammer clusters experienced a partial power outage overnight, which caused them to operate at reduced capacity. Engineers are currently diagnosing the issue and are working to identify a fix. Job scheduling has been paused while this...