Outages
-
Unscheduled data.rcac Transfer Node Outage
The data.rcac.purdue.edu data transfer node began experiencing issues and was taken down at 3:00pm EDT. Engineers are currently diagnosing the issue. Data may be transferred to/from other clusters using those clusters' login nodes, and for Data Depot...
-
Unscheduled Brown scratch outage
The Brown cluster began experiencing issues with its scratch filesystem around 12:30pm EDT. Engineers are currently diagnosing the issue and are working to identify a fix. Job scheduling has been paused while this issue is being addressed.
-
Unscheduled Brown scratch outage
The Brown cluster began experiencing issues with its scratch filesystem around 12:00pm EDT. Engineers are currently diagnosing the issue and are working to identify a fix. Job scheduling has been paused while this issue is being addressed. We will pr...
-
Unscheduled RCAC GitHub outage
The Research Computing GitHub service (github.rcac.purdue.edu) is currently down. Engineers are diagnosing the issue and are working to identify a fix. We will provide an update by 12:00pm.
-
As of 12:30pm EDT, all the clusters are back in production. If your job crashed during the outage, please resubmit it. We are currently experiencing an outage across the community clusters (Brown, Gilbreth, Halstead, Hammer, Rice, Scholar, Snyder, WC...
-
Central Authentication Service (CAS) Outage
This morning, BoilerKey authentication for all community clusters and user-facing services (such as the RCAC website and RStudio Server) is unavailable due to a Central Authentication Service (CAS) outage. All the clusters are under normal operations an...
-
The Fortress tape archive began experiencing issues with Error -1, Error -28, and No space left on device error messages in HSI and Globus around 9:00pm EDT on Tuesday, August 25th, 2020. Engineers are currently diagnosing the issue and are working...
-
The Fortress tape archive began experiencing issues with its disk cache subsystem on Thursday, August 27th, 2020 around 9:00pm EDT. The problems manifest themselves as intermittent Error -1, Error -28, and No space left on device error messages in HS...
-
The Fortress tape archive began experiencing issues with its disk cache subsystem being full on Tuesday, September 1st, 2020 around 12:30am EDT. The problems manifest themselves as intermittent Error -1, Error -28, and No space left on device error m...
-
Halstead's scratch began experiencing issues at approximately 2:00 AM. Some users have reported that they are unable to read or index files within their personal scratch directories when logged in from certain front-ends. Job scheduling has been pa...
-
Halstead's scratch began experiencing issues this morning (Sunday 27 Sep). Job scheduling has been paused while engineers and the system vendor investigate the issue. We will have an update by tomorrow morning (Monday 28 Sep) at 10:00 am.
-
The Halstead cluster began experiencing issues with its scratch filesystem around 9:00pm. Engineers are currently diagnosing the issue and are working to identify a fix. Job scheduling has been paused while this issue is being addressed. We will prov...
-
The Halstead cluster began experiencing issues with its scratch filesystem around 1:15 pm, Sunday 11 Oct. Engineers are currently diagnosing the issue and are working to identify a fix. Job scheduling has been paused while this issue is being address...
-
Unscheduled RCAC GitHub outage
The github.rcac server will be briefly unavailable Friday, October 16, 2020 from 7:00pm – 11:59pm for emergency maintenance. During this time, the server will undergo maintenance tasks that cannot be completed with the server in production. Opera...
-
The Brown cluster began experiencing issues with its job scheduler around 4:00pm EST. The problem manifests itself as Slurm-related commands (slist, squeue, sinteractive, sbatch, etc.) being slow, unresponsive, or timing out. Queue selection dialogs in...
-
Access to RCAC Resources During ITaP Central Authentication Outage
On Sunday, December 27th, 2020, ITaP staff will perform major upgrades to the central authentication infrastructure. All applications that require logging in with BoilerKey or Career Account credentials will be unavailable Sunday, December 27, 2020 f...
-
The Bell cluster began experiencing issues with metadata on its scratch filesystem around 9:00pm. The problem manifests itself as the ls -l command hanging indefinitely, while plain ls (or \ls, or stat FILE) appears to work. Engineers are...
-
The Bell cluster began experiencing issues with its scratch filesystem around 5:00am EST. Engineers are currently diagnosing the issue and have opened a ticket with the vendor to identify a fix. Job scheduling has been paused while this issue is be...
-
The Bell cluster began experiencing issues with its scratch filesystem around 4:00pm EST. Engineers are currently diagnosing the issue and have opened a ticket with the vendor to identify a fix. Job scheduling has been paused while this issue is bein...
-
The Bell, Brown, Gilbreth, Halstead, Rice, Scholar, and Snyder clusters began experiencing issues with their Data Depot mounts around 10:00pm EST. Engineers are currently diagnosing the issue and are working to identify a fix. To avoid job losses for...