Outages
-
MATH data center cooling outage
The Math building data center began experiencing issues with its cooling system around 1:40pm EDT. To minimize thermal load on the cooling infrastructure, job scheduling has been paused and all idle compute nodes on Anvil, Bell, Geddes, Gilbreth, and...
-
The Brown cluster began experiencing issues with its scratch filesystem around 11:20am EDT. Engineers are currently diagnosing the issue and are working to identify a fix. Job scheduling has been paused while this issue is being addressed. We will pr...
-
A section of the Bell cluster compute nodes began experiencing issues with power feed and cooling around 2:30pm EDT. Engineers have powered down the affected nodes and are working to identify a fix. Some jobs may have been terminated or requeued. Jo...
-
The Bell cluster continues to experience hardware issues. Engineers are currently diagnosing the issues and are working with vendors to schedule and perform repairs as quickly as possible. Job scheduling continues, but you may experience longer...
-
The Bell Gateway began experiencing issues following the October 4-6 Bell maintenance. In particular, gateway applications have been observed to fail when users attempt to connect to them after the job has launched. Our engineers are investigating...
-
Unscheduled Scholar Gateway outage
The Scholar cluster began experiencing issues with its OnDemand Gateway around 9:00pm EDT on Sunday, October 16th, 2022. The issue manifests as connections to gateway.scholar.purdue.edu timing out. Engineers are currently diagnosing the issue with the...
-
Scheduling Paused on Brown, Gilbreth, Halstead, and Hammer
Around 11:30am EDT, the Brown, Gilbreth, Halstead, and Hammer clusters began experiencing issues with their filesystems that may cause login failures. Engineers are currently investigating the root cause, and in the interim, job scheduling has been p...
-
Scheduling paused on multiple clusters
The Bell, Brown, Gilbreth, Halstead, and Scholar clusters began experiencing issues with their Data Depot mounts around 9:50am EST. Engineers are currently diagnosing the issue and are working to identify a fix. Job scheduling has been paused while...
-
Data Depot Degraded Performance
The Data Depot began experiencing issues around 9:50am EST. While engineers work to diagnose and fix this issue, users may notice degraded performance in the form of sluggish I/O operations on the Data Depot. This may also cause slow logins for...
-
The Bell cluster began experiencing issues with its Lustre scratch filesystem around 12:30pm EST. Engineers are currently diagnosing the issue and are working to identify a fix. Job scheduling has been paused while this issue is being addressed. We w...
-
The Anvil cluster began experiencing issues with its scratch and project filesystems around 10:00am EST. Access to scratch and project directories may be slow or hang. Engineers are currently diagnosing the issue and are working with the vendor to id...
-
The Bell cluster began experiencing issues with its scratch filesystem around 12:55pm EST. File access operations (e.g. ls) may appear to hang. Logins to the Open OnDemand gateway (gateway.bell.rcac.purdue.edu) may appear sluggish or hanging as wel...
-
The Bell cluster began experiencing issues with its scratch filesystem around 7:50pm EDT. File access operations (e.g. ls) may appear to hang. Logins to the Open OnDemand gateway (gateway.bell.rcac.purdue.edu) may appear sluggish or hanging as well....
-
The Gilbreth cluster began experiencing issues with its scheduler spool filesystem around 10:30pm EDT on Saturday, March 18th, 2023. The problem manifests as an I/O error during new batch job submissions and in Open OnDemand gateway applications. Int...
-
Unscheduled Scholar partial outage
The Scholar cluster began experiencing issues with its ThinLinc remote desktop (desktop.scholar.rcac.purdue.edu) and its RStudio Server (rstudio.scholar.rcac.purdue.edu) around 8:30am EDT. Engineers are currently diagnosing the issue and are working...
-
The Anvil cluster began experiencing issues with its scratch filesystem around 6:45pm EDT. Access to scratch directories may be slow or hang. Engineers are currently diagnosing the issue and are working to identify a fix. Job scheduling has been paus...
-
Around 2:10pm EST, the Brown cluster began experiencing issues with its home directory mounts. Job scheduling on the Brown cluster has been paused while engineers investigate the issue. We will provide an update by 5pm.
-
Data Depot partial outage (network drives)
The Data Depot began experiencing issues with its network drive mapping capability around 1:30pm EDT. The symptoms manifest as users being unable to map their Data Depot spaces as network drives on their Windows, Mac or Linux laptops and workstation...
-
The Geddes cluster began experiencing issues overnight. Engineers are currently diagnosing the issue and are working to identify a fix. Workloads will be unavailable while this issue is being addressed. We will provide an update by 12pm.
-
Unscheduled Hammer Slurm outage
The Hammer cluster began experiencing issues with the Slurm scheduler around 5:00am on Thursday, July 6th. The Slurm scheduler is unresponsive; as a result, jobs will fail to schedule. Desktop and SSH access to Hammer login nodes is still available,...