
Outages

  • Unscheduled Hammer OnDemand Outage

    Open OnDemand services for the Hammer cluster are currently offline. Engineers are investigating a boot disk failure on the server that hosts the gateway.hammer.rcac.purdue.edu virtual machine.

  • Data Depot degraded performance on RCAC clusters

    Users of Data Depot on RCAC clusters are currently experiencing significant performance degradation. The symptoms manifest as delays in listing or accessing files in /depot, significant lags in terminal sessions (especially if you have Data Depot in...

  • Unscheduled Cluster Outage

    Update: As of 3:45pm, the Bell cluster has returned to production status. Scheduling is still paused on the Negishi cluster, and we will have an update by 5:00pm EDT. The Bell and Negishi clusters began experiencing issues with power around 1:00pm EDT...

  • Unscheduled Anvil Outage

    Anvil is experiencing more issues related to yesterday's power outage in the Purdue Data Center. Users are currently unable to log in via any method (SSH, Open OnDemand, etc.). Engineers have been dispatched to resolve the issue. This post will be up...

  • Scheduling Paused on Negishi Cluster

    At about noon today (Tuesday 12 September), we discovered an issue with the scheduler database related to the power outage last Sunday. Scheduling on Negishi has been paused to allow for work on correcting the database problem. We will have an upda...

  • Multiple clusters outage

    Multiple clusters have been powered off in the MATH G109 datacenter due to a water issue in the building. Affected systems are Bell, Brown, Geddes, Gilbreth and Negishi. We will provide an update by 5:00 PM today.

  • Unscheduled Fortress outage

    Fortress began experiencing issues with its tape library around 5:00PM. Engineers are currently diagnosing the issue and are working to identify a fix. Vendor support has been contacted and diagnostics have been uploaded. During this time tape access...

  • Unscheduled Gilbreth outage

    Gilbreth is experiencing scheduling issues and jobs have been paused while RCAC works to resolve this issue. Running jobs have also been impacted, so once job scheduling resumes you will need to resubmit any job that was running. We will provid...

  • Unscheduled Geddes outage

    The Geddes cluster began experiencing networking issues around 9:00am EST. Engineers have identified the issue and are working to bring Geddes back to service. Deployments may be in a degraded state during this time. We will provide an update by 3:00...

  • Unscheduled Anvil outage - Partial

    The Anvil cluster began experiencing issues with Slurm scheduling this past week. Engineers are currently diagnosing the root cause and are working to identify a fix. Scheduling is still enabled at this time. You may experience periodic Slurm outage...

  • Unscheduled Cluster Outage

    The Negishi cluster began experiencing issues around 2PM, and engineers isolated the problem with Negishi. In order to resolve the issue, the scheduler had to be restarted. If you had jobs running on Negishi, please check them to ensure they are stil...

  • Multiple Cluster Outage

    The Bell, Geddes and Gilbreth clusters have been experiencing outages as of 4PM Tuesday March 12th. Engineers are working to resolve the problem. Access to these clusters is impacted and running jobs may have stopped. We will provide an update Tuesday March 12th a...

  • Scholar Cluster Down

    The Scholar cluster was down from approximately 4:30PM onwards on Wednesday, March 27, 2024. Engineers have solved the issue and Scholar should return to service on Wednesday, March 27th at 8:30PM. If you submitted a job during this time frame, be sure to...

  • Unscheduled Negishi outage

    The Negishi cluster began experiencing issues with SLURM accounting around 8:00am EDT. Engineers are currently diagnosing the issue and are working to identify a fix. Job scheduling has been paused while this issue is being addressed. We will provide...

  • Unscheduled Scholar outage

    The Scholar cluster began experiencing issues with adding new students from the courses onto the cluster. Engineers are currently diagnosing the issue and are working to identify a fix. The cluster is functioning normally so all the active Scholar us...

  • Unscheduled Bell outage

    The Bell cluster began experiencing issues with the /scratch filesystem at around 5pm today. Engineers are currently diagnosing the issue and are working to identify a fix. Job scheduling has been paused while this issue is being addressed. We will p...

  • Unscheduled Globus service outage

    The Globus connection to the Data Depot endpoint began experiencing issues around 11:00am EDT. Engineers are currently diagnosing the issue and are working to identify a fix. Please use the other data transfer methods described in the user guide. We will pr...

  • Unscheduled Fortress outage

    Fortress began experiencing issues with communications around 3:00AM. Engineers are currently diagnosing the issue and are working to identify a fix. We will provide an update by 12:00PM.

  • Unscheduled Bell Outage

    The Bell cluster began experiencing issues with the /scratch filesystem. Engineers are currently diagnosing the issue and are working to identify a fix for some files being unavailable. We will provide an update by 6pm.

  • Bell Scratch Interruptions

    Starting around 3:00PM Eastern time, Bell's scratch file system began to experience service interruptions and periods of unresponsiveness to user requests. These events were caused by faulty hardware and engineers have temporarily mitigated the probl...