Outages

  • Unscheduled Brown and Hammer outage

    The Brown and Hammer clusters began experiencing issues with cooling due to problems at the Physical Facilities' chiller plant around 4:40pm EDT. To avoid overheating, job scheduling has been paused while this issue is being addressed. We will provid...

  • Unscheduled multiple clusters and Data Depot outage

    The Bell, Brown, Gilbreth, Halstead, Hammer, Scholar, and Workbench clusters and Data Depot began experiencing issues with intermittent high load on the Data Depot servers around 4:30pm EDT. Engineers are currently diagnosing the issue and are working to...

  • Unscheduled Bell outage

    The Bell cluster began experiencing issues with high load and sluggish performance on the scratch filesystem around 1:20pm EDT. Engineers are currently diagnosing the issue and are working to identify a fix. Job scheduling has been paused while this...

  • Unscheduled Weber outage

    The Weber cluster began experiencing issues with an expired VPN certificate around 10:00am EST. Engineers are currently diagnosing the issue and are working to identify a fix. We will provide an update by 5pm.

  • Unscheduled Bell outage

    The Bell cluster began experiencing issues with its scratch filesystem around 6:30pm EST. Engineers are currently diagnosing the issue and are working to identify a fix. Job scheduling has been paused while this issue is being addressed. We will prov...

  • Unscheduled Weber outage

    The Weber cluster began experiencing issues with the weber-sftp subsystem around 2:00pm EST. The problem affects the ingress/egress path to the cluster. Engineers are currently diagnosing the issue and are working to identify a fix. We will provide an upda...

  • Unscheduled Bell outage

    The Bell cluster began experiencing issues with its scheduler database around 11:35am EST. The problem manifests as freezing and/or "socket timed out" and "Unable to contact slurm controller" error messages upon the usual Slurm comman...

  • Unscheduled Gilbreth outage

    The Gilbreth cluster began experiencing issues with its Data Depot mounts around 9:00am EST. The /depot filesystem is not visible on some of the login and compute nodes. Engineers are currently diagnosing the issue and are working to identify a fix....

  • Unscheduled Data Depot outage

    As of 8:00pm EST on Friday, February 11th, 2022, the Data Depot filesystem outage has been resolved and scheduling has been resumed on all clusters. The Bell, Brown, Gilbreth, Halstead, Scholar, and Workbench clusters and Data Depot began experiencing i...

  • Unscheduled Math data center cooling outage

    The Math building data center began experiencing issues with its cooling system around 11:40am EST. As one manifestation, users may experience issues while logging in to the Anvil, Bell, Gilbreth, Halstead, Workbench, and Data Depot clusters. To m...

  • Unscheduled Math Data Center Cooling Outage

    The Math building data center began experiencing issues with its cooling system around 11:40am EDT. As one manifestation, users may experience issues while logging in to the Anvil, Bell, Gilbreth, and Halstead clusters. To minimize thermal load on...

  • Unscheduled Data Depot Slowdown on Community Clusters

    As of 9:00am EDT, users of community clusters may experience slowness while trying to access Data Depot (including loading modules, starting applications, or reading data). The symptoms appear on both login and compute nodes. System engineers are act...

  • Unscheduled Gilbreth cluster outage

    The Gilbreth cluster began experiencing issues with its scratch filesystem around 7:00pm EDT. Engineers are currently diagnosing the issue and are working to identify a fix. Job scheduling has been paused while this issue is being addressed. We will...

  • Gilbreth scratch degraded performance

    Following last night's scratch outage, the Gilbreth scratch filesystem is currently functional but operating with partially degraded performance. Engineers have opened a support ticket with the vendor and are monitoring the state of the filesystem continuou...

  • Unscheduled Weber ingress/egress outage

    The Weber cluster's data transfer server (weber-sftp.rcac.purdue.edu) suffered a cooling fan failure around 8:30pm EDT on Saturday, April 9th, 2022. The cluster remains operational with the exception of ingress/egress of files via the affected server....

  • Unscheduled Bell outage

    The Bell cluster began experiencing issues with its scratch filesystem around 9:00pm EDT on Saturday, April 9th, 2022. Access to files in scratch may appear severely delayed or frozen. Engineers are currently diagnosing the issue and are working to...

  • Unscheduled campus power outage

    Several Research Computing resources were affected by a campus power outage around 7:00pm EDT. Multiple login and compute nodes may have powered down, causing jobs to fail and/or requeue with a NODE_FAIL or similar status. Engineers are currently d...

  • Scheduling Paused on Brown and Hammer

    Beginning around 2:00pm EDT, the ailing cooling systems for Brown and Hammer began experiencing issues. To reduce the thermal load on the systems, scheduling of new jobs has been paused on Brown and Hammer and will not be resumed until after tomorrow...

  • Unscheduled Halstead outage

    The Halstead cluster began experiencing issues with its scratch file system around 8:00am EDT. The problem manifests as various I/O errors or hangs when reading, writing or listing scratch directories. Engineers are currently diagnosing the issue and...

  • Bell Scratch Degraded Storage (Returned to Service)

    Bell Scratch is near capacity and performance is degraded. As of this morning, Bell Scratch was 94% full. This afternoon we paused scheduling as scratch was not responding consistently. We have more drives on order, but with global supply chain issue...