Outages and Maintenance

  • Math Cluster Maintenance

    An issue with the /group path on several front-ends and nodes was discovered shortly after Conte, Hansen, Hathi, and Radon were brought back online. Any scripts or jobs that rely on the /group path may have had issues immediately following return to...

  • Campus network outage

    Networking to and from campus, and around large parts of campus, is down. Many services are unreachable at the moment. We will provide updates as they become available.

  • Campus networking outage

    Networking to and from campus, and around large parts of campus, is down. Many services are unreachable at the moment. We will provide updates as they become available.

  • Network issues affecting Snyder cluster

    The problem is now RESOLVED after the reboot of a router. ======= The network serving Snyder is currently experiencing issues. Attempts to log in to the cluster or access network filesystems will experience high latency and delays. Systems engineers...

  • Unscheduled Storage Outage

    The Isilon filesystem was restored to normal service and all affected clusters had it remounted as quickly as was sustainable by the filesystem. This process was completed by Wednesday, May 18th, 2016 at 12:15am EDT. All clusters other than Conte (...

  • Conte Cluster Maintenance

    Conte has been returned to normal operations as of Wednesday, May 18th, 2016 at 5:55am EDT. All upgrades were completed, though a small number of nodes which require more attention to be fully ready for jobs remain offline for now and will be return...

  • Unscheduled Scratch Outage on Carter

    UPDATE: The issue with Carter's scratch filesystem has been resolved. The filesystem is now available. Job scheduling on the cluster has been resumed. The scratch filesystem serving Carter is currently unavailable. Job scheduling on Carter has bee...

  • Unscheduled Scheduling Outage on Hansen

    Job scheduling on Hansen has returned to normal. This concludes the outage. Original Message: Hansen is not currently scheduling any new jobs. A filesystem used for a specific project serving Hansen became unavailable on Hansen around 5:00am. This...

  • Unscheduled Scratch Outage on Carter

    The scratch storage on Carter and Scholar has been returned to normal operations. The rebuild process will be continuing in the background, so we will be watching for any degradation in the storage performance. All queues have been re-activated. T...

  • Standby queues paused on Hansen Cluster

    Update: As of 5:20pm, the standby and standby-c queues have been started and their jobs are being scheduled for execution. The standby queues on Hansen have been paused temporarily in order to address a slow scheduling issue. We expect to have them ba...

  • Unscheduled outage on Peregrine-1

    Outage RESOLVED: A misconfiguration that caused an unneeded IB driver to be loaded was fixed. Peregrine-1 is back online. Job scheduling is on. Original Message: The Peregrine-1 cluster is currently offline due to problems with the cluster nodes' op...

  • Unscheduled outage for Peregrine1

    As of Monday, March 7th, 2016 at 12:30pm EST, the Peregrine1 cluster is unavailable due to a failed network switch in its datacenter. This switch is currently in the process of being replaced. Estimated time to complete this work and bring the clu...

  • ECN services outage - ITaP Research Computing systems impacted

    Engineering Computing Network (ECN) will be performing staged patching and reboots of all of ECN's RedHat Linux workstations and servers to protect against a serious vulnerability in the glibc system library. A significant number of ECN services will be...

  • Unscheduled Outage on Data Depot

    The Depot filesystem checks have all completed cleanly and the Depot has been fully returned to normal operations. All queues on all clusters are scheduling new jobs again. Any existing jobs which had been waiting for Depot access may also resume....

  • Unscheduled outage on Rice and Snyder

    As of 9:15 PM, the Snyder and Rice clusters have been brought back into service after cooling was brought back online. Front-ends are operational and scheduling has been resumed. Original Message: At about 7:30 pm Wednesday, 17 February, 2016, the fr...

  • Unscheduled scratch outage on Carter

    There was an issue with the cluster's gateway switches that prevented IP over InfiniBand traffic. This also caused instability in the Lustre scratch servers, which required that they be rebooted. Jobs that were using scratc...

  • Cluster Maintenance - Conte

    The scheduler issue has been resolved, and Conte has been returned to normal operations as of Wednesday, February 10th, 2016 at 9:30pm EST. Update: February 10, 2016 7:04pm There was a minor issue discovered with the newly upgraded scheduler which i...

  • Unscheduled outage on Carter

    The cause of this outage turned out to be a power loss to Carter's scratch filesystem and portions of the Data Depot; power has now been restored. Carter nodes are returning to normal operations. Original Message: As of Thursday, February 4th, 2016 at...

  • Unscheduled Outage in Math Data Center

    Most of the impact of this outage turned out to be on the Depot storage system, which has now been restored to normal operations. All the other affected systems are showing a return to normal operations now. Original Message: As of Thursday, February 4th,...

  • Hathi & WinHPC Power Maintenance

    The Hathi and WinHPC clusters will be unavailable beginning Thursday, February 4th, 2016 at 6:00am EST, for scheduled maintenance to the power feed. Both clusters will return to full production by Thursday, February 4th, 2016 at 5:00pm EST. During...