
Carter

  • Emergency Carter Cluster Maintenance

    Update: Owner queues on Carter have been restarted. While Carter is currently deemed stable, performance is still impacted. Engineers are closely monitoring the situation and will take corrective action if necessary. Update: At this time, only Carter...

  • Partial scratch outages on Rice, Snyder, Carter, Scholar and Hammer

    The scratch filesystems serving Carter, Hammer, Rice, Scholar, and Snyder started behaving abnormally this morning. This may have affected some jobs, and anyone using one of the login nodes for these clusters may have had sessions freeze or seen dela...

  • Emergency Security Patching of RCAC Clusters

    Due to a recent security vulnerability, the Carter, Halstead, Hammer, Radon, Rice, Scholar, and Snyder clusters will have their operating system upgraded to a newer version during February 2, 2017 5:00pm - March 2, 2017 5:00pm EST. Unlike other cl...

  • Carter Cluster Maintenance

    The maintenance for the Carter cluster was cancelled and will be rescheduled at a later date. The cluster has remained in service. Original Notice: The Carter cluster will be unavailable beginning Tuesday, January 10th, 2017 at 8:00am EST, for emergen...

  • Emergency Cluster Maintenance

    The Carter Cluster was returned to production at 10:45pm on November 7. We apologize for this extended outage. Update: November 7, 2016 6:01pm Work on reinstalling the Carter nodes continues. All other systems have returned to normal operations. We...

  • Unscheduled Depot Outage

    Measures taken within the first two hours of this problem seem to have resolved the issue. Original Message: A portion of the systems serving the Research Data Depot have suffered a failure. Some systems using Depot have been affected, particularly...

  • Unscheduled Scratch Outage on Carter

    UPDATE: As of about 6:30 pm, the new scratch system was brought back online, and scheduling has been restarted on Carter. Original Message: The new scratch filesystem serving Carter that was just activated on Tuesday night is currently unavailable. Bot...

  • Home Filesystem Maintenance - All Clusters

    Conte has now been returned to normal operations as well. This concludes the home directory maintenance on all systems. Update: September 27, 2016 11:55pm All systems other than Conte have been successfully returned to normal operations with the ne...

  • Unscheduled scratch outage on Carter

    UPDATE: ITaP engineers have implemented a temporary solution so that work may continue on Carter until the scheduled upcoming maintenance window on Tuesday. Any jobs running which were using the scratch space have been stopped in order to allow for t...

  • Degraded performance of several systems

    We have seen a significant wave of these events this morning, September 21. For the most part, this wave seems to have been linked to a storage problem that has been resolved. However, we are implementing new monitoring and response procedures toda...

  • ECN Services Outage

    Engineering Computing Network (ECN) will be performing scheduled maintenance this weekend on several ECN servers, resulting in their unavailability for a short time. Some ECN services will be affected, including several software license servers for ITa...

  • POD Cluster Maintenance

    Carter and Scholar are back online for use as of 6:25am, though they will be operating with many nodes still offline. Staff will be working through Wednesday to steadily increase the number of nodes available. This concludes the POD cluster mainten...

  • Unscheduled Storage Outage

    The Isilon filesystem was restored to normal service and all affected clusters had it remounted as quickly as was sustainable by the filesystem. This process was completed by Wednesday, May 18th, 2016 at 12:15am EDT. All clusters other than Conte (...

  • Unscheduled Scratch Outage on Carter

    UPDATE: The issue with Carter's scratch filesystem has been resolved. The filesystem is now available. Job scheduling on the cluster has been resumed. The scratch filesystem serving Carter is currently unavailable. Job scheduling on Carter has bee...

  • Unscheduled Scratch Outage on Carter

    The scratch storage on Carter and Scholar has been returned to normal operations. The rebuild process will be continuing in the background, so we will be watching for any degradation in the storage performance. All queues have been re-activated. T...

  • ECN services outage - ITaP Research Computing systems impacted

    Engineering Computing Network (ECN) will be performing staged patching and reboots of all of ECN's Red Hat Linux workstations and servers to protect against a serious vulnerability in the glibc system library. A significant number of ECN services will be...

  • Unscheduled scratch outage on Carter

    There was an issue with the cluster's gateway switches that prevented InfiniBand traffic from carrying IP over InfiniBand. This also caused instability in the Lustre scratch servers, which required that they be rebooted. Jobs that were using scratc...

  • Unscheduled outage on Carter

    The cause turned out to be a power loss to Carter's scratch filesystem and portions of the Data Depot; power has now been restored. Carter nodes are returning to normal operations. Original Message: As of Thursday, February 4th, 2016 at...

  • Unscheduled Outage in Math Data Center

    Most of the impact fell on the Depot storage system, which has now been restored to normal operations. All the other affected systems are showing a return to normal operations. Original Message: As of Thursday, February 4th,...

  • Unscheduled outage on Carter

    The underlying issues affecting Carter are resolved and job scheduling has been resumed. Many individual nodes remain offline for corrective action, and these will be returning to service gradually as engineers are able to fix them. In the interim,...