Emergency Cluster Maintenance
The Carter cluster was returned to production at 10:45pm on November 7. We apologize for this extended outage.
Update: November 7, 2016 6:01pm
Work on reinstalling the Carter nodes continues. All other systems have returned to normal operations. We will issue an update on Carter's status by midnight tonight.
Update: November 7, 2016 6:00pm
The Carter cluster updates did not deploy correctly to the Carter nodes, so all of these nodes are now being prepared for reinstallation to restore them to service. We will issue an update on Carter's status by 6:00pm.
Additionally, the problems logging in to the Scholar and Hathi clusters were resolved this morning. A problem affecting multi-node jobs and qpeek on Rice has also been identified, and a fix is now being deployed across all Rice nodes.
Update: November 7, 2016 12:35am
All clusters other than Carter were successfully updated and brought back online over the weekend. Carter is posing additional challenges. Systems administrators will continue to work on it, and we will post an update on Carter's status no later than Monday, November 7th, 2016 at 10:45pm EST.
The Carter, Hathi, Rice, and Scholar clusters will be taken down for emergency maintenance beginning Saturday, November 5th, 2016 at 8:00am EDT. The clusters will return to normal operations by Sunday, November 6th, 2016 at 11:59pm.
During this time, Carter, Hathi, Rice, and Scholar will have critical kernel security patches applied.
Any PBS jobs already in progress that do not complete by Saturday, November 5th, 2016 at 8:00am EDT will unfortunately have to be terminated. Any new or queued PBS jobs that request a walltime extending past Saturday, November 5th, 2016 at 8:00am EDT will not start and will remain in the queue until the maintenance is completed.