Article #819: Unscheduled outage on Carter
The cause of this turned out to be a power loss to Carter's scratch filesystem and portions of the Data Depot, which has been restored now. Carter no...
Most of the impact of this turned out to be to the Depot storage system, which has now been restored to normal operations. All the other affected sys...
The Hathi and WinHPC clusters will be unavailable beginning Thursday, February 4th, 2016 at 6:00am EST for scheduled maintenance to the power feed...
Fortress will be unavailable from 8:00am to 9:00am Wednesday, 3 February, 2016 for routine maintenance.
The underlying issues affecting Carter are resolved and job scheduling has been resumed. Many individual nodes remain offline for corrective action,...
As of 10:40pm, the Snyder cluster was returned to normal service in the POD. This concludes this maintenance. Update: February 5, 2016 8:54pm As of...
The Hammer scratch filesystem has now returned to normal operations. Original Message: During the maintenance of the Rice and Snyder clusters this wee...
Carter has been returned to normal operation. Update: January 20, 2016 3:26pm: We are doing return to service testing now and expect Carter to return...
January 7, 2016, 6pm The Fortress move has completed and Fortress has been returned to production. Original Due to a failure in the notice system, the earlier...
As of 12:46, December 2, the home filesystem serving Conte, Hammer, Hansen, Hathi, Peregrine1, Radon, Rice, and Snyder was restored to normal operatio...
The scratch filesystem serving Hammer, Rice, and Snyder has been restored to normal operations, and all queues have been re-enabled. Original Message:...
Carter has been returned to normal operations. All queues have been enabled. Update: December 2, 2015 12:15pm Carter is mostly ready to return to serv...
The scratch filesystem has been restored to full service and all queues have been restarted. Original Message: The scratch filesystem serving Conte i...
Update - 9:20pm Conte has been returned to full production as of 9:15pm. During the failure earlier today, the internal tracking of jobs within the sc...
The Fortress Archive service, Fortress, will be unavailable starting Wednesday, November 4th, 2015 at 6:00am EST for regular maintenance and will retu...
November 3, 2015 6:15pm The maintenance for Radon is completed and the cluster has been returned to production. Original The Radon cluster will be una...
Service was restored around 7:30pm today. Engineers changed the way Samba authenticates users to avoid this problem going forward. -- Service was rest...
October 22, 2015 9:15pm All services have been restored and Hammer is now in production. October 22, 2015 7:00pm Engineers continue to work through is...
October 30, 2015 11:00am ITaP Engineers have made additional timeout changes to the scratch filesystem, which have increased stability. Additional work...
The scratch filesystem serving Carter/Scholar underwent emergency maintenance through Friday night and well into Saturday. We expect this work to res...