Article #685: Network Maintenance for Carter, Conte, and Hansen clusters
The Carter and Conte clusters will be briefly unavailable on Monday, October 13, 2014 for upgrades to the clusters' respective network routers. This u...
Due to a network issue at the Indiana GigaPOP, connectivity to RCAC resources from off campus is intermittent. Access to the research computing web si...
Engineering Computing Network (ECN) in coordination with Physical Facilities will be conducting a planned power outage in the MSEE building from 8am u...
On the morning of Thursday, February 5, 2015, Carter, Conte, Hansen, Peregrine1, Radon, and Rossmann login servers will be rebooted to apply an impor...
As of 3:15 pm the maintenance is complete and Research Data Depot is returned to full production. Original message: The storage servers powering the R...
ITaP engineers have identified issues causing intermittent failures on Carter. Engineers are currently tuning parameters on the Depot system that have bee...
Due to power work in the MSEE building, most ECN services will be unavailable between 6:30am and 9:00pm EDT on Saturday, August 15, 2015. For Research C...
Update: September 23, 2015 8am Shortly after 2am, engineers were able to complete the file transfer and return Carter to production. Update: Sept...
The scratch filesystem serving Carter/Scholar underwent emergency maintenance through Friday night and well into Saturday. We expect this work to res...
October 30, 2015 11:00am ITaP engineers have made additional timeout changes to the scratch filesystem, which have increased stability. Additional work...
Carter has been returned to normal operations. All queues have been enabled. Update: December 2, 2015 12:15pm Carter is mostly ready to return to serv...
Carter has been returned to normal operation. Update: January 20, 2016 3:26pm: We are doing return to service testing now and expect Carter to return...
The underlying issues affecting Carter are resolved and job scheduling has been resumed. Many individual nodes remain offline for corrective action,...
The cause of this turned out to be a power loss to Carter's scratch filesystem and portions of the Data Depot, which has been restored now. Carter no...
Most of the impact of this turned out to be to the Depot storage system, which has now been restored to normal operations. All the other affected sys...
An issue with the cluster's gateway switches left the InfiniBand network unable to carry IP over InfiniBand traffic. This also caused an instabil...
Engineering Computing Network (ECN) will be performing staged patching and reboots of all of ECN's RedHat Linux workstations and servers to protect ag...
The scratch storage on Carter and Scholar has been returned to normal operations. The rebuild process will be continuing in the background, so we wil...
UPDATE: The issue with Carter's scratch filesystem has been resolved. The filesystem is now available. Job scheduling on the cluster has been resum...
The Isilon filesystem was restored to normal service and all affected clusters had it remounted as quickly as was sustainable by the filesystem. This...