Carter
-
The Isilon filesystem was restored to normal service, and all affected clusters remounted it as quickly as the filesystem could sustain. This process was completed by Wednesday, May 18th, 2016 at 12:15am EDT. All clusters other than Conte (...
-
Environment Modules System Upgrade
On Monday, May 9, 2016 the environment module system on Carter, Conte, Hansen, and Hathi will be upgraded to Lmod, bringing all compute clusters up to the same environment modules system. This new system has been in use on the Rice and Snyder cluster...
-
New web-based quota monitoring tool
A new web-based quota monitoring tool is available to all Research Cluster and Data Depot users. This tool is a web equivalent of the myquota tool on the clusters. The tool allows you to monitor your quota usage just like myquota, but it also allows...
-
Unscheduled Scratch Outage on Carter
UPDATE: The issue with Carter's scratch filesystem has been resolved. The filesystem is now available. Job scheduling on the cluster has been resumed. Original Message: The scratch filesystem serving Carter is currently unavailable. Job scheduling on Carter has bee...
-
Unscheduled Scratch Outage on Carter
The scratch storage on Carter and Scholar has been returned to normal operations. The rebuild process will continue in the background, so we will be watching for any degradation in storage performance. All queues have been re-activated. T...
-
ECN services outage - ITaP Research Computing systems impacted
Engineering Computing Network (ECN) will be performing staged patching and reboots of all of ECN's Red Hat Linux workstations and servers to protect against a serious vulnerability in the glibc system library. A significant number of ECN services will be...
-
Unscheduled scratch outage on Carter
There was an issue with the cluster's gateway switches, which left the InfiniBand network unable to carry IP over InfiniBand traffic. This also caused instability in the Lustre scratch servers, which required that they be rebooted. Jobs that were using scratc...
-
The cause of this turned out to be a power loss to Carter's scratch filesystem and portions of the Data Depot, which has been restored now. Carter nodes are returning to normal operations now. Original Message: As of Thursday, February 4th, 2016 at...
-
Unscheduled Outage in Math Data Center
Most of the impact of this turned out to be to the Depot storage system, which has now been restored to normal operations. All the other affected systems are showing a return to normal operations now. Original Message: As of Thursday, February 4th,...
-
The underlying issues affecting Carter are resolved and job scheduling has been resumed. Many individual nodes remain offline for corrective action, and these will be returning to service gradually as engineers are able to fix them. In the interim,...
-
Carter has been returned to normal operation. Update: January 20, 2016 3:26pm: We are doing return-to-service testing now and expect Carter to return to production by 7:00pm. Update: January 20, 2016 12:00pm: Work is being wrapped up on Carter and...
-
Carter has been returned to normal operations. All queues have been enabled. Update: December 2, 2015 12:15pm Carter is mostly ready to return to service, but the site-wide home filesystem has suffered a failure which is preventing this from being co...
-
October 30, 2015 11:00am ITaP engineers have made additional timeout changes to the scratch filesystem, which have increased stability. Additional work is being scheduled for Tuesday, December 1, 2015 from 7:00am to 7:00pm. October 8, 2015 5:00pm An e...
-
Emergency scratch maintenance on Carter and Scholar
The scratch filesystem serving Carter/Scholar underwent emergency maintenance through Friday night and well into Saturday. We expect this work to resolve the periodic hangs this filesystem has been experiencing for the last two days. Job scheduling...
-
Update: September 23, 2015 8am Shortly after 2am, engineers were able to complete the file transfer and return Carter to production. Update: September 22, 2015 11pm The file transfer continues and will last well into the night. The next update...
-
Storage and Network Upgrades for Carter Cluster
ITaP is pleased to announce several upgrades to the Carter cluster to better enable data-intensive science. Network To relieve potential bottlenecks on IP network traffic, Carter will receive an upgrade to its Infiniband-to-IP gateway. This gateway t...
-
Due to power work in the MSEE building, most ECN services will be unavailable from 6:30am to 9:00pm EDT on Saturday, August 15, 2015. For Research Computing users this means that software packages licensed through ECN servers will not be able to ch...
-
Data Depot connectivity issues
ITaP engineers have identified issues causing intermittent failures on Carter. Engineers are currently tuning parameters on the Depot system that have been identified as potential fixes for these issues. Access to Depot on Carter has been stable since tunin...
-
Research Data Depot Security Updates
As of 3:15 pm the maintenance is complete and Research Data Depot is returned to full production. Original message: The storage servers powering the Research Data Depot will undergo maintenance on Thursday, February 26, 2015 from 10:00am - 4:00pm EST...
-
Important operating system updates - Community Clusters
On the morning of Thursday, February 5, 2015, Carter, Conte, Hansen, Peregrine1, Radon, and Rossmann login servers will be rebooted to apply an important Red Hat Linux operating system update. Additionally, during this time scratch storage servers w...