Article #796: Unscheduled Outage on Conte
Update - 9:20pm Conte has been returned to full production as of 9:15pm. During the failure earlier today, the internal tracking of jobs within the sc...
The scratch filesystem has been restored to full service and all queues have been restarted. Original Message: The scratch filesystem serving Conte i...
As of 12:46, December 2, the home filesystem serving Conte, Hammer, Hansen, Hathi, Peregrine1, Radon, Rice, and Snyder was restored to normal operatio...
Most of the impact of this turned out to be on the Depot storage system, which has now been restored to normal operations. All the other affected sys...
The scheduler issue has been resolved, and Conte has been returned to normal operations as of Wednesday, February 10th, 2016 at 9:30pm EST. Update: Fe...
Engineering Computing Network (ECN) will be performing staged patching and reboots of all of ECN's RedHat Linux workstations and servers to protect ag...
Conte has been returned to normal operations as of Wednesday, May 18th, 2016 at 5:55am EDT. All upgrades were completed, though a small number of nod...
An issue was discovered shortly after Conte, Hansen, Hathi, and Radon were brought back online with the /group path on several front-ends and nodes. A...
The underlying storage has been fixed, and all these clusters have been returned to normal operations as of 10:00pm EDT. As of Tuesday, June 7th, 201...
Engineering Computing Network (ECN) will be performing scheduled maintenance this weekend on several ECN servers, resulting in their unavailability for...
We have seen a significant wave of these events this morning, September 21. For the most part, this wave seems to have been linked to a storage probl...
The Conte cluster will be unavailable from September 27, 2016 7:00am to September 28, 2016 11:59pm EDT, for Home Filesystem Maintenance - All...
Conte has been returned to normal operations as well now. This concludes the home directory maintenance on all systems. Update: September 27, 2016 1...
Conte is back in production, and jobs have started running. Thank you for your patience. ===== Because of additional work required to fix a configura...
The scratch filesystem serving Conte is currently unavailable. Both currently running jobs and attempts to access files in scratch will block until th...
The Conte and Hathi clusters have been updated and returned to full production. This is a gentle reminder that the Conte and Hathi clusters will be u...
As of 2:35 pm, the Conte cluster has been returned to service. Scheduling has resumed in all queues. Update: The source of the problem has been identified and the...
Engineers have restored failed core servers back to a functional state. Data Depot is up and running as normal and job scheduling resumed. Should you...
Conte, Halstead, HalsteadGPU, and Hammer are back in full production. Job scheduling has been resumed on all clusters. Please let us know if you see a...
A failure has occurred in the systems which serve Data Depot to the various research clusters. Engineers are currently diagnosing the issue and are wo...