Article #943: Unscheduled scratch outage on Rice, Snyder, and Hammer
The scratch filesystem serving Hammer, Rice, and Snyder is currently unavailable. Both currently running jobs and attempts to access files in scratch...
Following the security updates on Halstead, an issue was discovered that prevented multi-node MPI jobs from running properly. Scheduling on Halstead h...
The scratch filesystem serving Conte is currently unavailable. Both currently running jobs and attempts to access files in scratch will block until th...
System monitoring has revealed intermittent issues connecting to the Research Data Depot on Thursday January 19. When this issue occurs, users will ex...
Following the restoration of power to the affected building, the EXRC cluster has been returned to service on Thursday, December 22nd, 2016 at 2:45pm...
UPDATE As of 7:50 pm, Wednesday, 14 December 2016, this issue is completely resolved. UPDATE As of about 6:00 pm another problem has been found in the...
Update: Engineers were able to isolate the problem and restart the necessary systems. The Data Depot should be available again. Halstead users should...
Job scheduling was paused on Radon between 6 pm and 7 pm this evening. Node monitoring processes marked most nodes offline around 6 pm, preventing new...
This issue has been resolved. Original Message: A portion of the systems serving the Research Data Depot have suffered a failure. Some systems using D...
The issue with the GitHub web interface was resolved late yesterday evening. The website is now reflecting changes made to git repositories as normal....
Measures taken within the first two hours of this problem seem to have resolved the issue. Original Message: A portion of the systems serving the Rese...
UPDATE As of about 6:30 pm, the new scratch system was brought back online, and scheduling has been restarted on Carter. Original Message The new scra...
UPDATE: ITaP engineers have implemented a temporary solution so that work may continue on Carter until the scheduled upcoming maintenance window on Tu...
We have seen a significant wave of these events this morning, September 21. For the most part, this wave seems to have been linked to a storage probl...
UPDATE As of 5:30 pm, Friday, 5 August 2016, we believe the problem affecting access to the Data Depot has been corrected. Thank you for your patienc...
As of 3:20 pm, the self-service tool is back in action. An issue with the database backing authentication was discovered and repaired. Original messag...
As of 7:30 pm, all methods for connecting to Data Depot have been restored to working order. All connections with Samba (Network Drive mappings: datad...
Engineering Computing Network (ECN) will be performing scheduled maintenance this weekend on several ECN servers, resulting in their unavailability for...
The underlying storage has been fixed, and all these clusters have been returned to normal operations as of 10:00pm EDT. As of Tuesday, June 7th, 201...
Networking to and from campus, and around large parts of campus, is down. Many services are unreachable at the moment. We will provide updates as they...