Outages

Campus network outage
- May 26, 2016 7:00am - 1:00pm EDT
Networking to and from campus, and around large parts of campus, is down. Many services are currently unreachable. We will provide updates as they become available.
Network issues affecting Snyder cluster
- May 24, 2016 12:00pm - 4:00pm EDT Last updated: May 24, 2016 5:15pm EDT
The problem is now RESOLVED after the reboot of a router. Original Message: The network serving Snyder is currently experiencing issues. Attempts to log in to the cluster or access network filesystems will experience high latency and delays. Systems engineers...
Unscheduled Storage Outage
- May 17, 2016 5:30pm - May 18, 2016 12:15am EDT Last updated: May 18, 2016 12:34am EDT
The Isilon filesystem was restored to normal service, and all affected clusters had it remounted as quickly as the filesystem could sustain. This process was completed by Wednesday, May 18th, 2016 at 12:15am EDT. All clusters other than Conte (...
Unscheduled Scratch Outage on Carter
- April 20, 2016 4:40pm - 8:20pm EDT Last updated: April 20, 2016 8:22pm EDT
UPDATE: The issue with Carter's scratch filesystem has been resolved. The filesystem is now available. Job scheduling on the cluster has been resumed. Original Message: The scratch filesystem serving Carter is currently unavailable. Job scheduling on Carter has bee...
Unscheduled Scheduling Outage on Hansen
- April 8, 2016 5:00am - 10:45am EDT Last updated: April 8, 2016 10:56am EDT
Job scheduling on Hansen has returned to normal. This concludes the outage. Original Message: Hansen is not currently scheduling any new jobs. A filesystem used for a specific project serving Hansen became unavailable on Hansen around 5:00am. This...
Unscheduled Scratch Outage on Carter
- March 30, 2016 4:30pm - March 31, 2016 1:45pm EDT Last updated: March 31, 2016 1:54pm EDT
The scratch storage on Carter and Scholar has been returned to normal operations. The rebuild process will be continuing in the background, so we will be watching for any degradation in the storage performance. All queues have been re-activated. T...
Standby queues paused on Hansen Cluster
- March 28, 2016 12:15pm - 5:00pm EDT Last updated: March 28, 2016 5:29pm EDT
Update: As of 5:20pm, the standby and standby-c queues have been started and their jobs are being scheduled for execution. Original Message: The standby queues on Hansen have been paused temporarily in order to address a slow scheduling issue. We expect to have them ba...
Unscheduled outage on Peregrine-1
- March 17, 2016 4:00pm - 6:40pm EDT Last updated: March 17, 2016 6:41pm EDT
Outage RESOLVED: A misconfiguration that caused an unneeded IB driver to be loaded was fixed. Peregrine-1 is back online. Job scheduling is on. Original Message: The Peregrine-1 cluster is currently offline due to problems with the cluster nodes' op...
Unscheduled outage for Peregrine1
- March 7, 2016 12:30pm - 2:30pm EST Last updated: March 7, 2016 2:24pm EST
As of Monday, March 7th, 2016 at 12:30pm EST, the Peregrine1 cluster is unavailable due to a failed network switch in its datacenter. This switch is currently in the process of being replaced. Estimated time to complete this work and bring the clu...
ECN services outage - ITaP Research Computing systems impacted
- March 1, 2016 6:30am - 9:00am EST
Engineering Computing Network (ECN) will be performing staged patching and reboots of all of ECN's Red Hat Linux workstations and servers to protect against a serious vulnerability in the glibc system library. A significant number of ECN services will be...
Unscheduled Outage on Data Depot
- February 23, 2016 11:00am - February 24, 2016 6:00pm EST Last updated: February 24, 2016 6:13pm EST
The Depot filesystem checks have all completed cleanly and the Depot has been fully returned to normal operations. All queues on all clusters are scheduling new jobs again. Any existing jobs which had been waiting for Depot access may also resume....
Unscheduled outage on Rice and Snyder
- February 17, 2016 7:30pm - 9:15pm EST Last updated: February 17, 2016 9:30pm EST
As of 9:15 PM, the Snyder and Rice clusters have been brought back into service after cooling was brought back online. Front-ends are operational and scheduling has been resumed. Original Message: At about 7:30 pm Wednesday, 17 February, 2016, the fr...
Unscheduled scratch outage on Carter
- February 12, 2016 10:20am - 12:00pm EST Last updated: February 12, 2016 12:04pm EST
There was an issue with the cluster's gateway switches that left the InfiniBand network unable to carry IP over InfiniBand traffic. This also destabilized the Lustre scratch servers, which had to be rebooted. Jobs that were using scratc...
Unscheduled Outage in Math Data Center
- February 4, 2016 8:00am - 10:30am EST Last updated: February 4, 2016 10:40am EST
Most of the impact turned out to fall on the Depot storage system, which has now been restored to normal operations. All other affected systems are now showing a return to normal operations. Original Message: As of Thursday, February 4th,...
Unscheduled outage on Carter
- February 4, 2016 8:00am - 10:30am EST Last updated: February 4, 2016 10:42am EST
The cause turned out to be a power loss to Carter's scratch filesystem and portions of the Data Depot; power has now been restored. Carter nodes are returning to normal operations. Original Message: As of Thursday, February 4th, 2016 at...
Unscheduled outage on Carter
- February 2, 2016 6:00pm - February 3, 2016 10:50pm EST Last updated: February 3, 2016 10:51pm EST
The underlying issues affecting Carter are resolved and job scheduling has been resumed. Many individual nodes remain offline for corrective action, and these will be returning to service gradually as engineers are able to fix them. In the interim,...
Unscheduled scratch outage on Hammer
- February 1, 2016 8:00am - February 5, 2016 7:00pm EST Last updated: February 5, 2016 7:03pm EST
The Hammer scratch filesystem has now returned to normal operations. Original Message: During the maintenance of the Rice and Snyder clusters this week, it became necessary to shut down the scratch filesystem which these clusters currently share with...
Unscheduled Home Filesystem Outage
- December 2, 2015 12:00pm - 12:45pm EST Last updated: December 2, 2015 1:52pm EST
As of 12:46, December 2, the home filesystem serving Conte, Hammer, Hansen, Hathi, Peregrine1, Radon, Rice, and Snyder was restored to normal operation. All queues have been re-enabled. Original Message: As of Wednesday, December 2nd, 2015 at 12:00pm EST, Conte, Hamm...
Unscheduled scratch outage on Rice, Hammer, and Snyder
- December 1, 2015 11:15am - 4:15pm EST Last updated: December 1, 2015 4:50pm EST
The scratch filesystem serving Hammer, Rice, and Snyder has been restored to normal operations, and all queues have been re-enabled. Original Message: The scratch filesystem serving Hammer, Rice, and Snyder is partially unavailable. Both currently ru...
Unscheduled scratch outage on Conte
- November 28, 2015 12:00pm - 5:30pm EST Last updated: November 28, 2015 6:06pm EST
The scratch filesystem has been restored to full service and all queues have been restarted. Original Message: The scratch filesystem serving Conte is currently unavailable. Both currently running jobs and attempts to access files in scratch will bl...
