Unscheduled network outage on Brown, Rice, Snyder, and Hammer

July 18, 2018  2:30am – 12:15pm
Brown, Hammer, Rice, Snyder

UPDATE: July 18, 2018  12:17pm

As of 12:15 pm on Wednesday, July 18, 2018, Brown has returned to service and queued jobs are starting. As with the other clusters affected by this outage, jobs that were interrupted may need to be resubmitted.

This concludes the unscheduled outage of the Brown, Rice, Snyder, and Hammer clusters.


UPDATE: July 18, 2018  10:48am

As of 10:45 am, the Snyder and Hammer clusters are also back online and scheduling queued jobs. Jobs that were already running when the outage started may need to be resubmitted.


UPDATE: July 18, 2018  10:37am

As of 10:35 am, the Rice cluster has been returned to service and queued jobs are being started. Jobs that were running when the outage started may need to be resubmitted.


ORIGINAL: July 18, 2018  3:27am

The Brown, Hammer, Rice, and Snyder clusters began experiencing problems with scratch filesystems and network connectivity around 2:30 am. Engineers are diagnosing the cause and working to identify a fix. Job scheduling has been paused while the problem is being addressed.

We will provide an update by 10 am.
