Emergency Maintenance on Rice, Snyder, Hammer

April 26, 2017  2:40pm – April 27, 2017  12:00pm
Hammer, Rice, Snyder

As of 7:15pm, all queues on these clusters have resumed scheduling. Nodes will continue to be upgraded as they finish current jobs and become available. In the interim, the clusters will run in a degraded state, but will continue to start new jobs and allow existing jobs to complete.

We will post an update on overall progress by noon tomorrow.

Original Message:

As of Wednesday, April 26th, 2017 at 2:40pm, all new job scheduling on Hammer, Rice, and Snyder has been halted in order to roll out an emergency fix to network firmware.

Jobs currently running will remain unaffected, but all new jobs will be paused while nodes are patched and gradually returned to service. We expect most nodes to be patched tonight, though some subsections of nodes, notably in Snyder and Hammer, will take longer to upgrade properly. Scheduling will resume as soon as nodes begin returning to normal service.

All front-ends will be patched and rebooted at 6:00pm today.

Originally posted: April 26, 2017  3:56pm