Unscheduled outage on Rice and Snyder
As of 9:15 PM, the Snyder and Rice clusters have been returned to service now that cooling is back online. Front-ends are operational and scheduling has resumed.
At about 7:30 PM on Wednesday, 17 February 2016, the front-end login servers for the Snyder cluster went offline due to overheating. To reduce load on the cooling system while the problem is being addressed, we have temporarily paused the job schedulers for both Rice and Snyder.
While the temperature remains high, the Snyder cluster will not be available for logins, but running and queued jobs are not affected.
The Rice cluster is still accepting logins, and jobs can be queued, but the scheduler will not attempt to start them until we have the temperature under control. As on Snyder, currently running jobs are not affected.
Please continue checking this news item; we will update it as more becomes known about the situation.