Unscheduled outage on Peregrine-1
March 17, 2016 4:00pm – 6:40pm
Outage RESOLVED: A misconfiguration that caused an unneeded IB driver to be loaded has been fixed. Peregrine-1 is back online. Job scheduling is on.
The Peregrine-1 cluster is currently offline due to problems with the cluster nodes' operating system software. The failure occurred gradually as nodes completed their jobs, so no jobs were lost to the outage, although no new jobs can run at the moment.
Engineers are investigating the issue and hope to return the nodes to normal function. However, there is currently no estimate for return to service.