Partial outage affecting some Coates queues

April 10, 2012  5:45pm – 6:45pm
Coates

Update - 6:45 pm Tuesday, 10 April 2012

ITaP engineers have found and repaired the network issue that was affecting Coates type B, C, and E nodes. Job scheduling has been resumed for all queues. If you encounter any problems, please report them to rcac-help@purdue.edu.

Thank you for your patience.

5:45 pm Tuesday, 10 April 2012

Some Coates compute nodes have lost connection to the LustreA scratch file storage system. Affected nodes are in the type B, C, and E classes; type A and type D nodes are not affected.

System engineers are working on the problem. Until it is corrected, scheduling has been stopped on all queues on the affected nodes.

New jobs on the affected queues will be accepted but held until scheduling is restarted. Currently running jobs will probably stall and then continue once the problem has been solved.

The next update is scheduled for 9:00 pm tonight.

Originally posted: April 10, 2012