Conte Cluster Maintenance

April 14, 2015  8:00am – 10:00pm
Conte

UPDATE

As of 9:00 pm Tuesday, 14 April, 2015, the Conte cluster is back in full production mode. During the maintenance, all nodes were checked for reliability, system software installations were checked for consistency across nodes, and issues discovered during testing were corrected. The planned Torque and Moab upgrade has been deferred to a later date.

I apologize again for the delay in returning the cluster to operation, but we believe all users will benefit from the work done by the Systems staff.

Thank you for your patience.

UPDATE

As of 8:00 pm Tuesday, 14 April, 2015, the Conte cluster is still unavailable, as systems staff are continuing work to correct a problem discovered during testing.

We will have more information no later than 9:00 pm tonight, Tuesday, 14 April, 2015.

UPDATE

The maintenance has been completed on Conte, but systems staff are investigating some issues that came up during testing.

We will have an update to this notice no later than 8:00 pm.

I apologize for this delay, and thank you for your patience.

superseded

The Conte cluster will be unavailable on Tuesday, 14 April, while it is being updated. The maintenance is tentatively scheduled to begin at 8:00 am and be completed by the end of the work day.

During that time, the Torque and Moab batch scheduling software will be upgraded to newer versions. The InfiniBand network connecting the cluster's nodes will also be tested for performance and reliability, and any problems found will be corrected.

Any queued jobs whose requested run time would extend past the beginning of the outage period will be held until the cluster is restored to production.
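As a rough illustration of which jobs would be affected, the check below compares a job's earliest possible finish time against the start of the outage. This is only a sketch, not the scheduler's actual logic: the function name would_be_held is hypothetical, the job is assumed to start the moment it is submitted, and a job finishing exactly at 8:00 am is assumed to be allowed to run.

```python
from datetime import datetime, timedelta

# Start of the outage as given in this notice: 8:00 am, Tuesday, 14 April 2015.
OUTAGE_START = datetime(2015, 4, 14, 8, 0)


def would_be_held(start_time: datetime, walltime: timedelta,
                  outage_start: datetime = OUTAGE_START) -> bool:
    """Hypothetical helper: True if the job could still be running at outage_start.

    Assumes the job begins at start_time and uses its full requested walltime.
    """
    return start_time + walltime > outage_start


# A 24-hour job starting at 8:00 pm on 13 April would overlap the outage.
print(would_be_held(datetime(2015, 4, 13, 20, 0), timedelta(hours=24)))

# A 4-hour job starting at the same time would finish before the outage.
print(would_be_held(datetime(2015, 4, 13, 20, 0), timedelta(hours=4)))
```

In practice, requesting a shorter run time that ends before the outage begins may allow a job to run beforehand, subject to how the scheduler is configured.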

As always, please let us know at rcac-help@purdue.edu if you have any questions or concerns about this outage.

Originally posted: March 31, 2015  5:40pm