All RCAC systems unavailable some portion of Tue-Fri, 3/29-4/1

March 29, 2011  3:00am – April 1, 2011  6:00pm
Coates, Radon, Rossmann

All RCAC systems will be unavailable on Tuesday, March 29th, from 3:00am to 6:00pm. The Rossmann, Coates, Radon, and Moffett clusters will remain down through 6:00pm Thursday, March 31st.

Update, 9:00am, March 29: Power has been restored to the Math building and storage upgrades are now proceeding as planned.

Update, 2:30pm, March 29: Storage upgrades continue as planned.

Update, 5:00pm, March 29: Miner has been returned to production service. Steele and CMS will be returned to production service as soon as the vendor can correct a problem with their scratch filesystems. Rossmann, Coates, Radon, and Moffett will remain down through Thursday as planned to allow for upgrades to the chilled water service.

Update, 9:30pm, March 29: Steele, CMS, and Miner have been returned to production service. Rossmann, Coates, Radon, and Moffett will remain down through Thursday as planned to allow for upgrades to the chilled water service.

Update, 6:00pm, March 30: Work on the chilled water service to the Math building is proceeding as planned. Rossmann, Coates, Radon, and Moffett are expected to remain down through 6:00pm Thursday.

Update, 12:00pm, March 31: Work on the chilled water service continues and has been extended by at least an hour, possibly more. Rossmann, Coates, Radon, and Moffett are expected to remain down until at least 7:00pm Thursday.

Update, 3:00pm, March 31: Work on the chilled water service continues. Rossmann, Coates, Radon, and Moffett are expected to remain down until at least 7:00pm Thursday.

Update, 5:00pm, March 31: Work on the chilled water service has been completed. Systems staff are now bringing up Rossmann, Coates, Radon, and Moffett. They are currently expected to return to production service at 8:00pm.

Update, 6:30pm, March 31: Radon and Moffett have been returned to production service. Systems staff are working on an issue with the scratch filesystem shared by Rossmann and Coates. These two clusters are currently expected to return to production service at 10:00pm.

Update, 8:30pm, March 31: Rossmann and Coates are now undergoing final testing. They are currently expected to return to production service at 9:30pm.

Update, 9:30pm, March 31: Rossmann and Coates have been returned to production service. However, approximately half of the Coates "A" nodes are still offline due to networking issues, which will be examined on Friday. Until this is corrected, queues on Coates-A may progress more slowly than usual. To compensate in the interim, all standby jobs have been temporarily held so that owner queues receive maximum throughput.

Update, 11:30am, April 1: All Coates nodes have been returned to production service. The standby queue on Coates has been re-enabled. All systems are now operating normally.

No further updates are planned on this maintenance.

Originally posted: April 1, 2011