Aug. 5-17 research computing system outage FAQ

August 5 – 17, 2011
Coates, Radon, Rossmann

What’s happening?

ITaP’s research computing systems, including the Rossmann, Coates, Moffett and Radon clusters, will be shut down beginning at 5 p.m. Friday, Aug. 5. The supercomputers are scheduled to be off until Wednesday, Aug. 17.

Why?

An outage related to ongoing power and cooling upgrades at the Mathematical Sciences Building (MATH) will cut electricity and cooling to the building at several points between Aug. 5 and 17. ITaP will bring systems back on line once the power upgrade is complete, the data center’s cooling system has returned to service and the MATH data center has cooled sufficiently.

How does this affect jobs I am running?

Only jobs short enough to finish before 5 p.m. Friday, August 5, will be scheduled for execution from now until the completion of the upgrade project. Longer jobs will be held in the queue until the systems return to production, which should be Wednesday, Aug. 17.
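As a rough illustration of the rule above, the following sketch checks whether a job with a given requested walltime would finish before the shutdown. It is not the clusters’ actual scheduler logic. The function names are hypothetical, the job is assumed to start running at the time given (no queue wait), and a job ending exactly at 5 p.m. is assumed to count as finishing in time.

```python
from datetime import datetime, timedelta

# Shutdown time taken from this FAQ: 5 p.m. Friday, Aug. 5, 2011 (local time).
OUTAGE_START = datetime(2011, 8, 5, 17, 0)


def fits_before_outage(start_time, walltime, outage_start=OUTAGE_START):
    """Return True if a job starting at start_time ends by outage_start.

    Assumption: the job starts immediately at start_time (no queue wait),
    and ending exactly at the cutoff is treated as acceptable.
    """
    return start_time + walltime <= outage_start


def max_walltime(now, outage_start=OUTAGE_START):
    """Longest walltime that could still complete before the outage.

    Returns a zero-length timedelta once the outage has begun.
    """
    return max(outage_start - now, timedelta(0))


if __name__ == "__main__":
    now = datetime(2011, 8, 5, 9, 0)  # example: 9 a.m. on the day of the shutdown
    print("Longest job that could still run:", max_walltime(now))
    print("4-hour job fits:", fits_before_outage(now, timedelta(hours=4)))
    print("12-hour job fits:", fits_before_outage(now, timedelta(hours=12)))
```

In practice a job may wait in the queue before it starts, so a walltime request somewhat shorter than the computed maximum is the safer choice.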

I am a Coates or Rossmann owner. What if I have urgent computing needs?

Please contact rcac-help@purdue.edu to discuss your needs and investigate the possibility of alternate research computing capability August 8-17.

Is the Steele cluster affected?

The Steele cluster, located outside MATH, will be down only from 8 a.m. Monday, Aug. 8, to 7 p.m. Tuesday, Aug. 9, for maintenance on its individual cooling and networking systems. The end time is two hours later than the originally planned 5 p.m. Aug. 9, to allow adequate time to verify the work and make sure Steele is production-ready again.

Are any other research systems affected?

Several other research computing systems also will be down, but only from 8 a.m. to 5 p.m. Monday, Aug. 8: the systems serving the Compact Muon Solenoid (CMS) project, the Miner cluster at Purdue Calumet and Purdue’s Condor distributed computing system (called DiaGrid), which also is a resource made available to the national XSEDE research network. The downtime is for maintenance of the home directory storage system on which they all rely.

What about access to my files?

The Fortress archival data storage system will be accessible the entire time, allowing researchers access to their data during the renovation project. The Lustre file system that serves as Coates and Rossmann scratch storage will be unavailable along with the clusters themselves. All home directories, BlueArc scratch (scratch95, scratch96, scratch98 and scratch99), and BlueArc project storage will remain available, except during the scheduled storage maintenance from 8 a.m. to 5 p.m., Monday, August 8.

What about non-research systems?

Data center space ITaP operates in MATH for campus units, including the Purdue libraries, will be affected by the power outage as well. The libraries’ catalog, database, electronic journal and other systems will be off line from 5 p.m. Friday, Aug. 5, to early Monday, Aug. 8, and from 5 p.m. Friday, Aug. 12, to sometime Sunday evening, Aug. 14. The ITaP student computer lab in Room B10 will be closed from 6 p.m. Friday, Aug. 5, to 6 a.m. Monday, Aug. 8, and from 9 p.m. Friday, Aug. 12, to 6 a.m. Sunday, Aug. 14. Non-ITaP office and lab computers in Mathematical Sciences also will be affected Aug. 5-7 and 12-14. The campus units with data center space in MATH have been notified.

What about email?

For the most part, only research computing systems and other computers in MATH should be affected. Purdue email, OnePurdue and other ITaP-managed services are located in different buildings and should function normally. The student computer lab in Stanley Coulter Hall also will be unaffected.

What’s the payoff?

The project, funded by a $2 million National Science Foundation grant to improve Purdue’s main research data center, should allow three new supercomputers to be installed at MATH. It also provides for redundant power and cooling to help avoid unscheduled outages, which can cause the loss of millions of hours of research computing time.

What does the project include?

The project will install two new redundant power transformers, a larger-capacity chilled water feed and a secondary cooling loop designed to allow the data center to continue operation in the event of a disruption to the campus chilled water service.

What is required to get those benefits?

Two full-building power and cooling shutdowns will be required, one on each of the weekends of August 6-7 and August 13-14, and there will be periodic interruptions between those weekends and until Aug. 17. The work involves removing and replacing old electrical gear to improve reliability of existing electrical feeds into MATH; disconnecting all air conditioners and cooling distribution units (CDUs) from old plumbing and reconnecting them to the new cooling loop; and reconnecting all data center electrical service to the new transformers.

Originally posted: July 5, 2011