RCAC Whole-Floor Downtime and Power Work

July 30, 2021 7:00am - August 1, 2021 12:00pm EDT
Maintenance
Bell, Brown, CMS, Geddes, Gilbreth, Halstead, Hammer, Scholar, Weber, Workbench

UPDATE: August 2, 2021 12:24pm EDT

With the POD data center issue resolved, the Weber cluster has been returned to normal service as of 12:24pm EDT. All queues have been enabled and jobs have resumed scheduling. Please report any issues to rcac-help@purdue.edu.

This concludes the whole-floor downtime and maintenance. Thank you for your patience!

UPDATE: August 2, 2021 11:59am EDT

The Weber networking problem has been resolved and the cluster is ready to be returned to service. The return-to-service (RTS) process is currently pending resolution of the sudden cooling issue in the POD data center.

We will provide another update by 6pm, or sooner if the cooling problem is resolved before then.

UPDATE: August 1, 2021 5:57pm EDT

Engineers continue troubleshooting the Weber cluster networking issue that is preventing it from returning to service. We will provide an update by noon tomorrow, August 2nd.

UPDATE: August 1, 2021 2:35pm EDT

As of 2:35pm EDT, the Geddes cluster has been returned to normal service. Please report any issues to rcac-help@purdue.edu.

Work continues on bringing the Weber cluster back. We appreciate your patience and will provide an update by 6pm tonight.

UPDATE: August 1, 2021 2:10pm EDT

As of 2:10pm EDT, the Halstead cluster has been returned to normal service. All queues have been enabled and jobs have resumed scheduling. Please report any issues to rcac-help@purdue.edu.

Work continues on bringing the Geddes and Weber clusters back. We appreciate your patience and will provide an update by 6pm tonight.

UPDATE: August 1, 2021 11:55am EDT

As of 11:55am EDT, the required data center power work has been completed successfully.

The Bell, Brown, CMS, Hammer, Gilbreth, Scholar, and Workbench clusters have been returned to normal service. All queues have been enabled and jobs have resumed scheduling. Please report any issues to rcac-help@purdue.edu.

Work continues on bringing the Halstead, Geddes, and Weber clusters back. We appreciate your patience and will provide an update by 6pm tonight.

UPDATE: July 28, 2021 4:09pm EDT

This is a reminder of the maintenance downtime for most Research Computing resources starting this coming Friday, July 30, 2021. Please note in the attached schedule that some clusters (Brown, Hammer, and Weber) will go down on Friday, while the others will not go down until Saturday.

The Data Depot will remain available for non-cluster access.

ORIGINAL: July 30, 2021 7:00am - August 1, 2021 12:00pm EDT

The majority of the Research Computing computational resources will be unavailable July 30, 2021 7:00am - August 1, 2021 12:00pm EDT for a whole-floor downtime due to electrical power work in the MATH and POD data centers. Along with required preventive maintenance, the work will provide the power equipment upgrades necessary to house the upcoming NSF-funded Anvil supercomputer.

Due to the nature and extent of the work, some resources will be affected longer than others. The following table provides tentative maintenance start and end times for the various Research Computing systems:
Start Time	End Time	Resources
Friday, July 30th, 2021 at 7:00am	Sunday, August 1st, 2021 at 12:00pm EDT	Brown, Hammer, Weber
Saturday, July 31st, 2021 at 7:00am	Sunday, August 1st, 2021 at 12:00pm EDT	Bell, CMS, Halstead, Geddes, Gilbreth, Scholar, Workbench
Not affected		Data Depot, WSC, WCERES, customer VMs and servers

All systems will return to full production by Sunday, August 1st, 2021 at 12:00pm EDT.

Any SLURM job whose requested walltime would extend past the maintenance start time for its cluster will not start and will remain in the queue until after the maintenance is completed.
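The walltime rule can be illustrated with a short Python sketch. It is only an approximation of the behavior described in this notice, not the scheduler's actual logic: the function name `can_start_before_maintenance` and the `MAINTENANCE_START` table are hypothetical, the start times are copied from the tentative schedule above, and treating a job that ends exactly at the start time as allowed is an assumption.

```python
from datetime import datetime, timedelta

# Tentative maintenance start times from the schedule above (all times EDT).
# This table is illustrative only; it is not read from any scheduler.
FRIDAY_START = datetime(2021, 7, 30, 7, 0)
SATURDAY_START = datetime(2021, 7, 31, 7, 0)

MAINTENANCE_START = {
    "Brown": FRIDAY_START,
    "Hammer": FRIDAY_START,
    "Weber": FRIDAY_START,
    "Bell": SATURDAY_START,
    "CMS": SATURDAY_START,
    "Halstead": SATURDAY_START,
    "Geddes": SATURDAY_START,
    "Gilbreth": SATURDAY_START,
    "Scholar": SATURDAY_START,
    "Workbench": SATURDAY_START,
}


def can_start_before_maintenance(cluster, start_time, walltime):
    """Return True if a job starting at start_time with the requested
    walltime would finish no later than the cluster's maintenance start.

    Assumption: a job ending exactly at the start time is allowed.
    """
    projected_end = start_time + walltime
    return projected_end <= MAINTENANCE_START[cluster]


if __name__ == "__main__":
    noon_thursday = datetime(2021, 7, 29, 12, 0)
    # A 24-hour request on Brown would run past Friday 7:00am, so it waits.
    print(can_start_before_maintenance("Brown", noon_thursday, timedelta(hours=24)))
    # The same request on Bell ends Friday noon, before Saturday 7:00am.
    print(can_start_before_maintenance("Bell", noon_thursday, timedelta(hours=24)))
```

In practice, a job that is close to the window can often still run if it is resubmitted with a shorter requested walltime that ends before the maintenance start.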

Originally posted: July 7, 2021 11:58am EDT