RCAC Whole-Floor Downtime and Power Work
August 2, 2021 12:24pm EDT UPDATE:
With the POD data center issue resolved, the Weber cluster was returned to normal service as of 12:24pm EDT. All queues have been enabled and jobs have resumed scheduling. Please report any issues to rcac-help@purdue.edu.
This concludes the whole-floor downtime and maintenance. Thank you for your patience!
August 2, 2021 11:59am EDT UPDATE:
The Weber networking problem has been resolved and the cluster is ready to be returned to service. The return-to-service (RTS) process is currently pending resolution of the sudden cooling issue in the POD data center.
We will provide another update by 6pm, or sooner if the cooling problem is resolved before then.
August 1, 2021 5:57pm EDT UPDATE:
Engineers continue troubleshooting the Weber cluster networking issue that is preventing it from returning to service. We will provide an update by noon tomorrow, August 2nd.
August 1, 2021 2:35pm EDT UPDATE:
As of 2:35pm EDT, the Geddes cluster has been returned to normal service. Please report any issues to rcac-help@purdue.edu.
Work continues on bringing the Weber cluster back. We appreciate your patience and will provide an update by 6pm tonight.
August 1, 2021 2:10pm EDT UPDATE:
As of 2:10pm EDT, the Halstead cluster has been returned to normal service. All queues have been enabled and jobs have resumed scheduling. Please report any issues to rcac-help@purdue.edu.
Work continues on bringing the Geddes and Weber clusters back. We appreciate your patience and will provide an update by 6pm tonight.
August 1, 2021 11:55am EDT UPDATE:
As of 11:55am EDT, the required data center power work has been completed successfully.
The Bell, Brown, CMS, Hammer, Gilbreth, Scholar and Workbench clusters have been returned to normal service. All queues have been enabled and jobs have resumed scheduling. Please report any issues to rcac-help@purdue.edu.
Work continues on bringing the Halstead, Geddes and Weber clusters back. We appreciate your patience and will provide an update by 6pm tonight.
July 28, 2021 4:09pm EDT UPDATE:
This is a reminder of the maintenance downtime for most Research Computing resources starting this coming Friday, July 30, 2021. Please note in the attached schedule that some clusters (Brown, Hammer, and Weber) will go down on Friday, while the others will not go down until Saturday.
The Data Depot will remain available for non-cluster access.
ORIGINAL:
The majority of the Research Computing computational resources will be unavailable July 30, 2021 7:00am - August 1, 2021 12:00pm EDT for a whole-floor downtime due to electrical power work in the MATH and POD data centers. Along with required preventative maintenance, the work will provide the power equipment upgrades necessary to house the upcoming NSF-funded Anvil supercomputer.
Start Time | End Time | Resources |
---|---|---|
Friday, July 30th, 2021 at 7:00am | Sunday, August 1st, 2021 at 12:00pm EDT | Brown, Hammer, Weber |
Saturday, July 31st, 2021 at 7:00am | Sunday, August 1st, 2021 at 12:00pm EDT | Bell, CMS, Halstead, Geddes, Gilbreth, Scholar, Workbench |
Not affected | | Data Depot, WSC, WCERES, customer VMs and servers |
All systems will return to full production by Sunday, August 1st, 2021 at 12:00pm EDT.
Any SLURM job whose requested walltime would extend past the start of the downtime for its cluster will not start and will remain in the queue until after the maintenance is completed.
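The walltime rule above comes down to simple date arithmetic, which can be sketched as follows. This is only an illustration, not the scheduler's actual logic: the names `DOWNTIME_START`, `fits_before_downtime`, and `max_walltime` are hypothetical, the start times are copied from the schedule table (local time, EDT), and treating a job that would end exactly at the downtime start as acceptable is an assumption.

```python
from datetime import datetime, timedelta

# Downtime start per cluster, copied from the schedule above (local time, EDT).
FRIDAY_START = datetime(2021, 7, 30, 7, 0)
SATURDAY_START = datetime(2021, 7, 31, 7, 0)

DOWNTIME_START = {
    "Brown": FRIDAY_START,
    "Hammer": FRIDAY_START,
    "Weber": FRIDAY_START,
    "Bell": SATURDAY_START,
    "CMS": SATURDAY_START,
    "Halstead": SATURDAY_START,
    "Geddes": SATURDAY_START,
    "Gilbreth": SATURDAY_START,
    "Scholar": SATURDAY_START,
    "Workbench": SATURDAY_START,
}


def fits_before_downtime(cluster: str, start: datetime, walltime: timedelta) -> bool:
    """Return True if a job starting at `start` would end by the downtime start.

    Assumption: a job ending exactly at the downtime start still counts as fitting.
    """
    return start + walltime <= DOWNTIME_START[cluster]


def max_walltime(cluster: str, now: datetime) -> timedelta:
    """Longest walltime that still ends before the downtime (zero once it has begun)."""
    return max(DOWNTIME_START[cluster] - now, timedelta(0))
```

For example, at 7:00pm on Thursday, July 29th, `max_walltime("Brown", datetime(2021, 7, 29, 19, 0))` gives 12 hours, so a job on Brown requesting 4 hours could still start while one requesting 13 hours would wait in the queue. The walltime itself is whatever the job requests, for instance through the `--time` option of `sbatch`.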