Unscheduled Gautschi cluster outage
UPDATE (March 1, 2025 11:20am EST):
As of 10:30am, Gautschi has been returned to service and jobs are running. We will be taking down half of the CPU nodes until we can further investigate, on Monday, the cause of the power outage. This will not affect user jobs under the expected load.
Additionally, due to the power outage on Gautschi, running jobs crashed and were automatically requeued. Because this outage happened after Gautschi's Early User Program (EUP) ended, any jobs that had started before the end of the EUP were held in the queue, since the resources they had been using have expired. Because this differs from the behavior we told EUP participants to expect of their jobs, we have re-launched all batch jobs that were running prior to the outage. Some jobs that were pending at the time of the outage and that belonged to EUP participants who no longer have resources on Gautschi have been cancelled.
If you have any questions, please reach out to us at rcac-help@purdue.edu.
ORIGINAL:
The Gautschi cluster began experiencing issues with its power feed around 6:45am. Engineers are currently diagnosing the issue and working to identify a fix. Job scheduling has been paused while this issue is being addressed.
We will provide an update by 10:00am.