Scheduled Gilbreth Upgrade
As of 7:44pm EST, the maintenance work is complete and the Gilbreth cluster has returned to normal service.
The Gilbreth expansion resulted in significant changes to queue names and limits (see the detailed description in the separate article). It is highly recommended that you run the slist command to re-acquaint yourself with the new queues and limits that you have access to.
Additionally, please note that slist behavior has changed. By default, slist on Gilbreth now reports the count of GPUs in your queues (not the count of CPU cores, as it did before and still does on CPU-based clusters). The new mode aligns much better with Gilbreth scheduling policies and will make monitoring your queues easier and more straightforward.
All queues have been enabled and jobs have resumed scheduling. Please report any issues to firstname.lastname@example.org
The bulk of the maintenance work is complete, and work continues to bring Gilbreth back to normal operation. We will provide another update by 8pm tonight.
Gilbreth will be unavailable due to maintenance from November 30, 2022 8:00am until December 1, 2022 5:00pm EST to allow for an expansion of Gilbreth’s resources. This maintenance will complete the remaining work previously announced in the Gilbreth Upgrade, which was postponed due to vendor hardware delays. This work will expand the capacity of Gilbreth through a significant increase in the number of GPUs and will reduce wait times by restructuring queues. Both of these changes are outlined below:
New A10, A30, and A100 GPUs: A total of 48 new Nvidia A10 GPUs and 28 new Nvidia A100-80GB GPUs are being incorporated into Gilbreth to complement the 72 A30 GPUs that have already been added to Gilbreth to replace the legacy P100 GPUs. This hardware upgrade will increase the total number of GPUs on Gilbreth by 75%.
New queue limits and names: All users will be able to submit jobs to new lab/PI-specific queues as well as a new
long queues will no longer be available. For more information, please see the detailed description of changes in this separate news article.
How does this affect you?
- You will be unable to log in to Gilbreth or access data stored there during the maintenance.
- Jobs requesting a walltime which would cause the job to run beyond the start of maintenance will not start.
- All pending and running jobs will be deleted during the maintenance, as we must delete the old queues in which these jobs are running in order to create the new ones.
- After maintenance, you will need to ensure that the queue (specified by the -A option in your job scripts) is updated to submit to the correct queue, as described in the queue changes article linked above.
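For the last item, the following is a minimal sketch of one way to update the queue in an existing job script. It assumes the script sets the queue with an "#SBATCH -A <name>" or "#SBATCH --account=<name>" directive (-A is the short form of Slurm's --account option). The helper name update_queue and the queue name mylab are hypothetical; substitute a queue that slist reports for your account.

```shell
#!/bin/sh
# Sketch only: rewrite the queue named in a Slurm job script.
# update_queue is a hypothetical helper, not a Gilbreth-provided tool.

# Usage: update_queue FILE NEW_QUEUE
# Replaces the queue in every "#SBATCH -A <name>" or
# "#SBATCH --account=<name>" line of FILE; other lines are left unchanged.
update_queue() {
    file=$1
    new_queue=$2
    tmp=$(mktemp) || return 1
    if sed -E \
        -e "s/^(#SBATCH[[:space:]]+-A[[:space:]]+)[^[:space:]]+/\1${new_queue}/" \
        -e "s/^(#SBATCH[[:space:]]+--account=)[^[:space:]]+/\1${new_queue}/" \
        "$file" > "$tmp"; then
        # Write back with cat so the script keeps its permissions.
        cat "$tmp" > "$file"
        status=0
    else
        status=1
    fi
    rm -f "$tmp"
    return "$status"
}
```

For example, "update_queue myjob.sh mylab" would change "#SBATCH -A oldqueue" to "#SBATCH -A mylab" in myjob.sh. If you pass the queue on the command line instead (sbatch -A mylab myjob.sh), no edit to the script is needed. Review the result before resubmitting, since directives written in other forms are not touched by this sketch.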
As always, we appreciate your patience as we work to improve Gilbreth for Purdue researchers.
Please reach out to email@example.com if you have any questions about the upgrade or need support.