Queues

Link to section '"mylab" Queues' of 'Queues' "mylab" Queues

Gilbreth, as a community cluster, has one or more queues dedicated to and named after each partner who has purchased access to the cluster. These queues provide partners and their researchers with priority access to their portion of the cluster. Jobs in these queues are typically limited to 336 hours. The expectation is that any job submitted to your research lab's queues will start within 4 hours, assuming the queue currently has enough capacity for it (that is, your lab mates aren't already using all of the cores).
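
As an illustration, a partner-queue submission might look like the following minimal sketch, assuming the cluster's Slurm scheduler; the queue name "mylab", the script name myjob.sh, and the resource sizes are placeholders to replace with your own.

$ cat myjob.sh
#!/bin/bash
#SBATCH -A mylab                 # your lab's partner queue
#SBATCH --nodes=1
#SBATCH --gres=gpu:1             # one GPU on one node
#SBATCH --time=336:00:00         # up to the typical 336-hour limit

hostname
nvidia-smi                       # confirm which GPU was allocated

$ sbatch myjob.sh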

Training Queue

If your job scales well to multiple GPUs and requires longer than 24 hours, use the training queue. Since the training nodes have specialty hardware and are few in number, access is restricted to users whose workloads scale well with the number of GPUs. Please note that staff may ask you to provide evidence that your jobs can fully utilize the GPUs before granting access to this queue. The maximum wall time is 3 days, a user may run at most 2 jobs concurrently, and a user may consume at most 8 GPUs in total. There are only 5 nodes in this queue, so you may have to wait a considerable amount of time before your job is scheduled.
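
A training-queue submission might look like the sketch below, again assuming Slurm; train.py stands in for your own multi-GPU program, and the flags should be adjusted for your workload.

$ cat train.sh
#!/bin/bash
#SBATCH -A training
#SBATCH --nodes=1
#SBATCH --gres=gpu:4             # training nodes have 4 GPUs each
#SBATCH --time=3-00:00:00        # the 3-day maximum for this queue

srun python train.py             # your own training program

$ sbatch train.sh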

Standby Queue

Additionally, community clusters provide a "standby" queue that is available to all cluster users. This queue allows users to utilize portions of the cluster that would otherwise sit idle, but at a lower priority than partner-queue jobs and with a relatively short time limit, so that standby jobs cannot tie up resources and prevent partner-queue jobs from starting quickly. Jobs in standby are limited to 4 hours. There is no expectation of job start time: if the cluster is busy with partner-queue jobs, or you are requesting a very large job, standby jobs may take hours or days to start.
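
Submitting to standby works the same way as the partner-queue sketch above, just against the standby queue and within its 4-hour limit (myjob.sh is a placeholder):

$ sbatch -A standby --gres=gpu:1 --time=04:00:00 myjob.sh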

Debug Queue

The debug queue allows you to quickly start small, short, interactive jobs in order to debug code, test programs, or test configurations. You are limited to one running job at a time in this queue, and you may use up to two GPUs for up to 30 minutes. The expectation is that debug jobs should start within a couple of minutes, assuming the queue's dedicated nodes are not all occupied.
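
An interactive debug session can be requested with a plain Slurm srun, sketched below; Gilbreth may also provide a site-specific wrapper for interactive jobs, so check the cluster documentation.

$ srun -A debug --gres=gpu:1 --time=00:30:00 --pty bash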

List of Queues

To see a list of all queues on Gilbreth that you may submit to, use the slist command.

This lists each queue you can submit to, the number of nodes allocated to the queue, how many are available to run jobs, and the maximum walltime you may request. Options to the command will give more detailed information. This command can be used to get a general idea of how busy an individual queue is and how long you may have to wait for your job to start.

By default, the slist command shows the available GPU counts in each queue:
$ slist

                      Current Number of GPUs                        Node
Account           Total    Queue     Run    Free    Max Walltime    Type
==============  =================================  ==============  ======
debug               183        0       0     183      00:30:00     B,D,E,F,G,H,I
standby             183       77      55      98      04:00:00     B,D,E,F,G,H,I
training             20        0       8      12     3-00:00:00    C,J
mylab                80        0       0      80    14-00:00:00    F

To check the number of CPUs in each queue, use the slist -c command.
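
For example:

$ slist -c        # per-queue CPU counts instead of GPU counts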

Summary of Queues

Gilbreth contains several queues and heterogeneous hardware with differing core counts and GPU models. Some queues are backed by a single node type, while others may land on multiple node types; on the latter, you will need to be mindful of your resource request. Below are the current combinations of queues, GPU types, and resources you may request; a submission sketch for pinning a GPU model follows the table.

Gilbreth queues

standby
  GPU types:                       V100 (16 GB), V100 (32 GB), A100 (40 GB), A100 (80 GB), A10 (24 GB), A30 (24 GB)
  CPUs (GPUs) per node:            16 (2), 40 (2), 128 (2), 128 (2), 32 (3), 24/16 (3)
  Intended use-case:               Short to moderately long jobs
  Max walltime:                    4 hours
  Max GPUs per user concurrently:  16
  Max jobs running per user:       16

training
  GPU types:                       V100 (32 GB, NVLink), A100 (80 GB, NVLink)
  CPUs (GPUs) per node:            20 (4), 128 (4)
  Intended use-case:               Long jobs that can scale well to multiple GPUs, such as Deep Learning model training
  Max walltime:                    3 days
  Max GPUs per user concurrently:  8
  Max jobs running per user:       2

debug
  GPU types:                       V100 (16 GB), V100 (32 GB), A100 (40 GB), A100 (80 GB), A10 (24 GB), A30 (24 GB)
  CPUs (GPUs) per node:            16 (2), 40 (2), 128 (2), 128 (2), 32 (3), 24/16 (3)
  Intended use-case:               Quick testing
  Max walltime:                    30 mins
  Max GPUs per user concurrently:  2
  Max jobs running per user:       1

"mylab"
  GPU types:                       Based on purchase
  CPUs (GPUs) per node:            Based on purchase
  Max walltime:                    2 weeks
  Max GPUs per user concurrently:  Amount purchased
  Max jobs running per user:       Based on purchase
  Note: there will be a separate queue for each type of GPU the partners have purchased.