Resource Queues
To use all nodes efficiently and on a fair-share basis, resources are organized into queues (partitions). Queues group nodes into logical sets, each with its own constraints, such as job size limit, job time limit, and the users permitted to use it.
To determine what partitions exist on the system, what nodes they include, and the general system state, use the sinfo command:
sinfo -s
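As a minimal sketch, the same information can be requested in a script-friendly form with sinfo's `-h` (no header) and `-o` (output format) options. The function names `partition_overview` and `default_partition` are hypothetical helpers introduced here for illustration; the second one relies on sinfo marking the default partition with a trailing `*`.

```shell
# One line per partition: name, availability, time limit, node count, node list.
# %P = partition, %a = availability, %l = time limit, %D = node count, %N = node list.
partition_overview() {
    sinfo -h -o "%P %a %l %D %N"
}

# Read partition_overview-style lines on stdin and print the default partition,
# i.e. the one whose name sinfo suffixes with "*".
default_partition() {
    awk '$1 ~ /\*$/ { sub(/\*$/, "", $1); print $1 }'
}

# Usage on a login node (requires Slurm):
#   partition_overview | default_partition
```

On this system the pipeline in the usage comment would be expected to report `compute`, since that is the partition configured with `Default=YES`.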
ARIS queue (partition) overview:
Queue Table
| PARTITION | DESCRIPTION | AVAIL | TIMELIMIT | NODES | NODELIST |
|---|---|---|---|---|---|
| compute | Compute nodes w/o accelerator | up | 2-00:00:00 | 48 | m[01-48] |
| fat | Fat compute nodes | up | 2-00:00:00 | 16 | f[01-16] |
| gpu | GPU accelerated nodes | up | 2-00:00:00 | 2 | a[02-03] |
| mig | MIG GPU accelerated node | up | 2-00:00:00 | 1 | a01 |
- compute queue: Intended for parallel jobs on the THIN compute nodes (no accelerator).
- fat queue: Dedicated to parallel jobs on the FAT compute nodes.
- gpu queue: Provides access to the GPU accelerated nodes.
- mig queue: Provides access to the MIG GPU accelerated node.
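Every partition in the table above has a TIMELIMIT of 2-00:00:00, which is two days in Slurm's days-hours:minutes:seconds notation. The sketch below shows one way to check a requested wall time against that limit before submitting. The helper names `slurm_time_to_seconds` and `fits_time_limit` are hypothetical, and only the `[D-]HH:MM:SS` forms are handled; Slurm accepts other time formats that this sketch ignores.

```shell
# Convert a time string of the form [D-]HH:MM:SS into seconds.
# awk does the arithmetic, so zero-padded fields such as "08" are read as decimal.
slurm_time_to_seconds() {
    echo "$1" | awk '{
        days = 0; rest = $0
        if (index(rest, "-") > 0) { split(rest, d, "-"); days = d[1]; rest = d[2] }
        split(rest, p, ":")
        print days * 86400 + p[1] * 3600 + p[2] * 60 + p[3]
    }'
}

# Succeed (exit status 0) if the requested time ($1) does not exceed the limit ($2).
fits_time_limit() {
    [ "$(slurm_time_to_seconds "$1")" -le "$(slurm_time_to_seconds "$2")" ]
}

# Example: a 36-hour request fits within the 2-day partition limit.
if fits_time_limit "1-12:00:00" "2-00:00:00"; then
    echo "request fits"
fi
```

In a job script, the partition and wall time would typically be requested with `#SBATCH --partition=compute` and `#SBATCH --time=1-12:00:00`.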
The scontrol command can be used to report more detailed information about partitions and node configuration:
:$ scontrol show partition
PartitionName=compute
AllowGroups=ALL AllowAccounts=ALL AllowQos=ALL
AllocNodes=ALL Default=YES QoS=N/A
DefaultTime=NONE DisableRootJobs=NO ExclusiveUser=NO ExclusiveTopo=NO GraceTime=0 Hidden=NO
MaxNodes=UNLIMITED MaxTime=2-00:00:00 MinNodes=0 LLN=NO MaxCPUsPerNode=UNLIMITED MaxCPUsPerSocket=UNLIMITED
Nodes=m[01-48]
PriorityJobFactor=1 PriorityTier=1 RootOnly=NO ReqResv=NO OverSubscribe=NO
OverTimeLimit=NONE PreemptMode=OFF
State=UP TotalCPUs=6144 TotalNodes=48 SelectTypeParameters=NONE
JobDefaults=(null)
DefMemPerNode=UNLIMITED MaxMemPerNode=507904
TRES=cpu=6144,mem=23808G,node=48,billing=6144
PartitionName=fat
AllowGroups=ALL AllowAccounts=ALL AllowQos=ALL
AllocNodes=ALL Default=NO QoS=N/A
DefaultTime=NONE DisableRootJobs=NO ExclusiveUser=NO ExclusiveTopo=NO GraceTime=0 Hidden=NO
MaxNodes=UNLIMITED MaxTime=2-00:00:00 MinNodes=0 LLN=NO MaxCPUsPerNode=UNLIMITED MaxCPUsPerSocket=UNLIMITED
Nodes=f[01-16]
PriorityJobFactor=1 PriorityTier=1 RootOnly=NO ReqResv=NO OverSubscribe=NO
OverTimeLimit=NONE PreemptMode=OFF
State=UP TotalCPUs=2048 TotalNodes=16 SelectTypeParameters=NONE
JobDefaults=(null)
DefMemPerNode=UNLIMITED MaxMemPerNode=1015808
TRES=cpu=2048,mem=15.50T,node=16,billing=2048
PartitionName=gpu
AllowGroups=ALL AllowAccounts=ALL AllowQos=ALL
AllocNodes=ALL Default=NO QoS=N/A
DefaultTime=NONE DisableRootJobs=NO ExclusiveUser=NO ExclusiveTopo=NO GraceTime=0 Hidden=NO
MaxNodes=UNLIMITED MaxTime=2-00:00:00 MinNodes=0 LLN=NO MaxCPUsPerNode=UNLIMITED MaxCPUsPerSocket=UNLIMITED
Nodes=a[02-03]
PriorityJobFactor=1 PriorityTier=1 RootOnly=NO ReqResv=NO OverSubscribe=NO
OverTimeLimit=NONE PreemptMode=OFF
State=UP TotalCPUs=256 TotalNodes=2 SelectTypeParameters=NONE
JobDefaults=(null)
DefMemPerNode=UNLIMITED MaxMemPerNode=507904
TRES=cpu=256,mem=992G,node=2,billing=256,gres/gpu=8,gres/gpu:a100=8
PartitionName=mig
AllowGroups=ALL AllowAccounts=ALL AllowQos=ALL
AllocNodes=ALL Default=NO QoS=N/A
DefaultTime=NONE DisableRootJobs=NO ExclusiveUser=NO ExclusiveTopo=NO GraceTime=0 Hidden=NO
MaxNodes=UNLIMITED MaxTime=2-00:00:00 MinNodes=0 LLN=NO MaxCPUsPerNode=UNLIMITED MaxCPUsPerSocket=UNLIMITED
Nodes=a01
PriorityJobFactor=1 PriorityTier=1 RootOnly=NO ReqResv=NO OverSubscribe=NO
OverTimeLimit=NONE PreemptMode=OFF
State=UP TotalCPUs=128 TotalNodes=1 SelectTypeParameters=NONE
JobDefaults=(null)
DefMemPerNode=UNLIMITED MaxMemPerNode=507904
TRES=cpu=128,mem=496G,node=1,billing=128,gres/gpu=16,gres/gpu:1g.10gb=8,gres/gpu:2g.20gb=4,gres/gpu:3g.40gb=4
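The partition records above are sequences of KEY=VALUE fields, so they can be condensed with a short awk filter. The following is a sketch only; `partition_summary` is a hypothetical helper name, and it assumes the `PartitionName`, `TotalCPUs` and `TotalNodes` fields appear as in the output shown.

```shell
# Read `scontrol show partition` output on stdin and print, for each partition:
#   name, total nodes, total CPUs, CPUs per node
partition_summary() {
    awk '
        {
            # Scan every KEY=VALUE field on the line and remember the ones we need.
            for (i = 1; i <= NF; i++) {
                split($i, kv, "=")
                if (kv[1] == "PartitionName") name  = kv[2]
                if (kv[1] == "TotalCPUs")     cpus  = kv[2]
                if (kv[1] == "TotalNodes")    nodes = kv[2]
            }
        }
        # The line carrying TotalNodes= also carries TotalCPUs=, so report here.
        /TotalNodes=/ && nodes > 0 { print name, nodes, cpus, cpus / nodes }
    '
}

# Usage on a login node (requires Slurm):
#   scontrol show partition | partition_summary
```

For the compute partition shown above this yields `compute 48 6144 128`, i.e. 128 CPUs per node.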
:$ scontrol show node m01
NodeName=m01 Arch=x86_64 CoresPerSocket=64
CPUAlloc=0 CPUEfctv=128 CPUTot=128 CPULoad=0.00
AvailableFeatures=(null)
ActiveFeatures=(null)
Gres=(null)
NodeAddr=m01 NodeHostName=m01 Version=24.11.5
OS=Linux 5.14.0-503.35.1.el9_5.x86_64 #1 SMP PREEMPT_DYNAMIC Thu Apr 3 12:12:16 UTC 2025
RealMemory=507904 AllocMem=0 FreeMem=507277 Sockets=2 Boards=1
State=IDLE ThreadsPerCore=1 TmpDisk=0 Weight=1 Owner=N/A MCS_label=N/A
Partitions=compute
BootTime=2025-05-15T14:02:30 SlurmdStartTime=2025-06-04T02:19:10
LastBusyTime=2025-06-04T11:42:38 ResumeAfterTime=None
CfgTRES=cpu=128,mem=496G,billing=128
AllocTRES=
CurrentWatts=96 AveWatts=93
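Individual values can be pulled out of the node record in the same way. The sketch below uses two hypothetical helpers, `scontrol_field` and `node_summary`, to report the unallocated CPUs and the configured memory of a node. It assumes that RealMemory is reported in megabytes (which matches `mem=496G` in CfgTRES above) and that the requested fields contain no spaces, so it is not suitable for fields such as `OS=`.

```shell
# Print the value of the first KEY=VALUE field named $1 found on stdin.
scontrol_field() {
    awk -v k="$1" '{
        for (i = 1; i <= NF; i++)
            if (index($i, k "=") == 1) { print substr($i, length(k) + 2); exit }
    }'
}

# Read `scontrol show node <name>` output on stdin and print free CPUs and memory in GiB.
node_summary() {
    info="$(cat)"
    tot="$(printf '%s\n' "$info" | scontrol_field CPUTot)"
    alloc="$(printf '%s\n' "$info" | scontrol_field CPUAlloc)"
    mem="$(printf '%s\n' "$info" | scontrol_field RealMemory)"
    echo "free_cpus=$(( tot - alloc )) mem_gib=$(( mem / 1024 ))"
}

# Usage on a login node (requires Slurm):
#   scontrol show node m01 | node_summary
```

For the m01 record shown above, this prints `free_cpus=128 mem_gib=496`.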