Job Monitoring
Show the job queue
To list all jobs on the system, use the command below (--all also includes partitions that are hidden or unavailable to your group):
$ squeue --all
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
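The columns are:
- JOBID: job ID
- PARTITION: partition the job was submitted to (use sinfo to list all available partitions)
- NAME: job name
- USER: username of the job owner
- ST: job state; the most common codes are:
  - R: Running
  - PD: Pending
  - TO: Timed out
  - S: Suspended
  - CD: Completed
  - CA: Cancelled
  - F: Failed
  - NF: Node failure
- TIME: time the job has been running
- NODES: number of nodes allocated to (or requested by) the job
- NODELIST(REASON): nodes allocated to a running job, or the reason a pending job is waiting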
To list only your own jobs, replace username with your login name:
squeue -u username
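For a quick overview of how many of your jobs are in each state, the ST codes can be tallied. The sketch below uses a hypothetical function name, count_states (not a Slurm command); it reads one state code per line, which squeue prints when given -h (no header line) and the %t (compact state) format specifier.

```shell
# Hypothetical helper (not part of Slurm): count jobs per state code.
# Input: one compact state code per line on stdin, e.g. produced by
#   squeue -u "$USER" -h -o "%t"
# (-h drops the header line, %t prints codes such as R, PD, CD).
# Output: "<code> <count>" lines, sorted by state code.
count_states() {
  awk '{ n[$1]++ } END { for (s in n) print s, n[s] }' | sort
}
```

Running squeue -u "$USER" -h -o "%t" | count_states prints one line per state, such as "PD 3"; the counts depend on your queue.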
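To check the estimated start time of pending jobs, use:
squeue --start
To choose which columns are shown, pass a format string with -o, for example:
squeue -o "%.8i %.9P %.10j %.10u %.8T %.5C %.4D %.6m %.10l %.10M %.10L %.16R"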
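The format string above is easier to reuse and adjust when each specifier is documented. The following is a minimal sketch of a hypothetical wrapper function, myjobs, and variable, SQUEUE_FMT (neither is part of Slurm), that apply it to your own jobs. The specifier meanings in the comments follow the squeue man page, where a number sets the minimum column width and a leading dot right-justifies the column.

```shell
# Hypothetical wrapper (not part of Slurm) around the custom squeue format.
#   %i job ID         %P partition      %j job name     %u user
#   %T state (long)   %C CPUs           %D nodes        %m minimum memory
#   %l time limit     %M time used      %L time left    %R nodelist or reason
# "%.8i" means: right-justify ("."), minimum width 8.
SQUEUE_FMT="%.8i %.9P %.10j %.10u %.8T %.5C %.4D %.6m %.10l %.10M %.10L %.16R"

myjobs() {
  # Extra arguments are passed through to squeue, e.g. myjobs -t PENDING
  squeue -u "$USER" -o "$SQUEUE_FMT" "$@"
}
```

Keeping the format in one variable means a column can be added or widened in a single place.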
See the squeue man page for all options and format specifiers:
man squeue
Job information
To view detailed information about a single job, pass its job ID to scontrol:
$ scontrol show job 689
JobId=689 JobName=test
UserId=user(1831) GroupId=user(1831) MCS_label=N/A
Priority=3348 Nice=0 Account=testproj QOS=normal
JobState=RUNNING Reason=None Dependency=(null)
Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0
RunTime=1-04:02:39 TimeLimit=2-00:00:00 TimeMin=N/A
SubmitTime=2025-06-03T09:54:41 EligibleTime=2025-06-03T09:54:41
AccrueTime=2025-06-03T09:54:41
StartTime=2025-06-03T09:54:41 EndTime=2025-06-05T09:54:41 Deadline=N/A
SuspendTime=None SecsPreSuspend=0 LastSchedEval=2025-06-03T09:54:41 Scheduler=Backfill
Partition=compute AllocNode:Sid=login05:3056374
ReqNodeList=m02 ExcNodeList=(null)
NodeList=m02
BatchHost=m02
NumNodes=1 NumCPUs=1 NumTasks=1 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
ReqTRES=cpu=1,mem=8G,node=1,billing=1
AllocTRES=cpu=1,mem=8G,node=1,billing=1
Socks/Node=* NtasksPerN:B:S:C=1:0:*:* CoreSpec=*
MinCPUsNode=1 MinMemoryCPU=8G MinTmpDiskNode=0
Features=(null) DelayBoot=00:00:00
OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null)
Command=/gpfs/users/staff/user/../test1.job
WorkDir=/gpfs/users/staff/user/benchmarks/lammps/spce
StdErr=/gpfs/users/staff/user/../test.err
StdIn=/dev/null
StdOut=/gpfs/users/staff/user/../test.out
TresPerTask=cpu=1
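Because scontrol prints the job record as space-separated Key=Value pairs, a single field can be extracted in scripts. The sketch below is a hypothetical helper, job_field (not a Slurm command), that prints the value of the first field whose key matches exactly; values containing spaces are not handled.

```shell
# Hypothetical helper (not part of Slurm): print the value of one Key=Value
# field from `scontrol show job` output read on stdin.
# Only the first matching field is printed; values with spaces are not handled.
# Exit status is 0 if the key was found, 1 otherwise.
job_field() {
  awk -v key="$1" '{
    for (i = 1; i <= NF; i++) {
      eq = index($i, "=")   # split on the first "=", since values may contain "="
      if (eq > 0 && substr($i, 1, eq - 1) == key) {
        print substr($i, eq + 1)
        found = 1
        exit
      }
    }
  }
  END { exit (found ? 0 : 1) }'
}
```

For the job shown above, scontrol show job 689 | job_field JobState would print RUNNING, and job_field AllocTRES would print cpu=1,mem=8G,node=1,billing=1.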
Pending Jobs
A pending job (state PD) shows why it is waiting in the NODELIST(REASON) column of squeue and in the Reason= field of scontrol show job. Common reasons are:
| Reason | Description |
|---|---|
| Dependency | The job is waiting for a job it depends on to complete. |
| NodeDown | A node required by the job is down. |
| PartitionDown | The partition (queue) required by the job is in a DOWN state and temporarily accepts no jobs, for instance because of maintenance. This reason may still be displayed for a while after the system is back up. |
| Priority | One or more jobs with higher priority exist for this partition or advanced reservation, so they will be scheduled before yours. |
| ReqNodeNotAvail | No nodes can be found that satisfy the job's limits, for instance because maintenance is scheduled and the job cannot finish before the maintenance starts. |
| Reservation | The job is waiting for its advanced reservation to become available. |
| Resources | The job is waiting for resources (nodes) to become available and will run when Slurm finds enough free nodes. |
| SystemFailure | Failure of the Slurm system, a file system, the network, etc. |
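To see which of these reasons apply to your own pending jobs, the reason codes can be tallied. This is a minimal sketch with a hypothetical function name, pending_by_reason (not a Slurm command); it relies on the squeue options -h (no header line), -t PENDING (pending jobs only) and the %r format specifier, which prints the reason code without parentheses.

```shell
# Hypothetical helper (not part of Slurm): count your pending jobs per reason.
#   -h          suppress the header line
#   -t PENDING  show only pending jobs
#   %r          print the reason code (e.g. Priority, Resources, Dependency)
# Output: "<reason> <count>" lines, sorted by reason.
pending_by_reason() {
  squeue -u "$USER" -h -t PENDING -o "%r" |
    awk '{ n[$1]++ } END { for (r in n) print r, n[r] }' |
    sort
}
```

A result dominated by Priority or Resources usually just means the job is queued normally; other reasons from the table may need a closer look with scontrol show job.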