Job Submission¶
To create a resource allocation and launch tasks, you can submit a batch script.
A batch script submitted to the scheduling system must define the job specifications (a minimal example follows this list):
- resource queue (default is compute)
- number of nodes required
- number of cores per node required
- maximum wall time for the job (jobs exceeding the wall time limit will be killed)
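For example, these specifications map to #SBATCH directives like the following (a minimal sketch; the queue name and the requested values are illustrative):
#SBATCH --partition=compute # Resource queue
#SBATCH --nodes=1 # Number of nodes
#SBATCH --ntasks-per-node=4 # Cores (tasks) per node
#SBATCH --time=01:00:00 # Maximum wall time (hh:mm:ss)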
To submit a job, use the sbatch command.
sbatch my_script
Please check the sbatch man page for more information.
man sbatch
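On successful submission, sbatch prints the ID assigned to the job, for example (the job ID shown is illustrative):
Submitted batch job 123456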
Define batch script¶
Batch scripts contain:
- scheduler directives: lines beginning with #SBATCH
- shell commands: UNIX shell (bash) commands
- job steps: created with the srun command (see the sketch after the example below)
#!/bin/bash -l
#SBATCH --job-name=my_script # Job name
#SBATCH --ntasks=2 # Number of tasks
#SBATCH --time=01:30:00 # Run time (hh:mm:ss) - 1.5 hours
module load gnu #load any needed modules
echo "Start at `date`"
cd $HOME/workdir
./a.out
echo "End at `date`"
To submit this batch script
sbatch my_script
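The script above runs ./a.out directly in the batch shell. To run it as a job step instead (see the list of batch script components), the executable can be launched with srun; a minimal sketch, reusing the same a.out:
cd $HOME/workdir
srun ./a.out # Run a.out as a job step on the allocated resources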
Job Specifications¶
| Option | Argument | Specification |
|---|---|---|
| --job-name, -J | job_name | Job name is job_name |
| --partition, -p | queue_name | Submits to queue queue_name |
| --account, -A | project_name | Project to charge compute hours |
| --ntasks, -n | number_of_tasks | Total number of tasks |
| --nodes, -N | number_of_nodes | Number of nodes |
| --ntasks-per-node | ntasks_per_node | Tasks per node |
| --cpus-per-task, -c | cpus_per_task | Threads per task |
| --time, -t | HH:MM:SS | Time limit (hh:mm:ss) |
| --mem | memory_mb | Memory per node (MB) |
| --mem-per-cpu | memory_mb | Memory per CPU (MB) |
| --output, -o | stdout_filename | Direct job standard output to stdout_filename (%j expands to jobID) |
| --error, -e | stderr_filename | Direct job standard error to stderr_filename (%j expands to jobID) |
| --depend, -d | afterok:jobid | Job dependency |
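These options can also be passed directly on the sbatch command line, where they override the corresponding #SBATCH directives in the script, for example (the values are illustrative):
sbatch --job-name=test_run --ntasks=4 --time=00:30:00 my_script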
SLURM Environment Variables¶
SLURM provides environment variables for most of the values used in the #SBATCH directives.
| Environment Variable | Description |
|---|---|
| $SLURM_JOBID | Job id |
| $SLURM_JOB_NAME | Job name |
| $SLURM_SUBMIT_DIR | Submit directory |
| $SLURM_SUBMIT_HOST | Submit host |
| $SLURM_JOB_NODELIST | Node list |
| $SLURM_JOB_NUM_NODES | Number of nodes |
| $SLURM_CPUS_ON_NODE | Number of cores/node |
| $SLURM_CPUS_PER_TASK | Threads per task |
| $SLURM_NTASKS_PER_NODE | Number of tasks per node |
#!/bin/bash -l
#SBATCH --job-name=slurm_env
#SBATCH --nodes=2 # 2 nodes
#SBATCH --ntasks-per-node=12 # Number of tasks to be invoked on each node
#SBATCH --mem-per-cpu=1024 # Minimum memory required per CPU (in megabytes)
#SBATCH --time=00:01:00 # Run time in hh:mm:ss
#SBATCH --error=job.%J.out
#SBATCH --output=job.%J.out
echo "Start at `date`"
echo "Running on hosts: $SLURM_NODELIST"
echo "Running on $SLURM_NNODES nodes."
echo "Running $SLURM_NTASKS_PER_NODE tasks per node"
echo "Job id is $SLURM_JOBID"
echo "End at `date`"
Job Scripts¶
Here are some sample job submission scripts for different runtime models.
- MPI job: Run multi-process programs with MPI.
- Hybrid job: Parallel programs with MPI and OpenMP threads.
- GPU job: Utilize GPU accelerators.
Pure MPI batch script¶
Launch MPI jobs with the srun command.
DON’T USE mpirun OR mpiexec
#!/bin/bash -l
#-----------------------------------------------------------------
# Pure MPI job, using 256 procs on 2 nodes,
# with 128 procs per node and 1 thread per MPI task
#-----------------------------------------------------------------
#SBATCH --job-name=mpijob # Job name
#SBATCH --output=mpijob.%j.out # Stdout (%j expands to jobId)
#SBATCH --error=mpijob.%j.err # Stderr (%j expands to jobId)
#SBATCH --ntasks=256 # Total number of tasks
#SBATCH --nodes=2 # Total number of nodes requested
#SBATCH --ntasks-per-node=128 # Tasks per node
#SBATCH --cpus-per-task=1 # Threads per task(=1) for pure MPI
#SBATCH --mem=128000 # Memory per node in MB
#SBATCH -t 01:30:00 # Run time (hh:mm:ss) - (max 48h)
#SBATCH --partition=compute # Submit queue
#SBATCH -A testproj # Accounting project
# Load any necessary modules
module purge # Clean environment from loaded modules
module load gnu/13.3.0
module load openmpi/4.1.8/gnu
# Launch the executable
srun EXE ARGS
Hybrid MPI/OpenMP batch script¶
Launch MPI jobs with the srun command.
DON’T USE mpirun OR mpiexec
#!/bin/bash -l
#-----------------------------------------------------------------
# Hybrid MPI/OpenMP job, using 256 cores on 2 nodes,
# with 64 MPI tasks per node and 2 threads per MPI task.
#-----------------------------------------------------------------
#SBATCH --job-name=hybridjob # Job name
#SBATCH --output=hybridjob.%j.out # Stdout (%j expands to jobId)
#SBATCH --error=hybridjob.%j.err # Stderr (%j expands to jobId)
#SBATCH --ntasks=128 # Total number of tasks
#SBATCH --nodes=2 # Total number of nodes requested
#SBATCH --ntasks-per-node=64 # Tasks per node
#SBATCH --cpus-per-task=2 # Threads per task
#SBATCH --mem=56000 # Memory per node in MB
#SBATCH -t 01:30:00 # Run time (hh:mm:ss) - (max 48h)
#SBATCH --partition=compute # Submit queue
#SBATCH -A testproj # Accounting project
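# Use one OpenMP thread per CPU allocated to each task; fall back to 1 thread if SLURM_CPUS_PER_TASK is not set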
if [ x$SLURM_CPUS_PER_TASK == x ]; then
export OMP_NUM_THREADS=1
else
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
fi
# Load any necessary modules
module purge # Clean environment from loaded modules
module load gnu/13.3.0
module load openmpi/4.1.8/gnu
# Launch the executable
srun EXE ARGS
GPU batch script - 1x A100¶
Use up to 32 CPU cores per GPU
Use up to 124 GB of RAM per GPU
Launch GPU-accelerated jobs.
#!/bin/bash -l
#-----------------------------------------------------------------
# GPU job
# with 1 gpu, 16 procs and 2 threads per MPI task.
#-----------------------------------------------------------------
#SBATCH --job-name=gpujob # Job name
#SBATCH --output=gpujob.%j.out # Stdout (%j expands to jobId)
#SBATCH --error=gpujob.%j.err # Stderr (%j expands to jobId)
#SBATCH --ntasks=16 # Total number of tasks
#SBATCH --gres=gpu:a100:1 # GPUs per node
#SBATCH --nodes=1 # Total number of nodes requested
#SBATCH --ntasks-per-node=16 # Tasks per node
#SBATCH --cpus-per-task=2 # Threads per task
#SBATCH --mem=126976 # Memory per job in MB
#SBATCH -t 01:30:00 # Run time (hh:mm:ss) - (max 48h)
#SBATCH --partition=gpu # Run on the GPU nodes queue
#SBATCH -A testproj # Accounting project
# Load any necessary modules
module purge # Clean environment from loaded modules
module load cuda/12.5.1
if [ x$SLURM_CPUS_PER_TASK == x ]; then
export OMP_NUM_THREADS=1
else
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
fi
# Launch the executable
srun EXE ARGS
GPU batch script - 2x A100¶
Launch GPU-accelerated jobs.
Use up to 32 CPU cores per GPU
Use up to 124 GB of RAM per GPU
#!/bin/bash -l
#-----------------------------------------------------------------
# GPU job
# with 2 gpus, 32 procs and 2 threads per MPI task.
#-----------------------------------------------------------------
#SBATCH --job-name=gpujob # Job name
#SBATCH --output=gpujob.%j.out # Stdout (%j expands to jobId)
#SBATCH --error=gpujob.%j.err # Stderr (%j expands to jobId)
#SBATCH --ntasks=32 # Total number of tasks
#SBATCH --gres=gpu:a100:2 # GPUs per node
#SBATCH --nodes=1 # Total number of nodes requested
#SBATCH --ntasks-per-node=32 # Tasks per node
#SBATCH --cpus-per-task=2 # Threads per task
#SBATCH --mem=253952 # Memory per job in MB
#SBATCH -t 01:30:00 # Run time (hh:mm:ss) - (max 48h)
#SBATCH --partition=gpu # Run on the GPU nodes queue
#SBATCH -A testproj # Accounting project
# Load any necessary modules
module purge # Clean environment from loaded modules
module load cuda/12.5.1
if [ x$SLURM_CPUS_PER_TASK == x ]; then
export OMP_NUM_THREADS=1
else
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
fi
# Launch the executable
srun EXE ARGS
GPU MIG batch script - 1x 1g.10gb¶
Launch GPU-accelerated jobs.
Use up to 4 CPU cores and 15.5 GB of RAM for 1g.10gb
Never use 2x 1g.10gb instead of 1x 2g.20gb
Processes running on separate MIG GPUs cannot communicate via NVLink
#!/bin/bash -l
#-----------------------------------------------------------------
# GPU job
# with 1 MIG GPU, 2 procs and 2 threads per MPI task.
#-----------------------------------------------------------------
#SBATCH --job-name=gpujob # Job name
#SBATCH --output=gpujob.%j.out # Stdout (%j expands to jobId)
#SBATCH --error=gpujob.%j.err # Stderr (%j expands to jobId)
#SBATCH --ntasks=2 # Total number of tasks
#SBATCH --gres=gpu:1g.10gb:1 # GPUs per node
#SBATCH --nodes=1 # Total number of nodes requested
#SBATCH --ntasks-per-node=2 # Tasks per node
#SBATCH --cpus-per-task=2 # Threads per task
#SBATCH --mem=15872 # Memory per job in MB
#SBATCH -t 01:30:00 # Run time (hh:mm:ss) - (max 48h)
#SBATCH --partition=mig # Run on the GPU nodes queue
#SBATCH -A testproj # Accounting project
# Load any necessary modules
module purge # Clean environment from loaded modules
module load cuda/12.5.1
if [ x$SLURM_CPUS_PER_TASK == x ]; then
export OMP_NUM_THREADS=1
else
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
fi
# Launch the executable
srun EXE ARGS
GPU MIG batch script - 1x 2g.20gb¶
Launch GPU-accelerated jobs.
Use up to 8 CPU cores and 31 GB of RAM for 2g.20gb
Never use 2x 2g.20gb instead of 1x 3g.40gb
Processes running on separate MIG GPUs cannot communicate via NVLink
#!/bin/bash -l
#-----------------------------------------------------------------
# GPU job
# with 1 MIG gpu, 4 procs and 2 threads per MPI task.
#-----------------------------------------------------------------
#SBATCH --job-name=gpujob # Job name
#SBATCH --output=gpujob.%j.out # Stdout (%j expands to jobId)
#SBATCH --error=gpujob.%j.err # Stderr (%j expands to jobId)
#SBATCH --ntasks=4 # Total number of tasks
#SBATCH --gres=gpu:2g.20gb:1 # GPUs per node
#SBATCH --nodes=1 # Total number of nodes requested
#SBATCH --ntasks-per-node=4 # Tasks per node
#SBATCH --cpus-per-task=2 # Threads per task
#SBATCH --mem=31744 # Memory per job in MB
#SBATCH -t 01:30:00 # Run time (hh:mm:ss) - (max 48h)
#SBATCH --partition=mig # Run on the GPU nodes queue
#SBATCH -A testproj # Accounting project
# Load any necessary modules
module purge # Clean environment from loaded modules
module load cuda/12.5.1
if [ x$SLURM_CPUS_PER_TASK == x ]; then
export OMP_NUM_THREADS=1
else
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
fi
# Launch the executable
srun EXE ARGS
GPU MIG batch script - 1x 3g.40gb¶
Launch GPU-accelerated jobs.
Use up to 16 CPU cores and 62 GB of RAM for 3g.40gb
Never use 2x 3g.40gb instead of 1x a100
Processes running on separate MIG GPUs cannot communicate via NVLink
#!/bin/bash -l
#-----------------------------------------------------------------
# GPU job
# with 1 MIG gpu, 16 procs and 1 thread per MPI task.
#-----------------------------------------------------------------
#SBATCH --job-name=gpujob # Job name
#SBATCH --output=gpujob.%j.out # Stdout (%j expands to jobId)
#SBATCH --error=gpujob.%j.err # Stderr (%j expands to jobId)
#SBATCH --ntasks=16 # Total number of tasks
#SBATCH --gres=gpu:3g.40gb:1 # GPUs per node
#SBATCH --nodes=1 # Total number of nodes requested
#SBATCH --ntasks-per-node=16 # Tasks per node
#SBATCH --cpus-per-task=1 # Threads per task
#SBATCH --mem=63488 # Memory per job in MB
#SBATCH -t 01:30:00 # Run time (hh:mm:ss) - (max 48h)
#SBATCH --partition=mig # Run on the GPU nodes queue
#SBATCH -A testproj # Accounting project
# Load any necessary modules
module purge # Clean environment from loaded modules
module load cuda/12.5.1
if [ x$SLURM_CPUS_PER_TASK == x ]; then
export OMP_NUM_THREADS=1
else
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
fi
# Launch the executable
srun EXE ARGS
Multiple Serial batch script¶
Multiple srun commands executed simultaneously from a single batch script.
Please note the wait at the end of the script; it ensures the batch job does not exit before all tasks have completed.
#!/bin/bash -l
#-----------------------------------------------------------------
# Multiple Serial job, 4 tasks, requesting 1 node, 3968 MB of memory per task
#-----------------------------------------------------------------
#SBATCH --job-name=multiple-serialjob # Job name
#SBATCH --output=multiple-serialjob.%j.out # Stdout (%j expands to jobId)
#SBATCH --error=multiple-serialjob.%j.err # Stderr (%j expands to jobId)
#SBATCH --nodes=1 # Total number of nodes requested
#SBATCH --ntasks=4 # Total number of tasks
#SBATCH --ntasks-per-node=4 # Tasks per node
#SBATCH --cpus-per-task=1 # Threads per task
#SBATCH --mem-per-cpu=3968 # Memory per task in MB
#SBATCH -t 01:30:00 # Run time (hh:mm:ss) - (max 48h)
#SBATCH --partition=compute # Submit queue
#SBATCH -A testproj # Accounting project
# Load any necessary modules
module load gnu/13
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
# Launch the executable a.out
srun -n 1 -c 1 ./a.out input0 &
srun -n 1 -c 1 ./a.out input1 &
srun -n 1 -c 1 ./a.out input2 &
srun -n 1 -c 1 ./a.out input3 &
wait
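The same pattern can also be written as a loop over the input files; a minimal sketch, assuming the inputs are named input0 through input3 as above:
for i in 0 1 2 3; do
    srun -n 1 -c 1 ./a.out input$i &
done
wait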