Job Submission

To create a resource allocation and launch tasks, you can submit a batch script.

A batch script submitted to the scheduling system must specify the job specifications:

  1. resource queue (the default is compute)
  2. number of nodes required
  3. number of cores per node required
  4. maximum wall time for the job (note that jobs exceeding the wall time will be killed)

To submit a job, use the sbatch command. The job specifications listed above can also be passed as command-line options, as shown in the example below.

sbatch my_script
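
A sketch with illustrative node, core, and time values; options given on the command line override the corresponding #SBATCH directives inside the script:

sbatch --partition=compute --nodes=1 --ntasks-per-node=16 --time=01:00:00 my_script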

Please check the sbatch man page for more information.

man sbatch

Defining a batch script

Batch scripts contain:

  1. scheduler directives: lines beginning with #SBATCH
  2. shell commands: UNIX shell (bash) commands
  3. job steps: created with the srun command
#!/bin/bash -l
#SBATCH --job-name=my_script    # Job name
#SBATCH --ntasks=2              # Number of tasks
#SBATCH --time=01:30:00         # Run time (hh:mm:ss) - 1.5 hours

module load gnu                 # load any needed modules

echo "Start at `date`"
cd $HOME/workdir
./a.out
echo "End at `date`"

To submit this batch script

sbatch my_script
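
On submission, sbatch prints the assigned job ID; the job can then be monitored or cancelled with the standard SLURM commands (the job ID below is illustrative):

squeue -u $USER           # list your pending and running jobs
scancel 123456            # cancel a job by its ID, if needed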

Job Specifications

Option Argument Specification
--job-name, -J job_name Job name is job_name
--partition, -p queue_name Submits to queue queue_name
--account, -A project_name Project to charge compute hours
--ntasks, -n number_of_tasks Total number of tasks
--nodes, -N number_of_nodes Number of nodes
--ntasks-per-node ntasks_per_node Tasks per node
--cpus-per-task, -c cpus_per_task Threads per task
--time, -t HH:MM:SS Time limit (hh:mm:ss)
--mem memory_mb Memory per node (MB)
--mem-per-cpu memory_mb Memory per CPU core (MB)
--output, -o stdout_filename Direct job standard output to stdout_filename (%j expands to jobID)
--error, -e stderr_filename Direct job standard error to stderr_filename (%j expands to jobID)
--dependency, -d afterok:jobid Job dependency
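
For example, a job dependency can be set by capturing the first job's ID with the --parsable option (the script names here are illustrative):

# Submit the first job; --parsable makes sbatch print only the job ID
jid=$(sbatch --parsable first_step.sh)
# Start the second job only if the first one completes successfully
sbatch --dependency=afterok:$jid second_step.sh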

SLURM Environment Variables

SLURM provides environment variables for most of the values used in the #SBATCH directives.

Environment Variable Description
$SLURM_JOBID Job id
$SLURM_JOB_NAME Job name
$SLURM_SUBMIT_DIR Submit directory
$SLURM_SUBMIT_HOST Submit host
$SLURM_JOB_NODELIST Node list
$SLURM_JOB_NUM_NODES Number of nodes
$SLURM_CPUS_ON_NODE Number of cores/node
$SLURM_CPUS_PER_TASK Threads per task
$SLURM_NTASKS_PER_NODE Number of tasks per node
#!/bin/bash -l
#SBATCH --job-name=slurm_env
#SBATCH --nodes=2                # 2 nodes
#SBATCH --ntasks-per-node=12     # Number of tasks to be invoked on each node
#SBATCH --mem-per-cpu=1024       # Minimum memory required per CPU (in megabytes)
#SBATCH --time=00:01:00          # Run time in hh:mm:ss
#SBATCH --error=job.%J.out
#SBATCH --output=job.%J.out

echo "Start at `date`"
echo "Running on hosts: $SLURM_NODELIST"
echo "Running on $SLURM_NNODES nodes."
echo "Running $SLURM_NTASKS_PER_NODE tasks per node"
echo "Job id is $SLURM_JOBID"
echo "End at `date`"

Job Scripts

Here are some sample job submission scripts for different runtime models.

  • MPI job: Run multi-process programs with MPI.
  • Hybrid job: Parallel programs with MPI and OpenMP threads.
  • GPU job: Utilize GPU accelerators.

Pure MPI batch script

Launch MPI jobs with the srun command.

DON'T USE mpirun OR mpiexec

#!/bin/bash -l

#-----------------------------------------------------------------
# Pure MPI job, using 256 procs on 2 nodes,
# with 128 procs per node and 1 thread per MPI task
#-----------------------------------------------------------------

#SBATCH --job-name=mpijob # Job name
#SBATCH --output=mpijob.%j.out # Stdout (%j expands to jobId)
#SBATCH --error=mpijob.%j.err # Stderr (%j expands to jobId)
#SBATCH --ntasks=256 # Total number of tasks
#SBATCH --nodes=2 # Total number of nodes requested
#SBATCH --ntasks-per-node=128 # Tasks per node
#SBATCH --cpus-per-task=1 # Threads per task(=1) for pure MPI
#SBATCH --mem=128000 # Memory per node in MB
#SBATCH -t 01:30:00 # Run time (hh:mm:ss) - (max 48h)
#SBATCH --partition=compute # Submit queue
#SBATCH -A testproj # Accounting project


# Load any necessary modules

module purge    # Clean environment from loaded modules
module load gnu/13.3.0
module load openmpi/4.1.8/gnu

# Launch the executable

srun EXE ARGS
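
Here EXE ARGS stands for the actual program and its arguments. As a sketch, building a hypothetical MPI program mpi_app.c and submitting the script above (saved as mpi_job.sh; both names are illustrative) could look like:

module load gnu/13.3.0 openmpi/4.1.8/gnu
mpicc -O2 -o mpi_app mpi_app.c      # build the MPI executable
sbatch mpi_job.sh                   # the script would then call: srun ./mpi_app <args>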

Hybrid MPI/OpenMP batch script

Launch MPI jobs with the srun command.

DON'T USE mpirun OR mpiexec

#!/bin/bash -l

#-----------------------------------------------------------------
# Hybrid MPI/OpenMP job, using 256 cores on 2 nodes,
# with 64 MPI tasks per node and 2 threads per MPI task.
#-----------------------------------------------------------------

#SBATCH --job-name=hybridjob # Job name
#SBATCH --output=hybridjob.%j.out # Stdout (%j expands to jobId)
#SBATCH --error=hybridjob.%j.err # Stderr (%j expands to jobId)
#SBATCH --ntasks=128 # Total number of tasks
#SBATCH --nodes=2 # Total number of nodes requested
#SBATCH --ntasks-per-node=64 # Tasks per node
#SBATCH --cpus-per-task=2 # Threads per task
#SBATCH --mem=56000 # Memory per node in MB
#SBATCH -t 01:30:00 # Run time (hh:mm:ss) - (max 48h)
#SBATCH --partition=compute # Submit queue
#SBATCH -A testproj # Accounting project

if [ -z "$SLURM_CPUS_PER_TASK" ]; then
  export OMP_NUM_THREADS=1
else
  export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
fi

# Load any necessary modules

module purge    # Clean environment from loaded modules
module load gnu/13.3.0
module load openmpi/4.1.8/gnu

# Launch the executable
srun EXE ARGS
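
Optionally, OpenMP thread placement can be controlled with the standard OpenMP environment variables; a minimal sketch to add before the srun line (assuming the OpenMP runtime honours them):

export OMP_PLACES=cores        # bind each OpenMP thread to a physical core
export OMP_PROC_BIND=close     # keep threads close to their parent MPI task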

GPU batch script - 1x A100

Use up to 32 CPU cores per GPU

Use up to 124 GB of RAM per GPU

Launch GPU-accelerated jobs.

#!/bin/bash -l

#-----------------------------------------------------------------
# GPU job
# with 1 gpu, 16 procs and 2 threads per MPI task.
#-----------------------------------------------------------------

#SBATCH --job-name=gpujob # Job name
#SBATCH --output=gpujob.%j.out # Stdout (%j expands to jobId)
#SBATCH --error=gpujob.%j.err # Stderr (%j expands to jobId)
#SBATCH --ntasks=16 # Total number of tasks
#SBATCH --gres=gpu:a100:1 # GPUs per node
#SBATCH --nodes=1 # Total number of nodes requested
#SBATCH --ntasks-per-node=16 # Tasks per node
#SBATCH --cpus-per-task=2 # Threads per task
#SBATCH --mem=126976 # Memory per job in MB
#SBATCH -t 01:30:00 # Run time (hh:mm:ss) - (max 48h)
#SBATCH --partition=gpu # Run on the GPU nodes queue
#SBATCH -A testproj # Accounting project

# Load any necessary modules

module purge    # Clean environment from loaded modules
module load cuda/12.5.1

if [ -z "$SLURM_CPUS_PER_TASK" ]; then
  export OMP_NUM_THREADS=1
else
  export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
fi

# Launch the executable
srun EXE ARGS
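
To verify which GPU the job was allocated, a quick check can be added before launching the executable (nvidia-smi -L lists the devices visible inside the job step):

srun nvidia-smi -L        # e.g. prints one A100 device line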

GPU batch script - 2x A100

Launch GPU-accelerated jobs.

Use up to 32 CPU cores per GPU

Use up to 124 GB of RAM per GPU

#!/bin/bash -l

#-----------------------------------------------------------------
# GPU job
# with 2 gpus, 32 procs and 2 threads per MPI task.
#-----------------------------------------------------------------

#SBATCH --job-name=gpujob # Job name
#SBATCH --output=gpujob.%j.out # Stdout (%j expands to jobId)
#SBATCH --error=gpujob.%j.err # Stderr (%j expands to jobId)
#SBATCH --ntasks=32 # Total number of tasks
#SBATCH --gres=gpu:a100:2 # GPUs per node
#SBATCH --nodes=1 # Total number of nodes requested
#SBATCH --ntasks-per-node=32 # Tasks per node
#SBATCH --cpus-per-task=2 # Threads per task
#SBATCH --mem=253952 # Memory per job in MB
#SBATCH -t 01:30:00 # Run time (hh:mm:ss) - (max 48h)
#SBATCH --partition=gpu # Run on the GPU nodes queue
#SBATCH -A testproj # Accounting project

# Load any necessary modules

module purge    # Clean environment from loaded modules
module load cuda/12.5.1

if [ -z "$SLURM_CPUS_PER_TASK" ]; then
  export OMP_NUM_THREADS=1
else
  export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
fi

# Launch the executable
srun EXE ARGS

GPU MIG batch script - 1x 1g.10gb

Launch GPU-accelerated jobs.

Use up to 4 CPU cores and 15.5 GB of RAM for 1g.10gb

Never use 2x 1g.10gb instead of 1x 2g.20gb

Processes running on separate MIG GPUs are not able to communicate via NVLink

#!/bin/bash -l

#-----------------------------------------------------------------
# GPU job
# with 1 MIG GPU, 2 procs and 2 threads per MPI task.
#-----------------------------------------------------------------

#SBATCH --job-name=gpujob # Job name
#SBATCH --output=gpujob.%j.out # Stdout (%j expands to jobId)
#SBATCH --error=gpujob.%j.err # Stderr (%j expands to jobId)
#SBATCH --ntasks=2 # Total number of tasks
#SBATCH --gres=gpu:1g.10gb:1 # GPUs per node
#SBATCH --nodes=1 # Total number of nodes requested
#SBATCH --ntasks-per-node=2 # Tasks per node
#SBATCH --cpus-per-task=2 # Threads per task
#SBATCH --mem=15872 # Memory per job in MB
#SBATCH -t 01:30:00 # Run time (hh:mm:ss) - (max 48h)
#SBATCH --partition=mig # Run on the MIG nodes queue
#SBATCH -A testproj # Accounting project

# Load any necessary modules

module purge    # Clean environment from loaded modules
module load cuda/12.5.1

if [ -z "$SLURM_CPUS_PER_TASK" ]; then
  export OMP_NUM_THREADS=1
else
  export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
fi

# Launch the executable
srun EXE ARGS
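
Inside a MIG job, the allocated MIG instance appears as a separate device. Which device a job step sees can be checked with a sketch like the following (SLURM typically sets CUDA_VISIBLE_DEVICES for GPU allocations):

srun bash -c 'echo "CUDA_VISIBLE_DEVICES=$CUDA_VISIBLE_DEVICES"; nvidia-smi -L'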

GPU MIG batch script - 1x 2g.20gb

Launch GPU-accelerated jobs.

Use up to 8 CPU cores and 31 GB of RAM for 2g.20gb

Never use 2x 2g.20gb instead of 1x 3g.40gb

Processes running on separate MIG GPUs are not able to communicate via NVLink

#!/bin/bash -l

#-----------------------------------------------------------------
# GPU job
# with 1 MIG GPU, 4 procs and 2 threads per MPI task.
#-----------------------------------------------------------------

#SBATCH --job-name=gpujob # Job name
#SBATCH --output=gpujob.%j.out # Stdout (%j expands to jobId)
#SBATCH --error=gpujob.%j.err # Stderr (%j expands to jobId)
#SBATCH --ntasks=4 # Total number of tasks
#SBATCH --gres=gpu:2g.20gb:1 # GPUs per node
#SBATCH --nodes=1 # Total number of nodes requested
#SBATCH --ntasks-per-node=4 # Tasks per node
#SBATCH --cpus-per-task=2 # Threads per task
#SBATCH --mem=31744 # Memory per job in MB
#SBATCH -t 01:30:00 # Run time (hh:mm:ss) - (max 48h)
#SBATCH --partition=mig # Run on the MIG nodes queue
#SBATCH -A testproj # Accounting project

# Load any necessary modules

module purge    # Clean environment from loaded modules
module load cuda/12.5.1

if [ -z "$SLURM_CPUS_PER_TASK" ]; then
  export OMP_NUM_THREADS=1
else
  export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
fi

# Launch the executable
srun EXE ARGS

GPU MIG batch script - 1x 3g.40gb

Launch GPU-accelerated jobs.

Use up to 16 CPU cores and 62 GB of RAM for 3g.40gb

Never use 2x 3g.40gb instead of 1x a100

Processes running on separate MIG GPUs are not able to communicate via NVLink

#!/bin/bash -l

#-----------------------------------------------------------------
# GPU job
# with 1 MIG GPU, 16 procs and 1 thread per MPI task.
#-----------------------------------------------------------------

#SBATCH --job-name=gpujob # Job name
#SBATCH --output=gpujob.%j.out # Stdout (%j expands to jobId)
#SBATCH --error=gpujob.%j.err # Stderr (%j expands to jobId)
#SBATCH --ntasks=16 # Total number of tasks
#SBATCH --gres=gpu:3g.40gb:1 # GPUs per node
#SBATCH --nodes=1 # Total number of nodes requested
#SBATCH --ntasks-per-node=16 # Tasks per node
#SBATCH --cpus-per-task=1 # Threads per task
#SBATCH --mem=63488 # Memory per job in MB
#SBATCH -t 01:30:00 # Run time (hh:mm:ss) - (max 48h)
#SBATCH --partition=mig # Run on the MIG nodes queue
#SBATCH -A testproj # Accounting project

# Load any necessary modules

module purge    # Clean environment from loaded modules
module load cuda/12.5.1

if [ -z "$SLURM_CPUS_PER_TASK" ]; then
  export OMP_NUM_THREADS=1
else
  export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
fi

# Launch the executable
srun EXE ARGS

Multiple Serial batch script

Multiple sruns executed simultaneously from a single batch script.

Please note the wait at the end of the script; it ensures that the batch job does not exit before all tasks have completed.

#!/bin/bash -l

#-----------------------------------------------------------------
# Multiple Serial job, 4 tasks, requesting 1 node and 3968 MB of memory per task
#-----------------------------------------------------------------

#SBATCH --job-name=multiple-serialjob # Job name
#SBATCH --output=multiple-serialjob.%j.out # Stdout (%j expands to jobId)
#SBATCH --error=multiple-serialjob.%j.err # Stderr (%j expands to jobId)
#SBATCH --nodes=1 # Total number of nodes requested
#SBATCH --ntasks=4 # Total number of tasks
#SBATCH --ntasks-per-node=4 # Tasks per node
#SBATCH --cpus-per-task=1 # Threads per task
#SBATCH --mem-per-cpu=3968 # Memory per task in MB
#SBATCH -t 01:30:00 # Run time (hh:mm:ss) - (max 48h)
#SBATCH --partition=compute # Submit queue
#SBATCH -A testproj # Accounting project

# Load any necessary modules
module load gnu/13

export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK

# Launch the executable a.out

srun -n 1 -c 1 ./a.out input0 &
srun -n 1 -c 1 ./a.out input1 &
srun -n 1 -c 1 ./a.out input2 &
srun -n 1 -c 1 ./a.out input3 &
wait
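
Equivalently, the four job steps could be launched in a loop; a minimal sketch assuming the input files are named input0 through input3:

for i in 0 1 2 3; do
  srun -n 1 -c 1 ./a.out input$i &   # each serial run is its own job step
done
wait                                 # block until all background steps finish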