# SLURM commands

There is a lot of information available on the [SLURM website](https://slurm.schedmd.com/) or by using the `man` command.

The most common commands are:

* [srun](https://slurm.schedmd.com/srun.html)
* [sbatch](https://slurm.schedmd.com/sbatch.html)
* [salloc](https://slurm.schedmd.com/salloc.html)
* [squeue](https://slurm.schedmd.com/squeue.html)
* [sstat](https://slurm.schedmd.com/sstat.html)
* [sacct](https://slurm.schedmd.com/sacct.html)
* [scontrol](https://slurm.schedmd.com/scontrol.html)
* [sview](https://slurm.schedmd.com/sview.html)
* [sdiag](https://slurm.schedmd.com/sdiag.html)
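Each of these commands also has a manual page on the cluster itself; for example:

```
man sbatch
```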
# Batch Queuing Commands

## Interactive Session

To start an interactive job, issue the command:
```
srun <resources> --pty /bin/bash
```

### Resources

The resources are optional; if you need more than the default, you should set them.

The default resources you get are: 1 core, 2GB memory and 1 hour run time.

* `--time=`
  * Acceptable time formats include "minutes", "minutes:seconds", "hours:minutes:seconds", "days-hours", "days-hours:minutes" and "days-hours:minutes:seconds".
* `--mem=`
  * Specify the real memory required per node (OGS, by contrast, requests memory per core). Different units can be specified using the suffix [K|M|G|T].
* `--ntasks-per-node=`
  * Request a number of cores per node.
To submit a single node job with 2 cores, 4GB memory and 120 minutes run time:

```
srun --ntasks=1 --cpus-per-task=2 --mem=4gb -t 120 --pty /bin/bash
```

or

```
srun --ntasks-per-node=2 --mem=4gb -t 120 --pty /bin/bash
```
## Submit a Job

To submit a job to the cluster, issue the command:

```
sbatch <resources> <script name>
```

The resource options are the same as for an interactive session; for more resource options please read the manual.

The following sections give an overview of the most important OGS/SLURM commands; for details, read the manual pages with `man sge-intro`, `man sbatch`, `man srun` and `man scontrol`.

A queue in Open Grid Scheduler (OGS) is a partition in SLURM.
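For example, to submit a script (here a hypothetical `myjob.sh`) with 2 cores, 4GB memory and a 2 hour run time:

```
sbatch --ntasks-per-node=2 --mem=4G --time=02:00:00 myjob.sh
```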
## Check the status of the queue

The command to issue is:

```
squeue
```

For a more informative output you can specify an output format like:

```
squeue -o "%.10A %.18u %.3t %.5C %.20S %.5D %.10a %.10M %.9P"
```

For a full list of all the options please read the [squeue manual](https://slurm.schedmd.com/squeue.html).
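To list only your own jobs, filter on your user name:

```
squeue -u $USER
```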
## Common job commands

|Explanation|Command OGS|Command SLURM|
|-------|-----------|---------|
|Cluster status| - |sinfo|
|Interactive login|qlogin|srun --pty bash|
|Submit a job to the queue|qsub|sbatch|
|Cancel a queued or running job|qdel [job-ID]|scancel [job-ID]|
|Place a queued job on hold|qhold [job-ID]|scontrol hold [job-ID]|
|Resume a held job|qrls [job-ID]|scontrol release [job-ID]|
|Check the status of queued and running jobs|qstat|squeue|
|Detailed job status|qstat -j [job-ID]|scontrol show job [job-ID]|
|Queue information|qstat -g c|scontrol show partition|
|Queue details|qconf -sq [queue]|scontrol show partition [queue]|
|List nodes|qhost|scontrol show nodes|
|Node details|qhost -F [node]|scontrol show node [node]|
|Monitor / review job resources|qacct -j [job-ID]|sacct -j [job-ID]|
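For example, to place a queued job on hold and release it again (12345 is a placeholder job ID):

```
scontrol hold 12345
scontrol release 12345
```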
# Accounting for running jobs

The default command to use is:

```
sstat
```

Note: `sstat -j <jobid>` can only be used on jobs submitted with srun.

For sbatch-submitted jobs use `<jobid>.batch`.

To view your memory consumption for an sbatch job use:

```
sstat -j <JobID>.batch -o maxrss
```

To list all the available fields you can display with the -o option, use the command:

```
sstat -e
```

To show the utilization of a running or completed job from the accounting information with full details:

```
sacct -j <JobID> -l
```
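Instead of `-l` you can also select only the fields you need; for example, to show the run time and peak memory of a job:

```
sacct -j <JobID> --format=JobID,JobName,Elapsed,MaxRSS,State
```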
## Job Submission options

|Option|OGS (qsub)|SLURM (sbatch)|
|-------|-----------|---------|
|Script directive|#$|#SBATCH|
|Job name|-N [name]|--job-name=[name]|
|Standard output file|-o [file-path]|--output=[file-path]|
|Standard error file|-e [file-path]|--error=[file-path]|
|Combine stdout and stderr|-j yes|--output=[file-path]|
|Working directory|-cwd [directory-path]|--workdir=[directory-path]|
|Notification email|-m [event]|--mail-type=[event]|
|Email address|-M [email-address]|--mail-user=[email-address]|
|Copy environment|-V|--export=ALL|
|Node count|-|--nodes=[count]|
|Request specific nodes|-l hostname=[node]|--nodelist=[node-list]|
|Processor count per node|-pe [count]|--ntasks-per-node=[count]|
|Processor count per task|-|--cpus-per-task=[count]|
|Memory limit|-l h_vmem=[limit]|--mem=[limit]|
|Minimum memory per processor|-|--mem-per-cpu=[memory]|
|Wall time limit|-l h_rt=[seconds]|--time=[hh:mm:ss]|
|Queue|-q [queue]|--partition=[queue]|
|Request specific resource|-l resource=[value]|--gres=gpu:[count]|
|Assign job to a project|-P [project-name]|--account=[project-name]|
## Job submission scripts example

OGS shell script example:

```sh
#!/bin/bash

# Name the job
#$ -N OGS_example
# Combine stdout and stderr
#$ -j y
# Set the output file name
#$ -o example.output
# Run in the current working directory
#$ -cwd
# Mail at beginning and end of job
#$ -m be
# Set your email address
#$ -M your@email.address
# Request 8 hours run time
#$ -l h_rt=8:0:0
# Specify 4G memory
#$ -l mem=4G
# Request 2 cores in the parallel environment BWA
#$ -pe BWA 2

echo "start job OGS `date`"
sleep 120
echo "Finished `date`"
```
SLURM shell script example:

```sh
#!/bin/bash
#
# Name the job
#SBATCH -J slurm_example
# Set the output file name
#SBATCH -o example.output
# Change to the working directory (the default for SLURM)
#SBATCH -D ./
# Mail all events
#SBATCH --mail-type=ALL
# Set your email address
#SBATCH --mail-user=your@email.address
# Request 8 hours run time
#SBATCH -t 8:0:0
# Specify 4000 MB memory
#SBATCH --mem=4000
# Specify 2 cores
#SBATCH --ntasks-per-node=2

echo "start job SLURM `date`"
sleep 120
echo "Finished `date`"
```
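Assuming the SLURM script above is saved as `slurm_example.sh` (the file name is just an example), submit it with:

```
sbatch slurm_example.sh
```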