Shark CentOS Slurm User guide · Wiki

Home · Changes

Page history
Update home removed reference to res-hpc-lo02 · authored Oct 19, 2021 by Villerius
home.md @ e6e105fe
@@ -32,8 +32,8 @@ General email: ITenDI_Infra-Linux@lumc.nl
| Hostname | IP address | CPU | Cores | Memory | GPUs | Purpose | Machine type |
| --- | --- | --- | --- | --- | --- | --- | --- |
| res-hpc-lo01 | 145.88.76.243 | Intel E5-2660 | 32 | 128 GB | 0 | Login node | Dell PowerEdge M620 |
| res-hpc-lo02 | 145.88.76.217 | Intel Xe 6248 | 80 | 128 GB | 1 | Login node + Rem Vis* | Dell PowerEdge R740 |
| res-hpc-lo04 | 145.88.76.219 | Intel 8280 | 8 | 32 GB | 0 | Login node | VM |
| res-hpc-exe007 | 145.88.76.220 | Intel E5-2697 | 24 | 384 GB | 0 | Execution node | Dell PowerEdge M620 |
| res-hpc-exe008 | 145.88.76.224 | Intel E5-2697 | 24 | 384 GB | 0 | Execution node | Dell PowerEdge M620 |
| res-hpc-exe009 | 145.88.76.222 | Intel E5-2697 | 24 | 384 GB | 0 | Execution node | Dell PowerEdge M620 |
@@ -92,7 +92,7 @@ General email: ITenDI_Infra-Linux@lumc.nl
## Rules
- Always login/connect to the login node **res-hpc-lo01.researchlumc.nl** or **res-hpc-lo02.researchlumc.nl**
- Always login/connect to the login node **res-hpc-lo02.researchlumc.nl** or **res-hpc-lo04.researchlumc.nl**
- Always use the workload manager (Slurm) to run/submit jobs or to work interactively
- **Never** run a job outside the workload manager (Slurm), neither on the login node nor on the execution nodes
- **Never** run (heavy) calculations on the login node; do this on a compute node
@@ -135,47 +135,48 @@ Other distributions:
#### From the command line (ssh)
If your login user name from your Linux workstation is the same as the username on the HPC cluster, you can use:
- ssh res-hpc-lo01.researchlumc.nl
- ssh res-hpc-lo02.researchlumc.nl
or
- ssh res-hpc-lo02.researchlumc.nl
- ssh res-hpc-lo04.researchlumc.nl
Otherwise:
- ssh username@res-hpc-lo01.researchlumc.nl
- ssh username@res-hpc-lo02.researchlumc.nl
or
- ssh username@res-hpc-lo02.researchlumc.nl
- ssh username@res-hpc-lo04.researchlumc.nl
You can make your life easier by editing the file:
```
vi ~/.ssh/config
Host res-hpc-lo01
Hostname 145.88.76.243
User user-name
ServerAliveInterval 60
Host res-hpc-lo02
Hostname 145.88.76.217
User user-name
ServerAliveInterval 60
Host res-hpc-lo04
Hostname 145.88.76.219
User user-name
ServerAliveInterval 60
```
Adapt *user-name* to your own username.
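With such an entry in place, the Host alias is enough to connect; a minimal sketch, assuming the aliases defined above:
```
# connect using the alias from ~/.ssh/config
ssh res-hpc-lo02
```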
#### X11 forwarding
You can display graphical output when you enable X11 forwarding:
- ssh -X res-hpc-lo01.researchlumc.nl
- ssh -X res-hpc-lo02.researchlumc.nl
or
- ssh -X res-hpc-lo02.researchlumc.nl
- ssh -X res-hpc-lo04.researchlumc.nl
or
- ssh -Y res-hpc-lo01.researchlumc.nl
- ssh -Y res-hpc-lo02.researchlumc.nl
or
- ssh -Y res-hpc-lo02.researchlumc.nl
- ssh -Y res-hpc-lo04.researchlumc.nl
Once you are logged in, you should be able to run a graphical program, for example:
- xterm
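A minimal end-to-end check might look like this (a sketch, assuming your local machine runs an X server):
```
# log in with X11 forwarding enabled, then start a simple X client
ssh -X username@res-hpc-lo02.researchlumc.nl
xterm &    # a terminal window should appear on your local display
```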
@@ -203,11 +204,11 @@ Go to Session and create a new session:
![alt text](images/x2goclient-02.gif "x2goclient-02")
For the **Host**:
- res-hpc-lo01.researchlumc.nl
- res-hpc-lo02.researchlumc.nl
or
- res-hpc-lo02.researchlumc.nl
- res-hpc-lo04.researchlumc.nl
For the **Session type**:
- XFCE
@@ -277,7 +278,7 @@ Once you have started the putty program, you will see:
![alt text](images/putty-01.gif "putty-01")
Fill in the Host Name (or IP address): **res-hpc-lo01.researchlumc.nl** or **res-hpc-lo02.researchlumc.nl**
Fill in the Host Name (or IP address): **res-hpc-lo02.researchlumc.nl** or **res-hpc-lo04.researchlumc.nl**
![alt text](images/putty-02.gif "putty-02")
@@ -317,7 +318,7 @@ If you press on the putty symbol at the left corner of the terminal window, you
You can give your session a name so that you can save it and reuse it later.
Give your session a name, for example "res-hpc-lo01". **Save**
Give your session a name, for example "res-hpc-lo02". **Save**
![alt text](images/putty-09.gif "putty-09")
@@ -540,7 +541,7 @@ Handy reference:
- ./hello
```
Hello world from process 000 out of 001, processor name res-hpc-lo01.researchlumc.nl
Hello world from process 000 out of 001, processor name res-hpc-lo02.researchlumc.nl
```
Here you can see that we ran the program on only 1 core of the CPU (which is the same as running: mpirun -np 1 ./hello; np = number of processes to launch).
@@ -550,22 +551,22 @@ To make use of the MPI capabilities of the program, we have to run the program w
- mpirun ./hello
```
Hello world from process 003 out of 016, processor name res-hpc-lo01.researchlumc.nl
Hello world from process 006 out of 016, processor name res-hpc-lo01.researchlumc.nl
Hello world from process 013 out of 016, processor name res-hpc-lo01.researchlumc.nl
Hello world from process 015 out of 016, processor name res-hpc-lo01.researchlumc.nl
Hello world from process 000 out of 016, processor name res-hpc-lo01.researchlumc.nl
Hello world from process 005 out of 016, processor name res-hpc-lo01.researchlumc.nl
Hello world from process 010 out of 016, processor name res-hpc-lo01.researchlumc.nl
Hello world from process 011 out of 016, processor name res-hpc-lo01.researchlumc.nl
Hello world from process 012 out of 016, processor name res-hpc-lo01.researchlumc.nl
Hello world from process 002 out of 016, processor name res-hpc-lo01.researchlumc.nl
Hello world from process 004 out of 016, processor name res-hpc-lo01.researchlumc.nl
Hello world from process 007 out of 016, processor name res-hpc-lo01.researchlumc.nl
Hello world from process 001 out of 016, processor name res-hpc-lo01.researchlumc.nl
Hello world from process 008 out of 016, processor name res-hpc-lo01.researchlumc.nl
Hello world from process 009 out of 016, processor name res-hpc-lo01.researchlumc.nl
Hello world from process 014 out of 016, processor name res-hpc-lo01.researchlumc.nl
Hello world from process 003 out of 016, processor name res-hpc-lo02.researchlumc.nl
Hello world from process 006 out of 016, processor name res-hpc-lo02.researchlumc.nl
Hello world from process 013 out of 016, processor name res-hpc-lo02.researchlumc.nl
Hello world from process 015 out of 016, processor name res-hpc-lo02.researchlumc.nl
Hello world from process 000 out of 016, processor name res-hpc-lo02.researchlumc.nl
Hello world from process 005 out of 016, processor name res-hpc-lo02.researchlumc.nl
Hello world from process 010 out of 016, processor name res-hpc-lo02.researchlumc.nl
Hello world from process 011 out of 016, processor name res-hpc-lo02.researchlumc.nl
Hello world from process 012 out of 016, processor name res-hpc-lo02.researchlumc.nl
Hello world from process 002 out of 016, processor name res-hpc-lo02.researchlumc.nl
Hello world from process 004 out of 016, processor name res-hpc-lo02.researchlumc.nl
Hello world from process 007 out of 016, processor name res-hpc-lo02.researchlumc.nl
Hello world from process 001 out of 016, processor name res-hpc-lo02.researchlumc.nl
Hello world from process 008 out of 016, processor name res-hpc-lo02.researchlumc.nl
Hello world from process 009 out of 016, processor name res-hpc-lo02.researchlumc.nl
Hello world from process 014 out of 016, processor name res-hpc-lo02.researchlumc.nl
```
Now the program uses all the cores of the local machine (which is the same as running: mpirun -np 16 ./hello).
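If you do not want to use every core, you can set the process count explicitly; a short sketch (the value 4 is only an illustrative count):
```
# run the same program with exactly 4 MPI processes
mpirun -np 4 ./hello
```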
@@ -622,7 +623,7 @@ The following commands are useful:
- sinfo -N -l
```
[user@res-hpc-lo01 ~]$ sinfo
[user@res-hpc-lo02 ~]$ sinfo
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
all* up infinite 2 mix res-hpc-exe[014,019]
all* up infinite 1 alloc res-hpc-exe018
@@ -630,7 +631,7 @@ all* up infinite 4 idle res-hpc-exe[013,015-017]
gpu up infinite 2 mix res-hpc-gpu[01-02]
highmem up infinite 1 mix res-hpc-mem01
[user@res-hpc-lo01 ~]$ sinfo -a
[user@res-hpc-lo02 ~]$ sinfo -a
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
all* up infinite 2 mix res-hpc-exe[014,019]
all* up infinite 1 alloc res-hpc-exe018
@@ -642,7 +643,7 @@ LKEBgpu up infinite 1 down* res-hpc-lkeb03
LKEBgpu up infinite 2 mix res-hpc-lkeb[04-05]
LKEBgpu up infinite 1 idle res-hpc-lkeb01
[user@res-hpc-lo01 ~]$ sinfo -l
[user@res-hpc-lo02 ~]$ sinfo -l
Mon Mar 23 09:21:27 2020
PARTITION AVAIL TIMELIMIT JOB_SIZE ROOT OVERSUBS GROUPS NODES STATE NODELIST
all* up infinite 1-infinite no NO all 2 mixed res-hpc-exe[014,019]
@@ -653,7 +654,7 @@ highmem up infinite 1-infinite no NO all 1 mixe
```
```
[user@res-hpc-lo01 ~]$ sinfo -l -N -a
[user@res-hpc-lo02 ~]$ sinfo -l -N -a
Mon Mar 23 09:34:14 2020
NODELIST NODES PARTITION STATE CPUS S:C:T MEMORY TMP_DISK WEIGHT AVAIL_FE REASON
res-hpc-exe013 1 all* mixed 16 2:8:1 128800 0 1 (null) none
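# Added sketch (not part of the original transcript): sinfo can also be limited
# to a single partition, for example:
#   sinfo -p gpu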
@@ -684,15 +685,15 @@ With the following command, you can get information about your running jobs and
- squeue -l
```
[user@res-hpc-lo01 mpi-benchmarks]$ squeue
[user@res-hpc-lo02 mpi-benchmarks]$ squeue
JOBID PARTITION USER ST TIME NODES NODELIST(REASON)
258 all user R 0:03 2 res-hpc-exe[013-014]
[user@res-hpc-lo01 mpi-benchmarks]$ squeue -a
[user@res-hpc-lo02 mpi-benchmarks]$ squeue -a
JOBID PARTITION USER ST TIME NODES NODELIST(REASON)
258 all user R 0:06 2 res-hpc-exe[013-014]
[user@res-hpc-lo01 mpi-benchmarks]$ squeue -l
[user@res-hpc-lo02 mpi-benchmarks]$ squeue -l
Thu Jan 23 09:14:22 2020
JOBID PARTITION USER STATE TIME TIME_LIMIT NODES NODELIST(REASON)
258 all user RUNNING 0:12 30:00 2 res-hpc-exe[013-014]
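# Added sketch (not part of the original transcript): to list only your own jobs,
# filter on your username, for example:
#   squeue -u $USER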
@@ -716,7 +717,7 @@ An explanation of some state follows:
With the Slurm command **scontrol** you can get a more detailed overview of your running jobs, node hardware and partitions:
```
[user@res-hpc-lo01 ~]$ scontrol show job 260
[user@res-hpc-lo02 ~]$ scontrol show job 260
JobId=260 JobName=IMB
UserId=user(225812) GroupId=Domain Users(513) MCS_label=N/A
Priority=35603 Nice=0 Account=dnst-ict QOS=normal
@@ -745,7 +746,7 @@ JobId=260 JobName=IMB
Power=
MailUser=user@gmail.com MailType=BEGIN,END,FAIL
[user@res-hpc-lo01 ~]$ scontrol show node res-hpc-exe014
[user@res-hpc-lo02 ~]$ scontrol show node res-hpc-exe014
NodeName=res-hpc-exe014 Arch=x86_64 CoresPerSocket=12
CPUAlloc=16 CPUTot=24 CPULoad=0.00
AvailableFeatures=(null)
@@ -763,7 +764,7 @@ NodeName=res-hpc-exe014 Arch=x86_64 CoresPerSocket=12
CurrentWatts=0 AveWatts=0
ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
[user@res-hpc-lo01 ~]$ scontrol show partition all
[user@res-hpc-lo02 ~]$ scontrol show partition all
PartitionName=all
AllowGroups=ALL AllowAccounts=ALL AllowQos=ALL
AllocNodes=ALL Default=YES QoS=N/A
@@ -781,7 +782,7 @@ PartitionName=all
You can open an interactive session with the **salloc** command:
```
[user@res-hpc-lo01 ~]$ salloc -N1
[user@res-hpc-lo02 ~]$ salloc -N1
salloc: Granted job allocation 267
salloc: Waiting for resource configuration
salloc: Nodes res-hpc-exe013 are ready for job
@@ -793,13 +794,13 @@ salloc: Nodes res-hpc-exe013 are ready for job
exit
salloc: Relinquishing job allocation 267
[user@res-hpc-lo01 ~]$
[user@res-hpc-lo02 ~]$
```
In the example above we did not run a command, so we ended up in a bash shell on the allocated node.
With **exit** we leave that shell and release the node.
```
[user@res-hpc-lo01 ~]$ salloc -N1 mpirun ./hello1
[user@res-hpc-lo02 ~]$ salloc -N1 mpirun ./hello1
salloc: Granted job allocation 268
salloc: Waiting for resource configuration
salloc: Nodes res-hpc-exe013 are ready for job
@@ -811,7 +812,7 @@ Here we allocated 1 node with one core and ran the openmpi compiled "hello1" pro
Now the same with 2 nodes, 16 cores on each machine:
```
[user@res-hpc-lo01 ~]$ salloc -N2 --ntasks-per-node=16 mpirun ./hello1
[user@res-hpc-lo02 ~]$ salloc -N2 --ntasks-per-node=16 mpirun ./hello1
salloc: Granted job allocation 270
salloc: Waiting for resource configuration
salloc: Nodes res-hpc-exe[013-014] are ready for job
@@ -855,13 +856,13 @@ With the **srun** command you can also open an interactive session or you can ru
Interactive:
```
[user@res-hpc-lo01 ~]$ srun --pty bash
[user@res-hpc-lo02 ~]$ srun --pty bash
[user@res-hpc-exe013 ~]$ exit
exit
```
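You can also request specific resources for the interactive shell; a hedged sketch (the numbers are only examples):
```
# interactive shell with 1 node, 4 CPU cores and 8 GB of memory
srun -N1 -c4 --mem=8G --pty bash
```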
Running a program:
```
[user@res-hpc-lo01 ~]$ cat hello.sh
[user@res-hpc-lo02 ~]$ cat hello.sh
#!/bin/bash
#
@@ -871,9 +872,9 @@ echo ""
echo "SLURM_JOB_NAME: $SLURM_JOB_NAME"
echo "SLURM_JOBID: " $SLURM_JOBID
[user@res-hpc-lo01 ~]$ chmod +x hello.sh
[user@res-hpc-lo02 ~]$ chmod +x hello.sh
[user@res-hpc-lo01 ~]$ srun -N1 hello.sh
[user@res-hpc-lo02 ~]$ srun -N1 hello.sh
Hello from res-hpc-exe013.researchlumc.nl
It is currently Thu Jan 23 12:35:18 CET 2020
@@ -1787,7 +1788,7 @@ You can start to run R interactively, just as an exercise and test.
The recommended way is to run R in batch mode.
```
[username@res-hpc-lo01 ~]$ salloc -N1 -n1
[username@res-hpc-lo02 ~]$ salloc -N1 -n1
salloc: Pending job allocation 386499
salloc: job 386499 queued and waiting for resources
salloc: job 386499 has been allocated resources
@@ -1842,7 +1843,7 @@ Rscript --vanilla HelloWorld.R
Submitted batch job 386860
[username@res-hpc-lo01 R]$ cat slurm.out
[username@res-hpc-lo02 R]$ cat slurm.out
[1] "Hello world!"
@@ -1874,17 +1875,17 @@ Rscript driver.R
```
```
[username@res-hpc-lo01 R]$ sbatch run.slurm
[username@res-hpc-lo02 R]$ sbatch run.slurm
Submitted batch job 386568
[username@res-hpc-lo01 R]$ ls -l
[username@res-hpc-lo02 R]$ ls -l
total 78
-rw-r--r-- 1 username Domain Users 59 Jun 5 11:42 driver.R
-rw-r--r-- 1 username Domain Users 483 Jun 5 11:42 run.slurm
-rw-r--r-- 1 username Domain Users 0 Jun 5 11:43 slurm.err
-rw-r--r-- 1 username Domain Users 671 Jun 5 11:43 slurm.out
[username@res-hpc-lo01 R]$ cat slurm.out
[username@res-hpc-lo02 R]$ cat slurm.out
My sample from N(0,1) is:
[1] 0.32241013 -0.78250675 -0.28872991 0.12559634 -0.29176358 0.57962942
[7] -0.38277807 -0.21266343 0.86537064 1.06636737 0.96487417 0.31699518
@@ -1898,7 +1899,7 @@ My sample from N(0,1) is:
```
```
[username@res-hpc-lo01 R]$ scontrol show job 386568
[username@res-hpc-lo02 R]$ scontrol show job 386568
JobId=386568 JobName=serialR
UserId=username(225812) GroupId=Domain Users(513) MCS_label=N/A
Priority=449759 Nice=0 Account=dnst-ict QOS=normal
@@ -1988,7 +1989,7 @@ mpirun Rscript hello.R
```
```
[username@res-hpc-lo01 R]$ cat slurm-rmpi.out
[username@res-hpc-lo02 R]$ cat slurm-rmpi.out
Hello world from process 000 of 008, on host res-hpc-gpu01
Hello world from process 001 of 008, on host res-hpc-gpu01
Hello world from process 002 of 008, on host res-hpc-gpu01
@@ -2031,7 +2032,7 @@ rstudio
#### RStudio on a compute node
You can also start RStudio on a compute node:
```
[username@res-hpc-lo01 ~]$ srun --x11 --pty bash
[username@res-hpc-lo02 ~]$ srun --x11 --pty bash
[username@res-hpc-exe014 ~]$ module purge
[username@res-hpc-exe014 ~]$ module add statistical/RStudio/1.3.959/gcc-8.3.1
[username@res-hpc-exe014 ~]$ rstudio
@@ -2166,11 +2167,11 @@ Also creating executable in /exports/example/projects/Project-A/bin/python
Installing setuptools, pip, wheel...done.
[username@res-hpc-lo01 ~]$ source /exports/example/projects/Project-A/bin/activate
(Project-A) [username@res-hpc-lo01 ~]$
[username@res-hpc-lo02 ~]$ source /exports/example/projects/Project-A/bin/activate
(Project-A) [username@res-hpc-lo02 ~]$
(Project-A) [username@res-hpc-lo01 python3.6]$ pip3 list
(Project-A) [username@res-hpc-lo02 python3.6]$ pip3 list
Package Version
---------- -------
pip 20.1.1
@@ -2178,8 +2179,8 @@ setuptools 49.1.0
wheel 0.34.2
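# Added sketch (not part of the original transcript): install packages into the
# active environment with pip, for example:
#   pip3 install numpy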
(Project-A) [username@res-hpc-lo01 ~]$ deactivate
[username@res-hpc-lo01 ~]$
(Project-A) [username@res-hpc-lo02 ~]$ deactivate
[username@res-hpc-lo02 ~]$
```
To remove your Python virtual environment, delete the virtual environment directory.
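For example (a sketch reusing the path from the example above; double-check the path before deleting anything):
```
# delete the whole virtual environment directory
rm -rf /exports/example/projects/Project-A
```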
@@ -2313,8 +2314,8 @@ modified /home/username/.bashrc
```
```
[username@res-hpc-lo01 ~]$ conda activate Project-B
(Project-B) [username@res-hpc-lo01 ~]$
[username@res-hpc-lo02 ~]$ conda activate Project-B
(Project-B) [username@res-hpc-lo02 ~]$
```
```
@@ -2463,8 +2464,8 @@ Executing transaction: done
```
```
(Project-B) [username@res-hpc-lo01 ~]$ conda deactivate
[username@res-hpc-lo01 ~]$
(Project-B) [username@res-hpc-lo02 ~]$ conda deactivate
[username@res-hpc-lo02 ~]$
```
```
......