| Hostname | IP address | CPU | Cores | Memory | GPUs | Purpose | Machine type |
| --- | --- | --- | --- | --- | --- | --- | --- |
| res-hpc-lo02 | 145.88.76.217 | Intel Xeon 6248 | 80 | 128 GB | 1 | Login node + Rem Vis* | Dell PowerEdge R740 |
| res-hpc-lo04 | 145.88.76.219 | Intel Xeon 8280 | 8 | 32 GB | 0 | Login node | VM |
| res-hpc-exe007 | 145.88.76.220 | Intel E5-2697 | 24 | 384 GB | 0 | Execution node | Dell PowerEdge M620 |
| res-hpc-exe008 | 145.88.76.224 | Intel E5-2697 | 24 | 384 GB | 0 | Execution node | Dell PowerEdge M620 |
| res-hpc-exe009 | 145.88.76.222 | Intel E5-2697 | 24 | 384 GB | 0 | Execution node | Dell PowerEdge M620 |
## Rules

- Always log in / connect to the login node **res-hpc-lo02.researchlumc.nl** or **res-hpc-lo04.researchlumc.nl**
- Always use the workload manager (Slurm) to run/submit jobs or to work interactively (a minimal submission sketch follows below)
- **Never** run a job outside the workload manager (Slurm), neither on the login node nor on the execution nodes
- **Never** run (heavy) calculations on the login node; do this on a compute node instead
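
What "running a job through Slurm" looks like in its simplest form is sketched here; the partition name `all`, the resource values and the file name are assumptions based on the examples later in this manual:

```
#!/bin/bash
#SBATCH --job-name=example        # name shown by squeue
#SBATCH --partition=all           # assumed default partition, see the sinfo output further on
#SBATCH --ntasks=1                # number of tasks
#SBATCH --time=00:30:00           # wall-clock time limit
#SBATCH --output=slurm.out        # file that receives stdout/stderr

# The actual work goes here and runs on a compute node, not on the login node
echo "Running on $(hostname)"
```

You would submit such a script from a login node with `sbatch`, and inspect it with the `sinfo`, `squeue` and `scontrol` commands described below.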
#### From the command line (ssh)

If the login user name on your Linux workstation is the same as your username on the HPC cluster, you can use:

- ssh res-hpc-lo02.researchlumc.nl

or

- ssh res-hpc-lo04.researchlumc.nl

Otherwise:

- ssh username@res-hpc-lo02.researchlumc.nl

or

- ssh username@res-hpc-lo04.researchlumc.nl

You can make your life easier by editing the file:

```
vi ~/.ssh/config

Host res-hpc-lo02
    Hostname 145.88.76.217
    User user-name
    ServerAliveInterval 60

Host res-hpc-lo04
    Hostname 145.88.76.219
    User user-name
    ServerAliveInterval 60
```

Where you adapt the *user-name*.
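
With these entries in place you can connect using just the short host name; a quick sketch (the file name in the copy example is only a placeholder):

```
ssh res-hpc-lo02
scp myscript.sh res-hpc-lo04:
```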
#### X11 forwarding

You can show graphical output when you enable X11 forwarding:

- ssh -X res-hpc-lo02.researchlumc.nl

or

- ssh -X res-hpc-lo04.researchlumc.nl

or

- ssh -Y res-hpc-lo02.researchlumc.nl

or

- ssh -Y res-hpc-lo04.researchlumc.nl

Once you are logged in, you should be able to run a graphical program, for example:

- xterm

![alt text](images/x2goclient-02.gif "x2goclient-02")

For the **Host**:

- res-hpc-lo02.researchlumc.nl

or

- res-hpc-lo04.researchlumc.nl

For the **Session type**:

- XFCE

![alt text](images/putty-01.gif "putty-01")

At Host Name (or IP address), fill in: **res-hpc-lo02.researchlumc.nl** or **res-hpc-lo04.researchlumc.nl**

![alt text](images/putty-02.gif "putty-02")

You can give your session a name so that you can save it and reuse it later.

Give your session a name, for example "res-hpc-lo02". **Save**

![alt text](images/putty-09.gif "putty-09")
- ./hello

```
Hello world from process 000 out of 001, processor name res-hpc-lo02.researchlumc.nl
```

Here you can see that we ran the program on only 1 CPU core, which is the same as running `mpirun -np 1 ./hello` (`-np` is the number of processes to launch).
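
If you want a specific number of processes on the local machine, you can pass it explicitly; a small sketch (output omitted):

```
mpirun -np 4 ./hello
```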
- mpirun ./hello

```
Hello world from process 003 out of 016, processor name res-hpc-lo02.researchlumc.nl
Hello world from process 006 out of 016, processor name res-hpc-lo02.researchlumc.nl
Hello world from process 013 out of 016, processor name res-hpc-lo02.researchlumc.nl
Hello world from process 015 out of 016, processor name res-hpc-lo02.researchlumc.nl
Hello world from process 000 out of 016, processor name res-hpc-lo02.researchlumc.nl
Hello world from process 005 out of 016, processor name res-hpc-lo02.researchlumc.nl
Hello world from process 010 out of 016, processor name res-hpc-lo02.researchlumc.nl
Hello world from process 011 out of 016, processor name res-hpc-lo02.researchlumc.nl
Hello world from process 012 out of 016, processor name res-hpc-lo02.researchlumc.nl
Hello world from process 002 out of 016, processor name res-hpc-lo02.researchlumc.nl
Hello world from process 004 out of 016, processor name res-hpc-lo02.researchlumc.nl
Hello world from process 007 out of 016, processor name res-hpc-lo02.researchlumc.nl
Hello world from process 001 out of 016, processor name res-hpc-lo02.researchlumc.nl
Hello world from process 008 out of 016, processor name res-hpc-lo02.researchlumc.nl
Hello world from process 009 out of 016, processor name res-hpc-lo02.researchlumc.nl
Hello world from process 014 out of 016, processor name res-hpc-lo02.researchlumc.nl
```

Now the program uses all the cores of the local machine, which is the same as running `mpirun -np 16 ./hello`.
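
Keep in mind that this test ran on the login node, which the rules above forbid for real workloads. A hedged sketch of how the same binary could instead be submitted through Slurm (the partition name and the node/task counts mirror the salloc example further below; the script name is hypothetical):

```
#!/bin/bash
#SBATCH --job-name=hello-mpi
#SBATCH --partition=all
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=16
#SBATCH --output=slurm.out

# mpirun picks up the Slurm allocation and starts the ranks on the compute nodes
mpirun ./hello
```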
- sinfo -N -l

```
[user@res-hpc-lo02 ~]$ sinfo
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
all* up infinite 2 mix res-hpc-exe[014,019]
all* up infinite 1 alloc res-hpc-exe018
gpu up infinite 2 mix res-hpc-gpu[01-02]
highmem up infinite 1 mix res-hpc-mem01

[user@res-hpc-lo02 ~]$ sinfo -a
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
all* up infinite 2 mix res-hpc-exe[014,019]
all* up infinite 1 alloc res-hpc-exe018
LKEBgpu up infinite 2 mix res-hpc-lkeb[04-05]
LKEBgpu up infinite 1 idle res-hpc-lkeb01

[user@res-hpc-lo02 ~]$ sinfo -l
Mon Mar 23 09:21:27 2020
PARTITION AVAIL TIMELIMIT JOB_SIZE ROOT OVERSUBS GROUPS NODES STATE NODELIST
all* up infinite 1-infinite no NO all 2 mixed res-hpc-exe[014,019]
```

```
[user@res-hpc-lo02 ~]$ sinfo -l -N -a
Mon Mar 23 09:34:14 2020
NODELIST NODES PARTITION STATE CPUS S:C:T MEMORY TMP_DISK WEIGHT AVAIL_FE REASON
res-hpc-exe013 1 all* mixed 16 2:8:1 128800 0 1 (null) none
```
- squeue -l

```
[user@res-hpc-lo02 mpi-benchmarks]$ squeue
JOBID PARTITION USER ST TIME NODES NODELIST(REASON)
258 all user R 0:03 2 res-hpc-exe[013-014]

[user@res-hpc-lo02 mpi-benchmarks]$ squeue -a
JOBID PARTITION USER ST TIME NODES NODELIST(REASON)
258 all user R 0:06 2 res-hpc-exe[013-014]

[user@res-hpc-lo02 mpi-benchmarks]$ squeue -l
Thu Jan 23 09:14:22 2020
JOBID PARTITION USER STATE TIME TIME_LIMIT NODES NODELIST(REASON)
258 all user RUNNING 0:12 30:00 2 res-hpc-exe[013-014]
```
With the Slurm command **scontrol** you can get a more detailed overview of your running job, node hardware and partitions:

```
[user@res-hpc-lo02 ~]$ scontrol show job 260
JobId=260 JobName=IMB
UserId=user(225812) GroupId=Domain Users(513) MCS_label=N/A
Priority=35603 Nice=0 Account=dnst-ict QOS=normal
Power=
MailUser=user@gmail.com MailType=BEGIN,END,FAIL

[user@res-hpc-lo02 ~]$ scontrol show node res-hpc-exe014
NodeName=res-hpc-exe014 Arch=x86_64 CoresPerSocket=12
CPUAlloc=16 CPUTot=24 CPULoad=0.00
AvailableFeatures=(null)
CurrentWatts=0 AveWatts=0
ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s

[user@res-hpc-lo02 ~]$ scontrol show partition all
PartitionName=all
AllowGroups=ALL AllowAccounts=ALL AllowQos=ALL
AllocNodes=ALL Default=YES QoS=N/A
```
You can open an interactive session with the **salloc** command:

```
[user@res-hpc-lo02 ~]$ salloc -N1
salloc: Granted job allocation 267
salloc: Waiting for resource configuration
salloc: Nodes res-hpc-exe013 are ready for job

exit
salloc: Relinquishing job allocation 267

[user@res-hpc-lo02 ~]$
```

In the example above we did not run a command, so we ended up in the bash environment of the allocation.
With **exit** we leave that environment and release the node.

```
[user@res-hpc-lo02 ~]$ salloc -N1 mpirun ./hello1
salloc: Granted job allocation 268
salloc: Waiting for resource configuration
salloc: Nodes res-hpc-exe013 are ready for job
```
Now the same with 2 nodes, 16 cores on each machine:

```
[user@res-hpc-lo02 ~]$ salloc -N2 --ntasks-per-node=16 mpirun ./hello1
salloc: Granted job allocation 270
salloc: Waiting for resource configuration
salloc: Nodes res-hpc-exe[013-014] are ready for job
```

Interactive:

```
[user@res-hpc-lo02 ~]$ srun --pty bash
[user@res-hpc-exe013 ~]$ exit
exit
```

Running a program:

```
[user@res-hpc-lo02 ~]$ cat hello.sh
#!/bin/bash
#
echo "SLURM_JOB_NAME: $SLURM_JOB_NAME"
echo "SLURM_JOBID: " $SLURM_JOBID

[user@res-hpc-lo02 ~]$ chmod +x hello.sh

[user@res-hpc-lo02 ~]$ srun -N1 hello.sh
Hello from res-hpc-exe013.researchlumc.nl
It is currently Thu Jan 23 12:35:18 CET 2020
```
The recommended way is to run R in batch mode.

```
[username@res-hpc-lo02 ~]$ salloc -N1 -n1
salloc: Pending job allocation 386499
salloc: job 386499 queued and waiting for resources
salloc: job 386499 has been allocated resources

Submitted batch job 386860

[username@res-hpc-lo02 R]$ cat slurm.out
[1] "Hello world!"
```

```
[username@res-hpc-lo02 R]$ sbatch run.slurm
Submitted batch job 386568

[username@res-hpc-lo02 R]$ ls -l
total 78
-rw-r--r-- 1 username Domain Users 59 Jun 5 11:42 driver.R
-rw-r--r-- 1 username Domain Users 483 Jun 5 11:42 run.slurm
-rw-r--r-- 1 username Domain Users 0 Jun 5 11:43 slurm.err
-rw-r--r-- 1 username Domain Users 671 Jun 5 11:43 slurm.out

[username@res-hpc-lo02 R]$ cat slurm.out
My sample from N(0,1) is:
[1] 0.32241013 -0.78250675 -0.28872991 0.12559634 -0.29176358 0.57962942
[7] -0.38277807 -0.21266343 0.86537064 1.06636737 0.96487417 0.31699518
```
```
[username@res-hpc-lo02 R]$ scontrol show job 386568
JobId=386568 JobName=serialR
UserId=username(225812) GroupId=Domain Users(513) MCS_label=N/A
Priority=449759 Nice=0 Account=dnst-ict QOS=normal
```

```
[username@res-hpc-lo02 R]$ cat slurm-rmpi.out
Hello world from process 000 of 008, on host res-hpc-gpu01
Hello world from process 001 of 008, on host res-hpc-gpu01
Hello world from process 002 of 008, on host res-hpc-gpu01
```
#### RStudio on a compute node

You can also start RStudio on a compute node:

```
[username@res-hpc-lo02 ~]$ srun --x11 --pty bash
[username@res-hpc-exe014 ~]$ module purge
[username@res-hpc-exe014 ~]$ module add statistical/RStudio/1.3.959/gcc-8.3.1
[username@res-hpc-exe014 ~]$ rstudio
```

```
Installing setuptools, pip, wheel...done.

[username@res-hpc-lo02 ~]$ source /exports/example/projects/Project-A/bin/activate
(Project-A) [username@res-hpc-lo02 ~]$

(Project-A) [username@res-hpc-lo02 python3.6]$ pip3 list
Package Version
---------- -------
pip 20.1.1
wheel 0.34.2

(Project-A) [username@res-hpc-lo02 ~]$ deactivate
[username@res-hpc-lo02 ~]$
```
To remove your Python virtual environment, simply delete the virtual environment directory.
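
For example, for the environment created above (adapt the path to your own project directory):

```
rm -rf /exports/example/projects/Project-A
```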
```
[username@res-hpc-lo02 ~]$ conda activate Project-B
(Project-B) [username@res-hpc-lo02 ~]$
```

```
(Project-B) [username@res-hpc-lo02 ~]$ conda deactivate
[username@res-hpc-lo02 ~]$
```