Currently a potential user must request an account from the Ibisco reference colleague of her/his institution, providing some identification data. The reference colleague sends the data to the Ibisco administrators, who send back the access data with a temporary password.
ATTENTION: the TEMPORARY password must be changed at the first access
To change the password from the command line use the “yppasswd” command. Yppasswd creates or changes a password valid on every resource of the cluster (not only on the front-end server), i.e. a network password in a Network Information Service (NIS).
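For example, from the front-end the command is simply run and interactively asks for the current and the new password:
$ yppasswd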
The login procedure will change slightly in a few months; see the Access Procedure section ahead.
To access the system (in particular its front-end or UI - User Interface) a user needs to connect via the SSH protocol to the host ibiscohpc-ui.scope.unina.it. Access is currently only in non-graphical terminal emulation mode. However, the account is valid for all cluster resources.
Currently access uses the SSH “user-password” method, as shown below.
Access example from unix-like systems:
$ ssh ibiscohpc-ui.scope.unina.it -l <USERNAME>
To access Ibisco from Windows systems a simple option is PuTTY, freely available at https://www.putty.org/. From Windows 10 onwards it is also possible to use OpenSSH in a command window (CMD.exe or PowerShell.exe); it is pre-installed (if it is not active, it simply has to be enabled in the Optional Features).
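For example, from a PowerShell or CMD window (assuming the OpenSSH client feature is enabled) the connection can be opened with:
> ssh <USERNAME>@ibiscohpc-ui.scope.unina.it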
In a few months access to the cluster will be exclusively via the “user-SSH key” method (other secure access methods are being studied).
The current users are invited to generate their key pairs and upload the public key to the server, in their home directory.
The new users, when asking for an account, will follow a slightly different procedure: they will generate the key pair but will not upload the public key to the server (they will not yet have access); they will send it to the Ibisco admins. The admins will copy it, with the right permissions, into the home directory of the new user. After that the user will be able to enter the system without typing a server password (but he/she will still have to type a passphrase, see ahead).
Once inside, the user will create a server password with yppasswd, valid for access to all the nodes of the cluster.
Obviously, it is important to keep the private key and the passphrase in a secret and safe place; otherwise, as in all security matters, all the advantages brought by safer access methods will vanish.
Here we show a possible way to generate the key pair in Linux and in Windows. In any case, there is a lot of documentation on the internet about how to do this.
on a linux system
from your home directory execute
$ ssh-keygen -t rsa
Press Enter at the first question (filename).
In response to the prompt “Enter passphrase”, enter a key passphrase to protect access to the private key. Using a passphrase enhances security, and a passphrase is recommended.
The key pair is generated by the system.
$ ssh-copy-id -i ~/.ssh/id_rsa.pub <username>@ibiscohpc-ui.scope.unina.it
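After the public key has been copied, the key-based access can be tested (assuming the key pair was saved in the default location ~/.ssh/id_rsa):
$ ssh -i ~/.ssh/id_rsa <username>@ibiscohpc-ui.scope.unina.it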
on a Windows system
We suggest PuTTY, a package for Windows that simplifies the use of Windows as an SSH client and the management of connections to remote hosts.
To create the key pair you can follow the procedure described in the PuTTY documentation (https://the.earth.li/~sgtatham/putty/0.77/htmldoc/).
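As an alternative, since Windows 10 and later also provide the OpenSSH client mentioned above, the key pair can be generated from a PowerShell window with the same command used on Linux:
> ssh-keygen -t rsa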
Users of the resource can currently use the following file systems:
/lustre/home/
file system shared between nodes and UI created using Lustre technology where users' homes reside
/lustre/scratch
file system shared between nodes created using Lustre technology to be used as a scratch area
/home/scratch
file system local to each node to be used as a scratch area
ATTENTION: /lustre/scratch and /home/scratch are ONLY accessible from the nodes (i.e. when one of them is accessed), not from the UI.
In-depth documentation on Lustre is available online, at the link: https://www.lustre.org/
/ibiscostorage
new scratch area shared among UI and computation nodes (available from 07/10/2022), not LUSTRE based
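As an illustration, a job running on a node might stage its data in a scratch area as sketched below (the per-user subdirectory and file names are assumptions; check your permissions on the scratch file systems):
mkdir -p /lustre/scratch/$USER                          # hypothetical per-user directory in the scratch area
cp /lustre/home/$USER/input.dat /lustre/scratch/$USER/  # stage input data from the home to the scratch area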
To improve the use of resources, the job management rules have been changed.
* New usage policies based on fairshare mechanisms have been implemented
* New queues for job submissions have been defined
From 9 October the current queue will be disabled and only those defined here will be active, to be explicitly selected. For example, to submit a job to the parallel queue, execute
$ srun -p parallel <MORE OPTIONS> <COMMAND NAME>
If the job does not comply with the rules of the queue used, it will be terminated.
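The queue can also be selected in a batch script; a minimal sketch, where the resource requests are purely illustrative:
#!/bin/bash
#SBATCH -p parallel        # one of the new queues
#SBATCH -N 2               # illustrative node request
srun <COMMAND NAME>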
The resource manager SLURM is installed on the system to manage the cluster resources.
Complete documentation is available at https://slurm.schedmd.com/.
SLURM is an open source software system for cluster management; it is highly scalable and integrates fault-tolerance and job scheduling mechanisms.
The main components of SLURM are partitions and daemons.
Partitions can be thought of as job queues, each of which defines constraints on job size, time limits, resource usage permissions by users, etc.
SLURM allows centralized management through a daemon, slurmctld, which monitors resources and jobs. Each node is managed by a daemon, slurmd, which takes care of handling requests for activity.
Some tools available to the user are:
A complete list of available commands is in the man pages (also available online at https://slurm.schedmd.com/man_index.html): man <cmd>
sinfo
- Know and verify resource status (existing partitions and related nodes, …) and general system status:
Example: $ sinfo
Output:
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
hpc*         up   infinite     32   idle ibiscohpc-wn[01-32]
Output shows partition information; in this example, the hpc partition contains 32 idle nodes: ibiscohpc-wn01, ibiscohpc-wn02, …, ibiscohpc-wn32.
squeue
- Know jobs queue status:
Example: $ squeue
Output:
JOBID PARTITION     NAME     USER ST  TIME  NODES NODELIST(REASON)
 4815       hpc    sleep cnr-isas  R  0:04
Output shows, for each job: the job ID, the partition, the job name, the user, the job state, the elapsed time, the number of nodes and the list of nodes used (or the reason why the job is waiting).
scontrol
- detailed information about jobs and resources
Example (detailed information about the ibiscohpc-wn02 node):
$ scontrol show node ibiscohpc-wn02
Output:
NodeName=ibiscohpc-wn02 Arch=x86_64 CoresPerSocket=24
   CPUAlloc=0 CPUTot=96 CPULoad=0.01
   AvailableFeatures=HyperThread
   ActiveFeatures=HyperThread
   Gres=gpu:tesla:4(S:0)
   NodeAddr=ibiscohpc-wn02 NodeHostName=ibiscohpc-wn02 Version=20.11.5
   OS=Linux 3.10.0-957.1.3.el7.x86_64 #1 SMP Mon Nov 26 12:36:06 CST 2018
   RealMemory=1546503 AllocMem=0 FreeMem=1528903 Sockets=2 Boards=1
   State=IDLE ThreadsPerCore=2 TmpDisk=0 Weight=1 Owner=N/A MCS_label=N/A
   Partitions=hpc
   BootTime=2022-02-01T16:24:43 SlurmdStartTime=2022-02-01T16:25:25
   CfgTRES=cpu=96,mem=1546503M,billing=96
   AllocTRES=
   CapWatts=n/a
   CurrentWatts=0 AveWatts=0
   ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
   Comment=(null)
srun
- manages the execution of a parallel job on the cluster managed by SLURM.
If necessary, srun allocates resources for job execution.
Some useful srun parameters are:
* -c, --cpus-per-task=<ncpus> : number of CPUs allocated to each task.
* -l, --label : prepend the task number to each line of standard output/error.
* -N, --nodes=<minnodes>[-maxnodes] : minimum number (minnodes) of nodes to allocate for the job and the possible maximum one. If this option is not given, enough nodes are allocated to satisfy the requirements of -n and -c. If the request is outside the limits allowed by the partition, the job is left in the PENDING state. This allows for possible execution at a later time, when the partition limit is possibly changed.
* -n, --ntasks=<number> : number of tasks to run. srun allocates the necessary resources based on the number of required tasks (by default, one node is required for each task but, using the --cpus-per-task option, this behavior can be changed).
Example, interactively access a node, from UI:
$ salloc srun --pty /bin/bash
Example, submit a batch job, from UI:
$ echo -e '#!/bin/sh\nhostname' | sbatch
Example, submit an MPI interactive job with <N> tasks, from UI:
$ srun -n <N> <EXEFILE>
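The srun options described above can also be combined in a single command; for example (the values are purely illustrative):
$ srun -p parallel -N 2 -n 8 -c 4 -l <EXEFILE>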
Important command when using OpenMP
Add the following command in the script used to submit an OpenMP job:
$ export OMP_NUM_THREADS=<nthreads>
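A minimal sketch of a complete OpenMP batch script (the executable name and the number of threads/CPUs are purely illustrative):
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --cpus-per-task=8
export OMP_NUM_THREADS=8     # match the number of CPUs allocated per task
./my_openmp_program          # hypothetical OpenMP executable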
To use Intel's suite of compilers and libraries, you need to run (interactively or inside any script in which they are needed) the command
. /nfsexports/intel/oneapi/setvars.sh
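For example, after sourcing the script one could compile a C source file with the oneAPI C/C++ compiler driver icx (the file name hello.c is hypothetical):
$ . /nfsexports/intel/oneapi/setvars.sh
$ icx -O2 -o hello hello.c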
For details about “Red Hat Developer Toolset” see https://access.redhat.com/documentation/en-us/red_hat_developer_toolset/7/html/user_guide/chap-red_hat_developer_toolset
Here we report some examples showing how one can activate the various development environments:
* Create a bash subshell in which the tools are working (in this case gcc/g++/gfortran/… v.10):
$ scl enable devtoolset-10 bash
* Make the tools operational (in this case gcc/g++/gfortran/… v.10) in the current shell:
$ source scl_source enable devtoolset-10
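After either of the two commands above, one can verify that the newer compilers are active, for example:
$ gcc --version     # should report version 10.x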
The following script is an example of how to use Singularity
#!/bin/bash
singularity run library://godlovedc/funny/lolcow
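Assuming the script above is saved, for example, as lolcow.sh (a hypothetical file name), it can be submitted as a batch job from the UI:
$ sbatch lolcow.sh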
To use Python, it is necessary to start the conda environment using the following commands:
source /nfsexports/SOFTWARE/anaconda3.OK/setupconda.sh
<commands execution> [Example: python example.py]
conda deactivate
The tensorflow sub-environment is activated after starting the conda environment:
source /nfsexports/SOFTWARE/anaconda3.OK/setupconda.sh
conda activate tensorflowgpu
<commands execution> [Example: python example.py]
conda deactivate
conda deactivate
To use the bioconda sub-environment, the following commands have to be executed:
source /nfsexports/SOFTWARE/anaconda3.OK/setupconda.sh
conda activate bioconda
<commands execution> [Example: python example.py]
conda deactivate
conda deactivate
To use the Pytorch sub-environment, the following commands have to be executed:
source /nfsexports/SOFTWARE/anaconda3.OK/setupconda.sh
conda activate pytorchenv
<commands execution> [Example: python example.py]
conda deactivate
conda deactivate
To list the available packages in the given environment, run the command,
conda list
#!/bin/bash
#SBATCH --nodes=[nnodes]                      # number of nodes
#SBATCH --ntasks-per-node=[ntasks per node]   # number of cores per node
#SBATCH --gres=gpu:[ngpu]                     # number of GPUs per node
Suppose a given Python code has to be executed for different values of a variable “rep”. It is possible to execute the Python codes in parallel during the job submission process by creating temporary files, one for each value rep=a1, a2, … The Python code example.py can contain a line:
rep=REP
The submission script sub.sh can be used to parallelize the process in following way:
#!/bin/bash
#SBATCH --nodes=[nnodes]                      # number of nodes
#SBATCH --ntasks-per-node=[ntasks per node]   # number of cores per node
#SBATCH --gres=gpu:[ngpu]                     # number of GPUs per node

NPROC=[nprocesses]        # number of processing units to be accessed
program=example.py        # the python code to be executed (see above)
tmpstring=tmp             # prefix of the temporary files generated
count=0                   # counter of the temporary files
for rep in {1..10}        # the value of rep runs from 1 to 10
do
    tmpprogram=${tmpstring}_${rep}.py               # temporary file name for this value of rep
    sed -e "s/REP/$rep/g" $program > $tmpprogram    # replace the variable REP in the .py with the current value of rep
    python $tmpprogram &                            # run the temporary file in the background
    (( count++ ))                                   # increase the count
    [[ $(( count % NPROC )) -eq 0 ]] && wait        # wait for the parallel programs to finish
done
rm ${tmpstring}*          # optionally remove the temporary files after execution
To use gmsh it is necessary to configure the execution environment (shell) in order to guarantee the availability of the necessary libraries by running the following command:
$ scl enable devtoolset-10 bash
Within the shell configured in this way, it is then possible to execute the gmsh command available in the directory
/nfsexports/SOFTWARE/gmsh-4.10.1-source/install/bin
When the ad-hoc configured shell is no longer needed, you can terminate it with the command
$ exit
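Putting the steps together, a possible session looks like this (model.geo is a hypothetical input file):
$ scl enable devtoolset-10 bash
$ /nfsexports/SOFTWARE/gmsh-4.10.1-source/install/bin/gmsh model.geo
$ exit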
To use this version, available in the directory
/nfsexports/SOFTWARE/OpenFOAM-9.0/
you need to configure the environment as follows:
$ source /nfsexports/SOFTWARE/OpenFOAM-9.0/etc/bashrc
$ source /nfsexports/SOFTWARE/intel/oneapi/compiler/latest/env/vars.sh
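After sourcing these two files, the OpenFOAM commands become available in the current shell; as a quick check (assuming a standard OpenFOAM 9 installation) one can print the help of a solver:
$ simpleFoam -help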
To use MATLAB, enable X11 forwarding (option -Y) when logging into the IBISCO cluster:
$ ssh ibiscohpc-ui.scope.unina.it -l [username] -Y
MATLAB R2020b can then be started with /nfsexports/SOFTWARE/MATLAB/R2020b/bin/matlab. This opens the MathWorks command window, where you will be able to add the settings file (see ahead). MATLAB R2022a is also available, at /nfsexports/SOFTWARE/MATLAB/R2022a/bin/matlab.
Attached you will find an example of a *Profile File* that can be used to configure the multi-node parallel machine for the execution of Matlab parallel jobs on the IBiSco HPC cluster.
Example of configuration file for Parallel execution
The file must be decompressed before use
Before it can be used by a Matlab program, the user *must first import* that file by starting the *Cluster Profile Manager* on the Matlab desktop (on the *Home* tab, in the *Environment* area, select *Parallel* > *Create and Manage Clusters*).
Figure 1: Parallel Configuration Window
In the *Create and Manage Clusters* window, select the *Import* option.
Figure 2: Import Configuration Window
Once the profile has been imported, it can be referenced by a Matlab parallel program using the profile name 'SlurmIBISCOHPC', i.e.
mypool = parpool('SlurmIBISCOHPC', ___, ___, ___, ___)
...
delete(mypool);
To modify the 'SlurmIBISCOHPC' profile the user can use the *Cluster Profile Manager* again.
This is an example of using parfor to parallelize a for loop (demonstrated at MathWorks). The example calculates the spectral radius of a matrix and converts a for-loop into a parfor-loop. Open a file named test.m with the following code:
mypool = parpool('SlurmIBISCOHPC', 5)   % 5 is the number of workers
n = 100;
A = 200;
a = zeros(n);
parfor i = 1:n
    a(i) = max(abs(eig(rand(A))));
end
delete(mypool);
quit
To run this code, the following command executed on the UI can be used:
/nfsexports/SOFTWARE/MATLAB/R2020b/bin/matlab -nodisplay -nosplash -nodesktop -r test