Instructions and tips on using Ibisco software

Obtaining login credentials

Currently, a potential user must ask for an account from the Ibisco reference colleague of her/his institution, providing some identification data. The reference colleague sends the data to the Ibisco administrators, who send back the access data with a temporary password.

ATTENTION: the TEMPORARY password must be changed at the first access

To change the password from the command line, use the “yppasswd” command. yppasswd creates or changes a password valid on every resource of the cluster (not only on the front-end server), since it acts on the network password of the Network Information Service (NIS).
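
For example (a minimal sketch; the exact prompts may vary):

$ yppasswd     # asks for the old (temporary) password, then for the new password twice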

The login procedure will change slightly in a few months; see the Access procedure section below.

Access procedure

To access the system (in particular its front-end, or UI - User Interface) a user needs to connect via the SSH protocol to the host ibiscohpc-ui.scope.unina.it. Access is currently only in non-graphical terminal emulation mode. However, the account is valid for all cluster resources.

Currently, access uses the SSH “user-password” method, as shown below.

Access example from unix-like systems:

$ ssh ibiscohpc-ui.scope.unina.it -l <USERNAME>

To access Ibisco from Windows systems, a simple option is PuTTY, freely available at https://www.putty.org/. From Windows 10 onwards it is also possible to use OpenSSH in a command window (CMD.exe or Powershell.exe). It is pre-installed (if it is not active, it simply has to be enabled in the Optional Features).
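
For example, from a CMD or PowerShell window (assuming the OpenSSH client feature is enabled):

C:\> ssh <USERNAME>@ibiscohpc-ui.scope.unina.it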

In a few months, access to the cluster will be exclusively via the “user-SSH key” method (other secure access methods are being studied).
Current users are invited to generate their key pairs and upload the public key to the server, in their home directory.
New users, when asking for an account, will follow a slightly different procedure: they will generate the key pair but will not upload the public key to the server (they will not yet have access); they will send it to the Ibisco admins. The admins will copy it, with the right permissions, into the home directory of the new user. After that, the user will be able to enter the system without typing a server password (but will still have to type a passphrase, see below).
Once inside, the user will create a server password with yppasswd, valid for access to all the nodes of the cluster.

Obviously, it is important to keep the private key and the passphrase in a secret and safe place; otherwise, as with all security matters, all the advantages brought by safer access algorithms will vanish.

Here we show a possible way to generate the key pair on Linux and on Windows. In any case, plenty of documentation on how to do this is available on the Internet.

on a Linux system

from your home directory execute
$ ssh-keygen -t rsa
Press Enter at the first question (file name) to accept the default location.
In response to the prompt “Enter passphrase”, enter a key passphrase to protect access to the private key. Using a passphrase enhances security, and a passphrase is recommended.
The key pair is generated by the system.

Current users, who still have password access, can then copy the public key to the cluster front-end with:

$ ssh-copy-id -i ~/.ssh/id_rsa.pub <username>@ibiscohpc-ui.scope.unina.it
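
Once the public key is in place, a possible login looks like the following (a sketch; the key path assumes the default file name used above) and asks only for the key passphrase, not for the server password:

$ ssh -i ~/.ssh/id_rsa <USERNAME>@ibiscohpc-ui.scope.unina.it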

on a Windows system

We suggest PuTTY, a package for Windows that simplifies the use of Windows as an SSH client and the management of connections to remote hosts.
To create the key pair (https://the.earth.li/~sgtatham/putty/0.77/htmldoc/) you can proceed as follows.

  1. Run PUTTYGEN.EXE
  2. Leave the standard choices (Key → SSH-2 RSA Key, Use probable primes, show fingerprint as SHA256; Parameters → RSA, 2048 bit)
  3. Press the “Generate” button and follow the instructions
  4. When prompted for a passphrase, enter a good one and save it in a safe place
  5. Save the private key in a safe directory or on an external USB device (remember the path, needed to run a session with PuTTY)
  6. Copy the whole content of the box under “Public key for pasting …” into a file id_rsa.pub. It will have the right format to be accepted by OpenSSH (the SSH package available on Linux and therefore also on IBiSco)
  7. Send the public key by mail to the admins: as written above, they will copy it into your .ssh directory on the cluster with the right permissions.

Available file systems

Users of the resource currently have access to the following file systems:

/lustre/home/ : file system shared between the nodes and the UI, created using Lustre technology, where the users' home directories reside

/lustre/scratch : file system shared between the nodes, created using Lustre technology, to be used as a scratch area

/home/scratch : file system local to each node, to be used as a scratch area

ATTENTION: /lustre/scratch and /home/scratch are ONLY accessible from the nodes (i.e. once you are logged into one of them), not from the UI

In-depth documentation on Lustre is available online, at the link: https://www.lustre.org/

/ibiscostorage : new scratch area shared between the UI and the computation nodes (available from 07/10/2022), not Lustre based
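
As an illustration only, once logged into a node a user could stage temporary data on the node-local scratch area; the per-user subdirectory used here is an assumption, not a site rule:

$ mkdir -p /home/scratch/$USER       # assumption: one subdirectory per user
$ cd /home/scratch/$USER
$ <COMMAND NAME>                     # placeholder for the actual program, writing its temporary files here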

Job preparation and submission

Premise: new job management rules active from 9/10/2022

To improve the use of resources, the job management rules have been changed.

* New usage policies based on fairshare mechanisms have been implemented
* New queues for job submissions have been defined

  1. sequential queue:
    • accepts only sequential jobs with a number of tasks not exceeding 1,
    • that do not use GP-GPUs,
    • for a total number of jobs running on it not exceeding 128,
    • and with a maximum execution time limit of 1 week
  2. parallel queue:
    • accepts only parallel jobs with a number of tasks greater than 1 and less than 1580,
    • that use no more than 64 GP-GPUs,
    • and with a maximum execution time limit of 1 week
  3. gpus queue:
    • only accepts jobs that use no more than 64 GP-GPUs,
    • with a number of tasks less than 1580,
    • and with a maximum execution time limit of 1 week
  4. hparallel queue:
    • accepts only parallel jobs with a number of tasks greater than 1580 and less than 3160,
    • that make use of at least 64 GP-GPUs,
    • and with a maximum execution time limit of 1 day

From 9 October the current queue will be disabled and only the queues defined here will be active; they must be selected explicitly. For example, to submit a job to the parallel queue, execute

$ srun -p parallel <MORE OPTIONS> <COMMAND NAME>

If the job does not comply with the rules of the queue used, it will be terminated.
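
For batch submissions, the queue can also be selected inside the job script; a minimal sketch (task count and time limit are placeholders to be adapted to the chosen queue):

#!/bin/bash
#SBATCH --partition=parallel     # queue to use
#SBATCH --ntasks=<N>             # number of tasks, within the queue limits
#SBATCH --time=01:00:00          # must not exceed the queue time limit
srun <COMMAND NAME>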

Use of resources

The SLURM resource manager is installed on the system to manage the cluster resources. Complete documentation is available at https://slurm.schedmd.com/.

SLURM is an open source software system for cluster management; it is highly scalable and integrates fault-tolerance and job scheduling mechanisms.

SLURM basic concepts

The main components of SLURM are:

Partitions, which can be thought of as job queues, each of which defines constraints on job size, time limits, resource usage permissions by users, etc.

SLURM allows centralized management through a daemon, slurmctld, which monitors resources and jobs. Each node is managed by its own daemon, slurmd, which takes care of handling requests for activity.

Some tools available to the user are sinfo, squeue, scontrol, srun, salloc and sbatch; examples of their use are shown below.

A complete list of available commands is in the man pages (also available online at https://slurm.schedmd.com/man_index.html): man <cmd>

Examples of use of some basic commands

system and resources information

sinfo - check the status of the resources (existing partitions and related nodes, …) and the general status of the system:

Example: $ sinfo

Output:

  PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
  hpc*      up     infinite   32     idle  ibiscohpc-wn[01-32]

The output shows the partitions information; in this example there is a single partition, hpc (marked with * because it is the default), which is up, has no time limit and contains 32 idle nodes, ibiscohpc-wn[01-32].

squeue - check the job queue status:

Example: $ squeue

Output:

  JOBID PARTITION     NAME     USER         ST      TIME  NODES NODELIST(REASON)
  4815  hpc           sleep    cnr-isas     R       0:04 

The output shows, for each job: the job ID, the partition, the job name, the user, the state (ST), the elapsed time, the number of allocated nodes and the node list (or the reason why the job is waiting).

scontrol - detailed information about jobs and resources

Example (detailed information about the ibiscohpc-wn02 node):

$ scontrol show node ibiscohpc-wn02

Output:

  NodeName=ibiscohpc-wn02 Arch=x86_64 CoresPerSocket=24
     CPUAlloc=0 CPUTot=96 CPULoad=0.01
     AvailableFeatures=HyperThread
     ActiveFeatures=HyperThread
     Gres=gpu:tesla:4(S:0)
     NodeAddr=ibiscohpc-wn02 NodeHostName=ibiscohpc-wn02 Version=20.11.5
     OS=Linux 3.10.0-957.1.3.el7.x86_64 #1 SMP Mon Nov 26 12:36:06 CST 2018
     RealMemory=1546503 AllocMem=0 FreeMem=1528903 Sockets=2 Boards=1
     State=IDLE ThreadsPerCore=2 TmpDisk=0 Weight=1 Owner=N/A MCS_label=N/A
     Partitions=hpc
     BootTime=2022-02-01T16:24:43 SlurmdStartTime=2022-02-01T16:25:25
     CfgTRES=cpu=96,mem=1546503M,billing=96
     AllocTRES=
     CapWatts=n/a
     CurrentWatts=0 AveWatts=0
     ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
     Comment=(null)

job preparation and submission

srun - run a parallel job on the cluster managed by SLURM. If necessary, srun allocates the resources for the job execution.

Some useful srun parameters are:

-c, --cpus-per-task=<ncpus> : number of CPUs allocated to each task

-l, --label : prepend the task number to each line of standard output/error

-N, --nodes=<minnodes>[-maxnodes] : minimum (and optionally maximum) number of nodes allocated to the job

-n, --ntasks=<number> : number of tasks to run
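
For instance, some of these options can be combined (a sketch; node and task counts are only placeholders):

  $ srun -N 2 -n 8 -l <EXEFILE>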

Example, interactively access a node, from UI:

  $ salloc  srun --pty /bin/bash

Example, submit a batch job, from UI:

  $ echo -e '#!/bin/sh\nhostname' | sbatch

Example, submit an MPI interactive job with <N> tasks, from UI:

  $ srun -n <N> <EXEFILE>
  

Important command when using OpenMP

Add the following command to the script used to submit an OpenMP job:

  export OMP_NUM_THREADS=<nthreads>
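
A minimal sketch of a complete OpenMP submission script (queue name, thread count and program name are placeholders to be adapted):

#!/bin/bash
#SBATCH --partition=sequential      # single-task job; choose a queue compatible with its rules
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=<nthreads>  # CPUs available to the OpenMP threads
export OMP_NUM_THREADS=<nthreads>   # same value as --cpus-per-task
./<EXEFILE>                         # the OpenMP program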

Tips for using the Intel OneAPI suite (a suite of compilers, libraries, etc. provided by Intel)

To use Intel's suite of compilers and libraries, you need to run (interactively, or inside any script in which they are needed) the command

. /nfsexports/intel/oneapi/setvars.sh 
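
For example, after sourcing the script the oneAPI compilers should be available in the shell (a sketch; the source file names are hypothetical):

$ . /nfsexports/intel/oneapi/setvars.sh
$ icx -O2 -o hello hello.c        # C compiler of the oneAPI suite
$ ifx -O2 -o hello_f hello.f90    # Fortran compiler of the oneAPI suite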

Tips for using Red Hat Developer Toolset

For details about “Red Hat Developer Toolset” see https://access.redhat.com/documentation/en-us/red_hat_developer_toolset/7/html/user_guide/chap-red_hat_developer_toolset

Here we report some examples showing how one can enable the various development environments:

* Create a bash subshell in which the tools are available (in this case gcc/g++/gfortran/… v.10):

 $ scl enable devtoolset-10 bash 

* Make the tools operational in the current shell (in this case gcc/g++/gfortran/… v.10):

 $ source scl_source enable devtoolset-10 
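
For example, once the tools are enabled (a sketch; the source file name is hypothetical):

 $ source scl_source enable devtoolset-10
 $ gcc --version                   # should now report GCC 10
 $ gfortran -O2 -o prog prog.f90   # compile with the devtoolset compilers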

Tip for using Singularity

The following script is an example of how to use Singularity:

#!/bin/bash
singularity run library://godlovedc/funny/lolcow 
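
The script can then be submitted like any other job; for example (the script name lolcow.sh is hypothetical):

$ sbatch lolcow.sh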

Tips for python users with the new conda environment

Base

To use Python, it is necessary to start the conda environment using the following commands:

source /nfsexports/SOFTWARE/anaconda3.OK/setupconda.sh
<commands execution> [Example: python example.py]
conda deactivate 

Tensorflow

The tensorflow sub-environment is activated after starting the conda environment:

source /nfsexports/SOFTWARE/anaconda3.OK/setupconda.sh
conda activate tensorflowgpu
<commands execution> [Example: python example.py]
conda deactivate
conda deactivate

Bio-Informatics

To use the bioconda sub-environment, the following commands have to be executed.

source /nfsexports/SOFTWARE/anaconda3.OK/setupconda.sh
conda activate bioconda
<commands execution> [Example: python example.py]
conda deactivate
conda deactivate

Pytorch

To use the Pytorch sub-environment, the following commands have to be executed.

source /nfsexports/SOFTWARE/anaconda3.OK/setupconda.sh
conda activate pytorchenv
<commands execution> [Example: python example.py]
conda deactivate
conda deactivate

Packages list

To list the available packages in the current environment, run the command:

conda list

Other tips for python

Parallel computation in python

A submission script for a parallel Python job typically begins with SLURM directives requesting the resources, such as:

#!/bin/bash
#SBATCH --nodes=[nnodes]           #number of nodes
#SBATCH --ntasks-per-node=[ntasks per node] #number of cores per node
#SBATCH --gres=gpu:[ngpu]        #number of GPUs per node

Example of parallel jobs submission

Suppose a given Python code has to be executed for different values of a variable “rep”. It is possible to run the Python code in parallel at submission time by creating temporary files, one for each value rep=a1, a2, … The Python code example.py can contain a line:

 rep=REP 

The submission script sub.sh can be used to parallelize the process in the following way:

#!/bin/bash
#SBATCH --nodes=[nnodes]                      #number of nodes
#SBATCH --ntasks-per-node=[ntasks per node]   #number of cores per node
#SBATCH --gres=gpu:[ngpu]                     #number of GPUs per node
NPROC=[nprocesses]          #number of processing units to be used at the same time

program=example.py          #the python code containing the line rep=REP
tmpstring=tmp               #prefix of the generated temporary files

count=0                     #counter of the temporary files
for rep in {1..10}          #the value of rep runs from 1 to 10
do
    tmpprogram=${tmpstring}_${rep}.py             #temporary file name for this value of rep
    sed -e "s/REP/$rep/g" $program > $tmpprogram  #replace REP in the .py code with the current value of rep
    python $tmpprogram &    #run the temporary file in the background
    (( count++ ))           #increase the counter
    [[ $(( count % NPROC )) -eq 0 ]] && wait      #every NPROC launches, wait for the running programs to finish
done
wait                        #wait for the last batch of programs to finish
rm ${tmpstring}*            #optionally remove the temporary files after all of them have been executed

Tips for using gmsh (a mesh generator)

To use gmsh it is necessary to configure the execution environment (shell) in order to guarantee the availability of the necessary libraries by running the following command:

 $ scl enable devtoolset-10 bash 

Within the shell configured in this way, it is then possible to execute the gmsh command available in the directory

  /nfsexports/SOFTWARE/gmsh-4.10.1-source/install/bin
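
For example, to check that the program starts correctly:

  $ /nfsexports/SOFTWARE/gmsh-4.10.1-source/install/bin/gmsh --version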

When the ad-hoc configured shell is no longer needed, you can terminate with the command

 $ exit  

Tips for using FOAM Ver 9.0

To use this version, available in the directory

/nfsexports/SOFTWARE/OpenFOAM-9.0/

you need to configure the environment as follows:

 $ source /nfsexports/SOFTWARE/OpenFOAM-9.0/etc/bashrc 
 $ source /nfsexports/SOFTWARE/intel/oneapi/compiler/latest/env/vars.sh 
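
After sourcing these files, the OpenFOAM applications should be available in the shell; a quick check (assuming the standard solvers are on the PATH after sourcing):

 $ simpleFoam -help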

Tips for using Matlab for the execution of parallel jobs on the IBiSCo HPC cluster

Basic tips

Configuration and execution

Attached you will find an example of a *Profile File* that can be used to configure the multi-node parallel machine for the execution of Matlab parallel jobs on the IBiSco HPC cluster.

Example of configuration file for Parallel execution

The file must be decompressed before use

To make it accessible to a Matlab program, the user *must first import* that file by starting the *Cluster Profile Manager* on the Matlab desktop (on the *Home* tab, in the *Environment* area, select *Parallel* > *Create and Manage Clusters*).

Figure 1: Parallel Configuration Window

In the *Create and Manage Clusters* window, select the *Import* option.

Figure 2: Import Configuration Window

Once the profile has been imported, it can be referenced by a Matlab parallel program using the profile name 'SlurmIBISCOHPC', i.e.

mypool=parpool('SlurmIBISCOHPC', ___, ___, ___, ___) 
...
delete(mypool); 

To modify the 'SlurmIBISCOHPC' profile the user can use

  1. the *Create and Manage Clusters* window
  2. the Matlab Profile commands such as saveProfile (https://it.mathworks.com/help/parallel-computing/saveprofile.html)

Example of running a parallel matlab script

This is an example of using parfor to parallelize a for loop (demonstrated by MathWorks). The example calculates the spectral radius of a matrix, converting a for-loop into a parfor-loop. Create a file named test.m with the following code:

mypool=parpool('SlurmIBISCOHPC', 5) % 5 is the number of workers
n = 100;
A = 200;
a = zeros(n);
parfor i = 1:n
    a(i) = max(abs(eig(rand(A))));
end
delete(mypool); 
quit

To run this code, the following command executed on the UI can be used:

/nfsexports/SOFTWARE/MATLAB/R2020b/bin/matlab -nodisplay -nosplash -nodesktop -r test
