Instructions and tips on using Ibisco software

Obtaining login credentials

Currently, a potential user must ask for an account from the Ibisco reference colleague of her/his institution, providing some identification data. The reference colleague sends the data to the Ibisco administrators, who send back the access data with a temporary password.

ATTENTION: the TEMPORARY password must be changed at the first access

To change the password from the command line, use the “yppasswd” command. yppasswd creates or changes a password valid on every resource of the cluster (not only on the front-end server), since it acts on the network password of the Network Information Service (NIS).
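
For example (a minimal sketch; the exact prompts may vary):

$ yppasswd     # asks for the old (temporary) password, then for the new password twice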

The login procedure will change slightly in a few months; see the Access procedure section below.

Access procedure

To access the system (in particular its front-end, or UI - User Interface) a user needs to connect via the SSH protocol to the host ibiscohpc-ui.scope.unina.it. Access is currently only in non-graphical terminal emulation mode. However, the account is valid for all cluster resources.

Currently, access uses the SSH “user-password” method, as shown below.

Access example from unix-like systems:

$ ssh ibiscohpc-ui.scope.unina.it -l <USERNAME>

To access Ibisco from Windows systems, a simple option is PuTTY, freely available at https://www.putty.org/. From Windows 10 onwards it is also possible to use OpenSSH in a command window (CMD.exe or Powershell.exe). It is pre-installed (if it is not active, it simply has to be enabled in the Optional Features).
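
For example, from a CMD or PowerShell window (assuming the OpenSSH client feature is enabled):

C:\> ssh <USERNAME>@ibiscohpc-ui.scope.unina.it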

In a few months, access to the cluster will be exclusively via the “user-SSH key” method (other secure access methods are being studied).
Current users are invited to generate their key pairs and upload the public key to the server, in their home directory.
New users, when asking for an account, will follow a slightly different procedure: they will generate the key pair but will not upload the public key to the server (they will not yet have access); they will send it to the Ibisco admins. The admins will copy it, with the right permissions, into the home directory of the new user. After that, the user will be able to enter the system without typing a server password (but will still have to type a passphrase, see below).
Once inside, the user will create a server password with yppasswd, valid for access to all the nodes of the cluster.

Obviously, it is important to keep the private key and the passphrase in a secret and safe place; otherwise, as with all security matters, all the advantages brought by safer access algorithms will vanish.

Here we show a possible way to generate the key pair on Linux and on Windows. In any case, plenty of documentation on how to do this is available on the Internet.

on a Linux system

from your home directory execute
$ ssh-keygen -t rsa
Press Enter at the first question (file name) to accept the default location.
In response to the prompt “Enter passphrase”, enter a key passphrase to protect access to the private key. Using a passphrase enhances security, and a passphrase is recommended.
The key pair is generated by the system.

Current users, who still have password access, can then copy the public key to the cluster front-end with:

$ ssh-copy-id -i ~/.ssh/id_rsa.pub <username>@ibiscohpc-ui.scope.unina.it
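
Once the public key is in place, a possible login looks like the following (a sketch; the key path assumes the default file name used above) and asks only for the key passphrase, not for the server password:

$ ssh -i ~/.ssh/id_rsa <USERNAME>@ibiscohpc-ui.scope.unina.it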

on a Windows system

We suggest PuTTY, a package for Windows that simplifies the use of Windows as an SSH client and the management of connections to remote hosts.
To create the key pair (https://the.earth.li/~sgtatham/putty/0.77/htmldoc/) you can proceed as follows.

  1. Run PUTTYGEN.EXE
  2. Leave the standard choices (Key → SSH-2 RSA Key, Use probable primes, show fingerprint as SHA256; Parameters → RSA, 2048 bit)
  3. Press the “Generate” button and follow the instructions
  4. When prompted for a passphrase, enter a good one and save it in a safe place
  5. Save the private key in a safe directory or on an external USB device (remember the path, needed to run a session with PuTTY)
  6. Copy the whole content of the box under “Public key for pasting …” into a file id_rsa.pub. It will have the right format to be accepted by OpenSSH (the SSH package available on Linux and therefore also on IBiSco)
  7. Send the public key by mail to the admins: as written above, they will copy it into your .ssh directory on the cluster with the right permissions.

Available file systems

Users of the resource currently have access to the following file systems:

/lustre/home/ : file system shared between the nodes and the UI, created using Lustre technology, where the users' home directories reside

/lustre/scratch : file system shared between the nodes, created using Lustre technology, to be used as a scratch area

/home/scratch : file system local to each node, to be used as a scratch area

ATTENTION: /lustre/scratch and /home/scratch are ONLY accessible from the nodes (i.e. once you are logged into one of them), not from the UI

In-depth documentation on Lustre is available online, at the link: https://www.lustre.org/

/ibiscostorage : new scratch area shared between the UI and the computation nodes (available from 07/10/2022), not Lustre based
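
As an illustration only, once logged into a node a user could stage temporary data on the node-local scratch area; the per-user subdirectory used here is an assumption, not a site rule:

$ mkdir -p /home/scratch/$USER       # assumption: one subdirectory per user
$ cd /home/scratch/$USER
$ <COMMAND NAME>                     # placeholder for the actual program, writing its temporary files here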

Job preparation and submission

Premise: new job management rules active from 9/10/2022

To improve the use of resources, the job management rules have been changed.

* New usage policies based on fairshare mechanisms have been implemented
* New queues for job submissions have been defined

  1. sequential queue:
    • accepts only sequential jobs with a number of tasks not exceeding 1,
    • that do not use GP-GPUs,
    • for a total number of jobs running on it not exceeding 128,
    • and with a maximum execution time limit of 1 week
  2. parallel queue:
    • accepts only parallel jobs with a number of tasks greater than 1 and less than 1580,
    • that use no more than 64 GP-GPUs,
    • and with a maximum execution time limit of 1 week
  3. gpus queue:
    • only accepts jobs that use no more than 64 GP-GPUs,
    • with a number of tasks less than 1580,
    • and with a maximum execution time limit of 1 week
  4. hparallel queue:
    • accepts only parallel jobs with a number of tasks greater than 1580 and less than 3160,
    • that make use of at least 64 GP-GPUs,
    • and with a maximum execution time limit of 1 day

From 9 October the current queue will be disabled and only the queues defined here will be active; they must be selected explicitly. For example, to submit a job to the parallel queue, execute

$ srun -p parallel <MORE OPTIONS> <COMMAND NAME>

If the job does not comply with the rules of the queue used, it will be terminated.
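
For batch submissions, the queue can also be selected inside the job script; a minimal sketch (task count and time limit are placeholders to be adapted to the chosen queue):

#!/bin/bash
#SBATCH --partition=parallel     # queue to use
#SBATCH --ntasks=<N>             # number of tasks, within the queue limits
#SBATCH --time=01:00:00          # must not exceed the queue time limit
srun <COMMAND NAME>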

Use of resources

The SLURM resource manager is installed on the system to manage the cluster resources. Complete documentation is available at https://slurm.schedmd.com/.

SLURM is an open source software system for cluster management; it is highly scalable and integrates fault-tolerance and job scheduling mechanisms.

SLURM basic concepts

The main components of SLURM are:

Partitions, which can be thought of as job queues, each of which defines constraints on job size, time limits, resource usage permissions by users, etc.

SLURM allows centralized management through a daemon, slurmctld, which monitors resources and jobs. Each node is managed by its own daemon, slurmd, which takes care of handling requests for activity.

Some tools available to the user are sinfo, squeue, scontrol, srun, salloc and sbatch; examples of their use are shown below.

A complete list of available commands is in the man pages (also available online at https://slurm.schedmd.com/man_index.html): man <cmd>

Examples of use of some basic commands

system and resources information

sinfo - check the status of the resources (existing partitions and related nodes, …) and the general status of the system:

Example: $ sinfo

Output:

  PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
  hpc*      up     infinite   32     idle  ibiscohpc-wn[01-32]

The output shows the partitions information; in this example there is a single partition, hpc (marked with * because it is the default), which is up, has no time limit and contains 32 idle nodes, ibiscohpc-wn[01-32].

squeue - check the job queue status:

Example: $ squeue

Output:

  JOBID PARTITION     NAME     USER         ST      TIME  NODES NODELIST(REASON)
  4815  hpc           sleep    cnr-isas     R       0:04 

The output shows, for each job: the job ID, the partition, the job name, the user, the state (ST), the elapsed time, the number of allocated nodes and the node list (or the reason why the job is waiting).

scontrol - detailed information about jobs and resources

Example (detailed information about the ibiscohpc-wn02 node):

$ scontrol show node ibiscohpc-wn02

Output:

  NodeName=ibiscohpc-wn02 Arch=x86_64 CoresPerSocket=24
     CPUAlloc=0 CPUTot=96 CPULoad=0.01
     AvailableFeatures=HyperThread
     ActiveFeatures=HyperThread
     Gres=gpu:tesla:4(S:0)
     NodeAddr=ibiscohpc-wn02 NodeHostName=ibiscohpc-wn02 Version=20.11.5
     OS=Linux 3.10.0-957.1.3.el7.x86_64 #1 SMP Mon Nov 26 12:36:06 CST 2018
     RealMemory=1546503 AllocMem=0 FreeMem=1528903 Sockets=2 Boards=1
     State=IDLE ThreadsPerCore=2 TmpDisk=0 Weight=1 Owner=N/A MCS_label=N/A
     Partitions=hpc
     BootTime=2022-02-01T16:24:43 SlurmdStartTime=2022-02-01T16:25:25
     CfgTRES=cpu=96,mem=1546503M,billing=96
     AllocTRES=
     CapWatts=n/a
     CurrentWatts=0 AveWatts=0
     ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
     Comment=(null)

job preparation and submission

srun - run a parallel job on the cluster managed by SLURM. If necessary, srun allocates the resources for the job execution.

Some useful srun parameters are:

-c, --cpus-per-task=<ncpus> : number of CPUs allocated to each task

-l, --label : prepend the task number to each line of standard output/error

-N, --nodes=<minnodes>[-maxnodes] : minimum (and optionally maximum) number of nodes allocated to the job

-n, --ntasks=<number> : number of tasks to run
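
For instance, some of these options can be combined (a sketch; node and task counts are only placeholders):

  $ srun -N 2 -n 8 -l <EXEFILE>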

Example, interactively access a node, from UI:

  $ salloc  srun --pty /bin/bash

Example, submit a batch job, from UI:

  $ echo -e '#!/bin/sh\nhostname' | sbatch

Example, submit an MPI interactive job with <N> tasks, from UI:

  $ srun -n <N> <EXEFILE>
  

Important command when using OpenMP

Add the following command to the script used to submit an OpenMP job:

  export OMP_NUM_THREADS=<nthreads>
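
A minimal sketch of a complete OpenMP submission script (queue name, thread count and program name are placeholders to be adapted):

#!/bin/bash
#SBATCH --partition=sequential      # single-task job; choose a queue compatible with its rules
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=<nthreads>  # CPUs available to the OpenMP threads
export OMP_NUM_THREADS=<nthreads>   # same value as --cpus-per-task
./<EXEFILE>                         # the OpenMP program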

Tips for using the Intel OneAPI suite (a suite of compilers, libraries, etc. provided by Intel)

To use Intel's suite of compilers and libraries, you need to run (interactively, or inside any script in which they are needed) the command

. /nfsexports/intel/oneapi/setvars.sh 
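
For example, after sourcing the script the oneAPI compilers should be available in the shell (a sketch; the source file names are hypothetical):

$ . /nfsexports/intel/oneapi/setvars.sh
$ icx -O2 -o hello hello.c        # C compiler of the oneAPI suite
$ ifx -O2 -o hello_f hello.f90    # Fortran compiler of the oneAPI suite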

Tips for using Red Hat Developer Toolset

For details about “Red Hat Developer Toolset” see https://access.redhat.com/documentation/en-us/red_hat_developer_toolset/7/html/user_guide/chap-red_hat_developer_toolset

Here we report some examples showing how one can enable the various development environments:

* Create a bash subshell in which the tools are available (in this case gcc/g++/gfortran/… v.10):

 $ scl enable devtoolset-10 bash 

* Make the tools operational in the current shell (in this case gcc/g++/gfortran/… v.10):

 $ source scl_source enable devtoolset-10 
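
For example, once the tools are enabled (a sketch; the source file name is hypothetical):

 $ source scl_source enable devtoolset-10
 $ gcc --version                   # should now report GCC 10
 $ gfortran -O2 -o prog prog.f90   # compile with the devtoolset compilers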

Tip for using Singularity

The following script is an example of how to use Singularity:

#!/bin/bash
singularity run library://godlovedc/funny/lolcow 
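
The script can then be submitted like any other job; for example (the script name lolcow.sh is hypothetical):

$ sbatch lolcow.sh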

Tips for python users with the new conda environment

Base

To use Python, it is necessary to start the conda environment using the following commands:

source /nfsexports/SOFTWARE/anaconda3.OK/setupconda.sh
<commands execution> [Example: python example.py]
conda deactivate 

Tensorflow

The tensorflow sub-environment is activated after starting the conda environment:

source /nfsexports/SOFTWARE/anaconda3.OK/setupconda.sh
conda activate tensorflowgpu
<commands execution> [Example: python example.py]
conda deactivate
conda deactivate

Bio-Informatics

To use the bioconda sub-environment, the following commands have to be executed.

source /nfsexports/SOFTWARE/anaconda3.OK/setupconda.sh
conda activate bioconda
<commands execution> [Example: python example.py]
conda deactivate
conda deactivate

Pytorch

To use the Pytorch sub-environment, the following commands have to be executed.

source /nfsexports/SOFTWARE/anaconda3.OK/setupconda.sh
conda activate pytorchenv
<commands execution> [Example: python example.py]
conda deactivate
conda deactivate

Packages list

To list the available packages in the current environment, run the command:

conda list

Other tips for python

Parallel computation in python

A submission script for a parallel Python job typically begins with SLURM directives requesting the resources, such as:

#!/bin/bash
#SBATCH --nodes=[nnodes]           #number of nodes
#SBATCH --ntasks-per-node=[ntasks per node] #number of cores per node
#SBATCH --gres=gpu:[ngpu]        #number of GPUs per node

Example of parallel jobs submission

Suppose a given Python code has to be executed for different values of a variable “rep”. It is possible to run the Python code in parallel at submission time by creating temporary files, one for each value rep=a1, a2, … The Python code example.py can contain a line:

 rep=REP 

The submission script sub.sh can be used to parallelize the process in the following way:

#!/bin/bash
#SBATCH --nodes=[nnodes]                      #number of nodes
#SBATCH --ntasks-per-node=[ntasks per node]   #number of cores per node
#SBATCH --gres=gpu:[ngpu]                     #number of GPUs per node
NPROC=[nprocesses]          #number of processing units to be used at the same time

program=example.py          #the python code containing the line rep=REP
tmpstring=tmp               #prefix of the generated temporary files

count=0                     #counter of the temporary files
for rep in {1..10}          #the value of rep runs from 1 to 10
do
    tmpprogram=${tmpstring}_${rep}.py             #temporary file name for this value of rep
    sed -e "s/REP/$rep/g" $program > $tmpprogram  #replace REP in the .py code with the current value of rep
    python $tmpprogram &    #run the temporary file in the background
    (( count++ ))           #increase the counter
    [[ $(( count % NPROC )) -eq 0 ]] && wait      #every NPROC launches, wait for the running programs to finish
done
wait                        #wait for the last batch of programs to finish
rm ${tmpstring}*            #optionally remove the temporary files after all of them have been executed

Tips for using gmsh (a mesh generator)

To use gmsh it is necessary to configure the execution environment (shell) in order to guarantee the availability of the necessary libraries by running the following command:

 $ scl enable devtoolset-10 bash 

Within the shell configured in this way, it is then possible to execute the gmsh command available in the directory

  /nfsexports/SOFTWARE/gmsh-4.10.1-source/install/bin
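
For example, to check that the program starts correctly:

  $ /nfsexports/SOFTWARE/gmsh-4.10.1-source/install/bin/gmsh --version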

When the ad-hoc configured shell is no longer needed, you can terminate with the command

 $ exit  

Tips for using FOAM Ver 9.0

To use this version, available in the directory

/nfsexports/SOFTWARE/OpenFOAM-9.0/

you need to configure the environment as follows:

 $ source /nfsexports/SOFTWARE/OpenFOAM-9.0/etc/bashrc 
 $ source /nfsexports/SOFTWARE/intel/oneapi/compiler/latest/env/vars.sh 
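
After sourcing these files, the OpenFOAM applications should be available in the shell; a quick check (assuming the standard solvers are on the PATH after sourcing):

 $ simpleFoam -help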

Tips for using Matlab for the execution of parallel jobs on the IBiSCo HPC cluster

Basic tips

Configuration and execution

Attached you will find an example of a *Profile File* that can be used to configure the multi-node parallel machine for the execution of Matlab parallel jobs on the IBiSco HPC cluster.

Example of configuration file for Parallel execution

The file must be decompressed before use

To make it accessible to a Matlab program, the user *must first import* that file by starting the *Cluster Profile Manager* on the Matlab desktop (on the *Home* tab, in the *Environment* area, select *Parallel* > *Create and Manage Clusters*).

Figure 1: Parallel Configuration Window

In the *Create and Manage Clusters* window, select the *Import* option.

Figure 2: Import Configuration Window

Once the profile has been imported, it can be referenced by a Matlab parallel program using the profile name 'SlurmIBISCOHPC', i.e.

mypool=parpool('SlurmIBISCOHPC', ___, ___, ___, ___) 
...
delete(mypool); 

To modify the 'SlurmIBISCOHPC' profile the user can use

  1. the *Create and Manage Clusters* window
  2. the Matlab Profile commands such as saveProfile (https://it.mathworks.com/help/parallel-computing/saveprofile.html)

Example of running a parallel matlab script

This is an example of using parfor to parallelize a for loop (demonstrated by MathWorks). The example calculates the spectral radius of a matrix, converting a for-loop into a parfor-loop. Create a file named test.m with the following code:

mypool=parpool('SlurmIBISCOHPC', 5) % 5 is the number of workers
n = 100;
A = 200;
a = zeros(n);
parfor i = 1:n
    a(i) = max(abs(eig(rand(A))));
end
delete(mypool); 
quit

To run this code, the following command executed on the UI can be used:

/nfsexports/SOFTWARE/MATLAB/R2020b/bin/matlab -nodisplay -nosplash -nodesktop -r test
