wiki:user_guide [2022/05/28 18:18] (current)
cnr-guest [Job preparation and submission]
== The Storage Node Architecture ==
  
The cluster has 4 Dell R740 storage nodes, each offering 16 16 TB SAS HDDs and 8 1.9 TB SATA SSDs. Each node is equipped with two InfiniBand EDR ports, one of which is connected to the Mellanox InfiniBand switch dedicated to storage, which guarantees a 100 Gb/s connection to all the compute nodes. \\ While the aforementioned nodes are dedicated to the home and scratch areas of users, a separate 3 PB storage system will also be available. It will be accessible via Ethernet and can be used as a repository where users can move data to and from the storage systems connected to the InfiniBand when they need large amounts of data over the time span of their job. As for the Ethernet network, each node is equipped with 2 ports at 25 Gb/s for connection to the Data Center core switch.
  
== The Compute Node Architecture ==
In the implemented Lustre architecture (see Figure 3), both the Management Service and the Metadata Service (MDS) are configured on a storage node, with the Metadata Targets (MDT) stored on a 4-disk RAID-10 SSD array. The other 3 storage nodes host the OSTs for the two file systems exposed by Lustre, one for the home directories of the users and one for the scratch area of the jobs. In particular, the home filesystem is characterized by large needs for disk space and fault tolerance, so it is composed of 6 OSTs stored on arrays of 3-disk SAS RAID-5 HDDs of 30 TB each. On the other hand, the scratch area is characterized by fast disk access times without the need for redundancy. Therefore, it is composed of 6 OSTs stored on 1.8 TB SSD disks.
  
{{:wiki:lustrearch.png?600|The implementation of the Lustre architecture}}
  
// Figure 3: The implementation of the Lustre architecture //
----
  
==== Obtaining login credentials ====
  
Currently, a potential user must ask the Ibisco reference colleague of her/his institution for an account, providing some identification data. The reference colleague sends the data to the Ibisco administrators, who send back the access data with a temporary password.

ATTENTION: the TEMPORARY password must be changed at the first access.

To change the password from the command line, use the ''yppasswd'' command. ''yppasswd'' creates or changes a password valid on every resource of the cluster, not only on the front-end server (a network password in a Network Information Service - NIS).
  
==== Access procedure ====

To access the system (in particular its front-end, or UI - User Interface), a user needs to connect via the SSH protocol to the host ibiscohpc-ui.scope.unina.it. Access is currently only in non-graphical terminal emulation mode. However, the account is valid for all cluster resources.
  
Access example from unix-like systems:

''$ ssh ibiscohpc-ui.scope.unina.it -l <USERNAME>''

To access Ibisco from Windows systems, a simple option is ''Putty'', freely available at ''https://www.putty.org/''. From Windows 10 onwards it is also possible to use OpenSSH in a command window (CMD.exe or Powershell.exe). It is pre-installed (if it is not active, it simply has to be enabled in the Optional Features).

==== Job preparation and submission ====

The resource manager SLURM is installed on the system to manage the cluster resources.
Complete documentation is available at ''https://slurm.schedmd.com/''.

SLURM is an open source software system for cluster management; it is highly scalable and integrates fault-tolerance and job scheduling mechanisms.

==== SLURM basic concepts ====

The main components of SLURM are:

  * //**nodes**// - computing nodes;
  * //**partitions**// - logical groups of nodes;
  * //**jobs**// - allocations of resources assigned to a user for a given time interval;
  * //**job steps**// - sets of (typically parallel) activities inside a job.

Partitions can be thought of as //**job queues**//, each of which defines constraints on job size, time limits, resource usage permissions by users, etc.

SLURM allows centralized management through a daemon, //**slurmctld**//, which monitors resources and jobs. Each node is managed by a daemon, //**slurmd**//, which takes care of handling requests for activity.

Some tools available to the user are:

  * [[https://slurm.schedmd.com/srun.html|srun]] - start a job;
  * [[https://slurm.schedmd.com/sbatch.html|sbatch]] - submit batch scripts;
  * [[https://slurm.schedmd.com/salloc.html|salloc]] - request the allocation of resources (nodes), with any constraints (e.g., number of processors per node);
  * [[https://slurm.schedmd.com/scancel.html|scancel]] - terminate queued or running jobs;
  * [[https://slurm.schedmd.com/sinfo.html|sinfo]] - get information about the system status;
  * [[https://slurm.schedmd.com/squeue.html|squeue]] - get the status of jobs;
  * [[https://slurm.schedmd.com/sacct.html|sacct]] - get accounting information about jobs;
  * ...

A complete list of available commands is in the man pages (also available online at ''https://slurm.schedmd.com/man_index.html''): ''man <cmd>''

==== Examples of use of some basic commands ====

== system and resources information ==

''**sinfo**'' - check the status of the resources (existing partitions and related nodes, ...) and the general system status:

Example: ''$ sinfo''

Output:
    PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
    hpc*      up     infinite   32     idle  ibiscohpc-wn[01-32]

The output shows information about the partitions; in this example:

   * there is a partition named "hpc" (the * marks the default partition);
   * the partition is available (status: up);
   * the partition can be used without time limits;
   * the partition consists of 32 nodes;
   * its status is idle;
   * the available nodes are named ''ibiscohpc-wn01'', ''ibiscohpc-wn02'', ..., ''ibiscohpc-wn32''.

''**squeue**'' - check the status of the job queue:

Example: ''$ squeue''

Output:
    JOBID PARTITION     NAME     USER         ST      TIME  NODES NODELIST(REASON)
    4815  hpc           sleep    cnr-isas     R       0:04

The output shows, for each job:
   * the job identifier;
   * the name of the partition on which the job was launched;
   * the job name;
   * the name of the user who launched the job;
   * the job status (R = running);
   * the job execution time.

''**scontrol**'' - detailed information about jobs and resources

Example (detailed information about the ''ibiscohpc-wn02'' node):

''$ scontrol show node ibiscohpc-wn02''

Output:
    NodeName=ibiscohpc-wn02 Arch=x86_64 CoresPerSocket=24
       CPUAlloc=0 CPUTot=96 CPULoad=0.01
       AvailableFeatures=HyperThread
       ActiveFeatures=HyperThread
       Gres=gpu:tesla:4(S:0)
       NodeAddr=ibiscohpc-wn02 NodeHostName=ibiscohpc-wn02 Version=20.11.5
       OS=Linux 3.10.0-957.1.3.el7.x86_64 #1 SMP Mon Nov 26 12:36:06 CST 2018
       RealMemory=1546503 AllocMem=0 FreeMem=1528903 Sockets=2 Boards=1
       State=IDLE ThreadsPerCore=2 TmpDisk=0 Weight=1 Owner=N/A MCS_label=N/A
       Partitions=hpc
       BootTime=2022-02-01T16:24:43 SlurmdStartTime=2022-02-01T16:25:25
       CfgTRES=cpu=96,mem=1546503M,billing=96
       AllocTRES=
       CapWatts=n/a
       CurrentWatts=0 AveWatts=0
       ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
       Comment=(null)

== job preparation and submission ==

''**srun**'' - run a parallel job on the cluster managed by SLURM.
If necessary, srun first allocates the resources for the job execution.

Some useful srun parameters are:

''-c'', ''--cpus-per-task=<ncpus>''

  * number of CPUs allocated per process. By default, one CPU is used per process.

''-l'', ''--label''

  * prepend to each line of stdout the number of the task to which the output refers.

''-N'', ''--nodes=<minnodes>[-maxnodes]''

   * minimum number (''minnodes'') of nodes to allocate for the job and, optionally, the maximum one.
   * If the parameter is not specified, the command allocates the nodes needed to satisfy the requirements specified by the ''-n'' and ''-c'' parameters.
   * If the values are outside the allowed range for the associated partition, the job is placed in a ''PENDING'' state. This allows for possible execution at a later time, when the partition limit is possibly changed.

''-n'', ''--ntasks=<number>''

  * number of tasks to run. ''srun'' allocates the necessary resources based on the number of required tasks (by default, one node is requested for each task but, using the ''--cpus-per-task'' option, this behavior can be changed).

Example, interactively access a node, from the UI:
    $ salloc srun --pty /bin/bash

Example, submit a batch job, from the UI:
    $ echo -e '#!/bin/sh\nhostname' | sbatch

Example, submit an MPI interactive job with <N> tasks, from the UI:
    $ srun -n <N> <EXEFILE>
==== Available file systems ====
  
  
<code>. /nfsexports/intel/oneapi/setvars.sh </code>

== NVIDIA HPC SDK (compiler suites, libraries, etc. provided by NVIDIA) ==

  * //**Ver 20.10**// - available in the directory ''/nfsexports/SOFTWARE/nvidia/hpc_sdk/''

== OpenMPI ==

  * //**Ver 4.1.0rc5**// - configured to be CUDA-aware, available in the directory ''/usr/mpi/gcc/openmpi-4.1.0rc5/''

== Julia ==

  * //**Ver 1.6.1**// - interpreter available in the directory ''/nfsexports/SOFTWARE/julia-1.6.1''

== FFTW libraries ==

  * //**Ver 3.3.10**// - compiled with the Intel compilers, available in the directory ''/nfsexports/SOFTWARE/fftw-3.3.10''

== Anaconda 3 environment ==

  * available in the directory ''/nfsexports/anaconda3''

=== complete software packages for specific applications ===

== Matlab ==

  * //**Ver R2020b**// - available in the directory ''/nfsexports/SOFTWARE/MATLAB/R2020b/bin''

== Quantum ESPRESSO ==

  * //**Ver 7.0**// - available in the directory ''/nfsexports/SOFTWARE/qe-7.0''

== OpenFOAM ==

  * //**Ver 7.0**// - available in the directory ''/nfsexports/SOFTWARE/OpenFOAM-7.0/''

== Rheotool ==

  * //**Ver 5.0**// - available in the same directory as OpenFOAM
==== For python users ====

=== Base ===

To use python, it is necessary to start the conda environment using the following commands:

<code>source /nfsexports/SOFTWARE/anaconda3.OK/setupconda.sh
<commands execution> [Example: python example.py]
conda deactivate</code>

=== Tensorflow ===

The tensorflow sub-environment is activated after starting the conda environment:

<code>source /nfsexports/SOFTWARE/anaconda3.OK/setupconda.sh
conda activate tensorflowgpu
<commands execution> [Example: python example.py]
conda deactivate
conda deactivate</code>

=== Bio-Informatics ===

To use the bioconda sub-environment, the following commands have to be executed:

<code>source /nfsexports/SOFTWARE/anaconda3.OK/setupconda.sh
conda activate bioconda
<commands execution> [Example: python example.py]
conda deactivate
conda deactivate</code>

=== Packages list ===

To list the packages available in the current environment, run the command:

<code>conda list</code>

=== Parallel computation in python ===

   * Effective usage of the HPC cluster requires parallelizing the processes. Codes can be parallelized by distributing the tasks among the available nodes and their respective CPUs, and among the GPUs as well. This information can be specified in a simple submission bash script (e.g. a sub.sh file) as follows:

<code>#!/bin/bash
#SBATCH --nodes=[nnodes]           #number of nodes
#SBATCH --ntasks-per-node=[ntasks per node] #number of cores per node
#SBATCH --gres=gpu:[ngpu]        #number of GPUs per node</code>

=== Example of parallel jobs submission ===

Suppose a given python code has to be executed for different values of a variable "rep". It is possible to execute the python codes in parallel during the job submission process by creating temporary files, each with rep=a1, a2, ... The python code example.py can have a line:

<code> rep=REP </code>

The submission script sub.sh can be used to parallelize the process in the following way:

<code>#!/bin/bash
#SBATCH --nodes=[nnodes]            #number of nodes
#SBATCH --ntasks-per-node=[ntasks per node]  #number of cores per node
#SBATCH --gres=gpu:[ngpu]         #number of GPUs per node
NPROC=[nprocesses]          #number of processing units to be used

program=example.py          #python code to be executed
tmpstring=tmp               #prefix of the generated temporary files

count=0                     #counter of the launched processes
for rep in {1..10};         #the value of rep runs from 1 to 10
do
    tmpprogram=${tmpstring}_${rep}.py            #temporary file name for this value of rep
    sed -e "s/REP/$rep/g" $program > $tmpprogram #replace the variable REP in the .py file with the current value of rep
    python $tmpprogram &    #run the temporary file in the background
    (( count++ ))           #increase the process count
    [[ $(( count % NPROC )) -eq 0 ]] && wait  #wait for the running programs to finish
done
wait                        #wait for any remaining background processes
rm ${tmpstring}*            #optionally remove the temporary files after the execution</code>
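As an alternative to generating temporary files, the same sweep over "rep" can be expressed with a SLURM job array: the job is submitted once with ''sbatch --array=1-10 sub.sh'', and each array task reads its own index from the ''SLURM_ARRAY_TASK_ID'' environment variable set by SLURM. A minimal python sketch (the fallback value used when running outside SLURM is illustrative):

```python
# Select the value of "rep" from the SLURM job-array index.
# SLURM sets SLURM_ARRAY_TASK_ID inside each array task;
# the fallback "1" only matters when running outside SLURM.
import os

rep = int(os.environ.get("SLURM_ARRAY_TASK_ID", "1"))
print("running with rep =", rep)
```

With this approach SLURM itself schedules the tasks, so no sed templating or manual wait bookkeeping is needed in the submission script.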
   * Parallel job submission can also be done with a job array. More information about job arrays can be found at [[https://slurm.schedmd.com/job_array.html]].

   * Parallelization can also be implemented within the python code itself. For example, the evaluation of a function for different variable values can be done in parallel. Python offers many packages to parallelize a given process; the basic one among them is [[https://docs.python.org/3/library/multiprocessing.html|multiprocessing]].

   * Machine learning modules such as Keras (included in tensorflow) and PyTorch detect the GPUs automatically.
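As a sketch of in-code parallelization with the standard-library multiprocessing package (the function ''simulate'' and the number of worker processes are illustrative):

```python
# Evaluate a function for several values of "rep" in parallel
# using a pool of worker processes from the standard library.
from multiprocessing import Pool

def simulate(rep):
    # placeholder for the real computation depending on rep
    return rep * rep

if __name__ == "__main__":
    with Pool(processes=4) as pool:                 # 4 worker processes
        results = pool.map(simulate, range(1, 11))  # rep = 1 .. 10
    print(results)  # prints [1, 4, 9, 16, 25, 36, 49, 64, 81, 100]
```

Inside a batch job, the number of workers can be matched to the CPUs assigned by SLURM (e.g. the value given to ''--cpus-per-task'').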
  
  
wiki/user_guide.1648483840.txt.gz · Last modified: 2022/03/28 16:10 by cnr-guest