== The Storage Node Architecture ==
The cluster has 4 Dell R740 storage nodes, each offering 16 16 TB SAS HDDs and 8 1.9 TB SATA SSDs. Each node is equipped with two InfiniBand EDR ports, one of which is connected to the Mellanox InfiniBand switch dedicated to storage, which guarantees a 100 Gb/s connection to all the compute nodes. \\ While the aforementioned nodes are dedicated to the home and scratch areas of the users, a separate 3 PB storage system is also available.
== The Compute Node Architecture ==
In the implemented Lustre architecture (see Figure 3), both the Management Service and the Metadata Service (MDS) are configured on a storage node, with the Metadata Targets (MDT) stored on a 4-disk RAID-10 SSD array. The other 3 storage nodes host the OSTs for the two file systems exposed by Lustre, one for the home directories of the users and one for the scratch area of the jobs. In particular, the home filesystem is characterized by large needs for disk space and fault tolerance, so it is composed of 6 OSTs, each stored on a 30 TB array of 3-disk SAS RAID-5 HDDs. On the other hand, the scratch area is characterized by fast disk access times without the need for redundancy; therefore, it is composed of 6 OSTs stored on 1.8 TB SSD disks.
{{:wiki:lustrearch.png}}
// Figure 3: The implementation of the Lustre architecture //
----
==== Obtaining login credentials ====
Currently, a potential user must ask for an account in order to use the cluster.

ATTENTION: the TEMPORARY password must be changed at the first access.

To change the password, see the sketch below.
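A minimal sketch, assuming the standard Linux ''passwd'' command is used on the UI after logging in (the actual procedure prescribed by the administrators may differ):

  $ passwd     # prompts for the current (temporary) password, then for the new one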
==== Access procedure ====
To access the system (in particular its front end, or UI - User Interface), a user needs to connect via the SSH protocol to the host ibiscohpc-ui.scope.unina.it. Access is currently only in non-graphical terminal emulation mode. However, the account is valid for all cluster resources.
Access example from unix-like systems:
  $ ssh <username>@ibiscohpc-ui.scope.unina.it
To access Ibisco from Windows systems, a simple option is PuTTY, freely available online.
==== Job preparation and submission ====
The resource manager SLURM is installed on the system to manage the cluster resources and to schedule the users' jobs.

Complete documentation is available at ''https://slurm.schedmd.com/''.

SLURM is an open source software system for cluster management; it is highly scalable and integrates fault-tolerance and job-scheduling mechanisms.
==== SLURM basic concepts ====
The main components of SLURM are:

  * //**nodes**// - the compute resources managed by SLURM;
  * //**partitions**// - logical sets of nodes;
  * //**jobs**// - allocations of resources assigned to a user for a specified amount of time;
  * //**job steps**// - sets of (typically parallel) activities inside a job.

Partitions can be thought of as //**job queues**//, each of which defines constraints on job size, time limits, resource usage permissions by users, etc.
SLURM allows a centralized management through a daemon, //**slurmctld**//, which monitors the resources and the jobs; each compute node runs its own //**slurmd**// daemon, which accepts the work, executes it, and reports its status.

Some tools available to the user are:
  * [[https://slurm.schedmd.com/sinfo.html|sinfo]] - reports the state of partitions and nodes;
  * [[https://slurm.schedmd.com/squeue.html|squeue]] - reports the state of jobs;
  * [[https://slurm.schedmd.com/scontrol.html|scontrol]] - shows detailed information about nodes, partitions and jobs;
  * [[https://slurm.schedmd.com/srun.html|srun]] - submits a job for execution or starts job steps;
  * [[https://slurm.schedmd.com/salloc.html|salloc]] - allocates resources for an interactive job;
  * [[https://slurm.schedmd.com/sbatch.html|sbatch]] - submits a batch script for later execution;
  * [[https://slurm.schedmd.com/scancel.html|scancel]] - cancels a pending or running job;
  * …

A complete list of the available commands is in the man pages (also available online at ''https://slurm.schedmd.com/man_index.html'').
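For example, to read the manual page of the ''sbatch'' command directly on the UI:

  $ man sbatch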
==== Examples of use of some basic commands ====

== system and resources information ==

''sinfo'' reports information about the partitions and the nodes managed by SLURM.

Example: ''sinfo''

Output:
  PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
  hpc*         up   infinite     32   idle ibiscohpc-wn[01-32]
The output shows information about the partitions; in particular:

  * there is a partition named "hpc" (the asterisk marks the default partition);
  * the partition is available (status: up);
  * the partition can be used without time limits;
  * the partition consists of 32 nodes;
  * its status is idle;
  * the available nodes are named ''ibiscohpc-wn[01-32]''.
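As a further sketch (standard ''sinfo'' options, not specific to this cluster), a node-oriented, extended listing can be obtained with:

  $ sinfo -N -l        # one line per node, with extended information
  $ sinfo -p hpc       # restrict the report to the "hpc" partition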
''squeue'' reports information about the jobs currently in the queue.

Example: ''squeue''

Output:
  JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
   4815       hpc      ...      ...  R        ...    ... ...
The output shows, for each job:
  * the job identifier;
  * the name of the partition on which the job was launched;
  * the job name;
  * the name of the user who launched the job;
  * the job status (R = running);
  * the job execution time.
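Some useful variants (standard ''squeue'' options):

  $ squeue -u <username>     # show only the jobs of a given user
  $ squeue -j <jobid>        # show a specific job
  $ squeue -t PENDING        # show only pending jobs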
''scontrol'' shows detailed information about nodes, partitions and jobs.

Example (detailed information about the node ''ibiscohpc-wn02''):

''scontrol show node ibiscohpc-wn02''
Output:
  NodeName=ibiscohpc-wn02 Arch=x86_64 CoresPerSocket=24
  ...
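''scontrol'' can also report the configuration of a partition; for example, for the "hpc" partition shown above:

  $ scontrol show partition hpc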
== job preparation and submission ==

''srun'' runs a parallel job (or a job step) on the nodes managed by SLURM.
If necessary, srun allocates the resources for the job execution.

Some useful srun parameters are:
''-c, --cpus-per-task=<ncpus>''

  * number of CPUs allocated per process. By default, one CPU is used per process.

''-l, --label''

  * prepends to each line written on stdout the number of the task to which the output refers.

''-N, --nodes=<minnodes>[-maxnodes]''

  * minimum (''minnodes'') and maximum (''maxnodes'') number of nodes allocated to the job.
  * If the parameter is not specified, the command allocates the nodes needed to satisfy the requirements specified by the parameters ''-n'' and ''-c''.
  * If the values are outside the allowed range for the associated partition, the job is placed in a ''PENDING'' state.

''-n, --ntasks=<number>''

  * number of tasks to run; ''srun'' allocates the resources needed to launch the requested number of tasks.
Example, interactively access a node, from the UI:
  $ salloc
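A slightly fuller sketch of an interactive session (the requested resources are illustrative):

  $ salloc -N 1 -n 4       # allocate 4 tasks on one node
  $ srun hostname          # run a command on the allocated node
  $ exit                   # release the allocation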
Example, submit a batch job, from the UI (the one-line script here is illustrative):
  $ echo -e '#!/bin/bash\nsrun hostname' | sbatch
Example, submit an interactive MPI job with <N> tasks, from the UI:
  $ srun -n <N> <executable>
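Such submissions are more commonly prepared as a batch script; a minimal sketch (file name, resource values and program name are illustrative), to be submitted with ''sbatch job.sh'':

<code>
#!/bin/bash
#SBATCH --job-name=test_job          #job name
#SBATCH --nodes=1                    #number of nodes
#SBATCH --ntasks-per-node=4          #number of tasks per node
#SBATCH --output=job_%j.out          #stdout file (%j is the job id)

srun <executable>                    #run the (MPI) program on the allocated resources
</code>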
==== Available file systems ====
== NVIDIA HPC SDK (compiler suites, libraries, etc. provided by NVIDIA) ==

  * //**Ver 20.10**// - available in the directory ''…''

== OpenMPI ==

  * //**Ver 4.1.0rc5**// - available in the directory ''…''
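A minimal sketch of compiling and running an MPI program with OpenMPI, assuming the ''mpicc'' wrapper from the installation above has been added to the ''PATH'' (the source file name is illustrative):

  $ mpicc hello_mpi.c -o hello_mpi       # compile with the OpenMPI wrapper
  $ srun -n 4 ./hello_mpi                # run 4 MPI tasks through SLURM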
== Julia ==

  * //**Ver 1.6.1**// - interpreter available in the directory ''…''

== FFTW libraries ==

  * //**Ver 3.3.10**// - compiled with the Intel compilers, available in the directory ''…''

== Anaconda 3 environment ==

  * available in the directory ''…''
=== Complete software packages for specific applications ===

== Matlab ==

  * //**Ver R2020b**// - available in the directory ''…''

== Quantum ESPRESSO ==

  * //**Ver 7.0**// - available in the directory ''…''
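A minimal sketch of running the ''pw.x'' executable of Quantum ESPRESSO through SLURM, assuming it is reachable from the installation directory above (input and output file names are illustrative):

  $ srun -n <N> pw.x -in input_file.in > output_file.out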
== OpenFOAM ==

  * //**Ver 7.0**// - available in the directory ''…''

== Rheotool ==

  * //**Ver 5.0**// - available in the same directory as OpenFOAM
==== For Python users ====
=== Base ===

To use Python, it is necessary to start the conda environment of the Anaconda 3 installation listed above:

<code>
...
</code>

To leave the environment:

<code>
conda deactivate
</code>
=== Tensorflow ===

The tensorflow sub-environment can be activated, after starting the conda environment, with:

<code>
conda activate tensorflowgpu
</code>

To leave it, deactivate both the sub-environment and the base environment:

<code>
conda deactivate
conda deactivate
</code>
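A sketch of a batch job that requests a GPU and runs a script inside the tensorflowgpu sub-environment (the script name and resource values are illustrative; the conda environment must be started first, as shown above):

<code>
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --gres=gpu:1                 #request one GPU

conda activate tensorflowgpu         #enter the sub-environment (after starting conda as shown above)
python my_training_script.py         #illustrative script name
conda deactivate
</code>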
=== Bio-Informatics ===

To use the bioconda sub-environment, activate it after starting the conda environment:

<code>
conda activate bioconda
</code>

To leave it, deactivate both the sub-environment and the base environment:

<code>
conda deactivate
conda deactivate
</code>
=== Packages list ===
To list the packages available in the current environment, the standard conda command can be used:

<code>
conda list
</code>
=== Parallel computation in python ===
  * The HPC system is used effectively by parallelizing the processes. A code can be parallelized by distributing its tasks among the available nodes and their respective CPUs and GPUs. This information can be specified in a simple submission bash script (for example a ''sub.sh'' file) as follows:

<code>
#!/bin/bash
#SBATCH --nodes=[nnodes]                    #number of nodes
#SBATCH --ntasks-per-node=[ntasks per node] #number of cores per node
#SBATCH --gres=gpu:[ngpus]                  #number of GPUs per node
</code>
=== Example of parallel jobs submission ===
Suppose a given python code, stored in a file whose name is assigned to the variable ''program'', has to be executed for different values of a parameter that appears in the source as a placeholder (the placeholder name used below, ''XXX'', is illustrative).

The submission script ''sub.sh'' can be used to parallelize the process in the following way:

<code>
#!/bin/bash
#SBATCH --nodes=[nnodes]                    #number of nodes
#SBATCH --ntasks-per-node=[ntasks per node] #number of cores per node
#SBATCH --gres=gpu:[ngpus]                  #number of GPUs per node

NPROC=[nprocesses]            #number of processes to run in parallel
program=[program name].py     #python code to be executed

tmpstring=tmp

count=0
for rep in {1..10};
do
    tmpprogram=${tmpstring}_${rep}.py
    sed -e "s/XXX/${rep}/g" $program > $tmpprogram   #create a temporary copy with the current parameter value
    python $tmpprogram &      #run the temporary file in background
    (( count++ ))             #increase the counter of launched processes
    [[ $(( count % NPROC )) -eq 0 ]] && wait         #wait for the parallel programs to finish
done
rm ${tmpstring}*              #remove the temporary files
</code>
  * Parallel job submissions can also be done by job array submission (a minimal sketch is shown below). More information about job arrays can be found in the [[https://slurm.schedmd.com/job_array.html|SLURM job array documentation]].
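A minimal sketch of the same kind of parameter sweep done with a job array (values and file names are illustrative):

<code>
#!/bin/bash
#SBATCH --array=1-10                 #run 10 array tasks, with indices 1..10
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1

python my_program.py $SLURM_ARRAY_TASK_ID    #pass the array index to the program
</code>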
  * Parallelization can also be implemented within the python code itself. For example, the evaluation of a function for different values of a variable can be done in parallel. Python offers many packages to parallelize a given process; the basic one among them is [[https://docs.python.org/3/library/multiprocessing.html|multiprocessing]].
  * Machine-learning frameworks such as Keras (included in TensorFlow) and PyTorch detect the available GPUs automatically (see the quick check below).
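A quick way to verify that the GPUs are visible from the Python environments (standard TensorFlow and PyTorch calls, to be run inside the corresponding conda sub-environments):

  $ python -c 'import tensorflow as tf; print(tf.config.list_physical_devices("GPU"))'
  $ python -c 'import torch; print(torch.cuda.is_available())'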