Differences

This shows you the differences between two versions of the page.

--- wiki:user_guide [2022/04/07 15:01]
cnr-guest [User guide]
+++ wiki:user_guide [2022/05/28 18:18] (current)
cnr-guest [Job preparation ans submission]
@@ Line 101: / Line 101: @@
 Complete documentation is available at ''https://slurm.schedmd.com/''.
-SLUR is an open source software sytstem for cluster management; it is highly scalable and  integrates fault-tolerance and job scheduling mechanisms.
+SLURM is an open source software system for cluster management; it is highly scalable and integrates fault-tolerance and job scheduling mechanisms.
 ==== SLURM basic concepts ====
@@ Line 127: / Line 127: @@
   * …
-A complete list of available commands is in man (available also onlinbe at ''https://slurm.schedmd.com/man_index.html''): ''man <cmd>''
+A complete list of available commands is in the man pages (also available online at ''https://slurm.schedmd.com/man_index.html''); use ''man <cmd>'' for a specific command.
 ==== Examples of use of some basic commands ====
@@ Line 251: / Line 251: @@
 <code>. /nfsexports/intel/oneapi/setvars.sh </code>
+== NVIDIA HPC SDK (compiler suites, libraries, etc provided by NVIDIA) ==
+  * //**Ver 20.10**// - is available in the directory '' /nfsexports/SOFTWARE/nvidia/hpc_sdk/ ''
+== OpenMPI ==
+  * //**Ver 4.1.0rc5**// - configured to be CUDA-aware, is available in the directory '' /usr/mpi/gcc/openmpi-4.1.0rc5/ ''
+== Julia ==
+  * //**Ver 1.6.1**// - interpreter available in the directory '' /nfsexports/SOFTWARE/julia-1.6.1 ''
+== FFTW libraries ==
+  * //**Ver 3.3.10**// - compiled with Intel compilers, available in the directory '' /nfsexports/SOFTWARE/fftw-3.3.10 ''
+== Anaconda 3 environment ==
+  * available in the directory '' /nfsexports/anaconda3 ''
+=== Complete software packages for specific applications ===
+== Matlab ==
+  * //**Ver R2020b**// - available in the directory '' /nfsexports/SOFTWARE/MATLAB/R2020b/bin ''
+== Quantum ESPRESSO ==
+  * //**Ver 7.0**// - available in the directory '' /nfsexports/SOFTWARE/qe-7.0 ''
+== OpenFOAM ==
+  * //**Ver 7.0**// - available in the directory '' /nfsexports/SOFTWARE/OpenFOAM-7.0/ ''
+== Rheotool ==
+  * //**Ver 5.0**// - available in the same directory as OpenFOAM
+==== For Python users ====
+=== Base ===
+To use Python, first start the conda environment with the following commands:
+<code>source /nfsexports/SOFTWARE/anaconda3.OK/setupconda.sh
+<commands execution> [Example: python example.py]
+conda deactivate </code>
+=== TensorFlow ===
+The TensorFlow sub-environment is activated after starting the conda environment:
+<code>source /nfsexports/SOFTWARE/anaconda3.OK/setupconda.sh
+conda activate tensorflowgpu
+<commands execution> [Example: python example.py]
+conda deactivate
+conda deactivate</code>
+=== Bioinformatics ===
+To use the bioconda sub-environment, run the following commands:
+<code>source /nfsexports/SOFTWARE/anaconda3.OK/setupconda.sh
+conda activate bioconda
+<commands execution> [Example: python example.py]
+conda deactivate
+conda deactivate</code>
+=== Packages list ===
+To list the packages available in the currently active environment, run the command:
+<code>conda list</code>
+=== Parallel computation in python ===
+   * The HPC cluster is used most effectively by parallelizing the workload. Tasks can be distributed among the available nodes and their CPUs and GPUs. The requested resources are specified in a submission bash script (e.g. a sub.sh file) as follows:
+<code>#!/bin/bash
+#SBATCH --nodes=[nnodes]           #number of nodes
+#SBATCH --ntasks-per-node=[ntasks per node] #number of tasks per node
+#SBATCH --gres=gpu:[ngpu]        #number of GPUs per node</code>
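Inside a job started with such a script, Slurm exports the allocation as environment variables, which a Python program can read to decide how many workers to start. The following is a minimal sketch, assuming the standard Slurm variables ''SLURM_JOB_NUM_NODES'' and ''SLURM_NTASKS_PER_NODE'' are set (the latter only when ''--ntasks-per-node'' is given). The helper name ''allocation_from_env'' and the fallback value of 1 are hypothetical choices made for illustration.

```python
import os


def allocation_from_env(env=None):
    """Return (nodes, tasks_per_node) for the current Slurm job.

    Any variable that is missing (e.g. when the script runs outside
    Slurm) falls back to 1. This fallback is an assumption made for
    the example, not a Slurm convention.
    """
    env = os.environ if env is None else env
    nodes = int(env.get("SLURM_JOB_NUM_NODES", "1"))
    tasks_per_node = int(env.get("SLURM_NTASKS_PER_NODE", "1"))
    return nodes, tasks_per_node


if __name__ == "__main__":
    nodes, tasks_per_node = allocation_from_env()
    print(f"nodes={nodes} tasks_per_node={tasks_per_node}")
```

Passing a plain dictionary instead of the real environment makes the helper easy to try on a login node before submitting a job.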
+=== Example of parallel jobs submission ===
+Suppose a Python script has to be executed for different values of a variable "rep". The runs can be executed in parallel within a single job by creating one temporary copy of the script for each value rep=a1, a2,... The script example.py contains a placeholder line:
+<code> rep=REP </code>
+The submission script sub.sh can then parallelize the runs in the following way:
+<code>#!/bin/bash
+#SBATCH --nodes=[nnodes]            #number of nodes
+#SBATCH --ntasks-per-node=[ntasks per node]  #number of tasks per node
+#SBATCH --gres=gpu:[ngpu]         #number of GPUs per node
+NPROC=[nprocesses]                     #number of processes to run at the same time
+program=example.py          #Python script containing the placeholder REP
+tmpstring=tmp               #prefix of the temporary files
+count=0                     #counter of the started runs
+for rep in {1..10};         #rep runs from 1 to 10
+do
+    tmpprogram=${tmpstring}_${rep}.py         #temporary file name for this value of rep
+    sed -e "s/REP/$rep/g" "$program" > "$tmpprogram"  #copy the script, replacing REP with the value of rep
+    python "$tmpprogram" &    #run the temporary file in the background
+    (( count++ ))           #increase the counter
+    [[ $(( count % NPROC )) -eq 0 ]] && wait  #after every NPROC runs, wait for them to finish
+done
+wait                        #wait for the remaining runs to finish
+rm ${tmpstring}_*.py        #optionally remove the temporary files once all runs have finished</code>
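As an alternative to generating temporary copies with sed, the value of rep can be passed to the script as a command-line argument. The sketch below shows that variant; it is not the method used in the script above. The option name ''--rep'', its default of 1 and the functions ''run'' and ''parse_rep'' are hypothetical names chosen for illustration.

```python
import argparse


def run(rep):
    """Placeholder for the real computation (hypothetical workload)."""
    return rep * rep


def parse_rep(argv=None):
    """Read the value of rep from the command line (default: 1)."""
    parser = argparse.ArgumentParser(description="Run one repetition.")
    parser.add_argument("--rep", type=int, default=1, help="value of rep")
    # Unknown arguments are ignored so the script tolerates extra options.
    args, _unknown = parser.parse_known_args(argv)
    return args.rep


if __name__ == "__main__":
    rep = parse_rep()
    print(f"rep={rep} result={run(rep)}")
```

With a script written this way, the loop body in sub.sh could call ''python example.py --rep $rep &'' directly, and no temporary files would have to be created or removed.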
+   * Parallel jobs can also be submitted as a job array. More information about job arrays can be found at [[https://slurm.schedmd.com/job_array.html]].
+   * Parallelization can also be implemented within the Python code itself. For example, a function can be evaluated for different variable values in parallel. Python offers many packages for this; the basic one is [[https://docs.python.org/3/library/multiprocessing.html|multiprocessing]].
+   * Keras (part of TensorFlow) and PyTorch, which are mainly used for machine learning, detect the available GPUs automatically.
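The in-code parallelization mentioned in the list above can be sketched with the standard ''multiprocessing'' module. This is a minimal illustration, assuming the work is a function of a single value; the function ''evaluate'', its workload and the default of two worker processes are hypothetical and should be replaced by the real computation and by the number of cores requested from Slurm.

```python
import multiprocessing as mp


def evaluate(rep):
    """Placeholder for the function to evaluate (hypothetical workload)."""
    return rep ** 2


def evaluate_all(values, processes=2):
    """Evaluate the function for every value using a pool of worker processes."""
    with mp.Pool(processes=processes) as pool:
        # pool.map returns the results in the same order as the inputs.
        return pool.map(evaluate, values)


if __name__ == "__main__":
    print(evaluate_all(range(1, 11)))
```

The ''if __name__ == "__main__":'' guard matters here: it keeps worker processes from re-running the pool creation when the module is imported by them.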
