Complete documentation is available at ''https://slurm.schedmd.com/''.
  
SLURM is an open-source software system for cluster management; it is highly scalable and integrates fault-tolerance and job-scheduling mechanisms.
  
==== SLURM basic concepts ====
  
=== Parallel computation in python ===
Effective usage of the HPC system requires parallelizing the processes. Code can be parallelized by distributing the tasks among the available nodes and their respective CPUs and GPUs. This information is specified in a simple bash submission script, e.g. a sub.sh file, as follows:
  
<code>#!/bin/bash
#SBATCH --nodes=[nnodes]                     #number of nodes
#SBATCH --ntasks-per-node=[ntasks per node]  #number of cores per node
#SBATCH --gres=gpu:[ngpu]                    #number of GPUs per node</code>
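For instance, a hypothetical job requesting two nodes, 16 cores per node and one GPU per node (the values are purely illustrative) would start with:

<code>#!/bin/bash
#SBATCH --nodes=2             #two nodes
#SBATCH --ntasks-per-node=16  #16 cores per node
#SBATCH --gres=gpu:1          #one GPU per node</code>

The script is then submitted with ''sbatch'', e.g. ''sbatch sub.sh''.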
  
=== Example of parallel job submission ===
Suppose a given python code has to be executed for different values of a variable "rep". The code can be run in parallel during job submission by creating temporary files, one for each value rep=a1, a2,... The python code example.py can contain the line:
<code> rep=REP </code>
The submission script sub.sh can be used to parallelize the process in the following way:
  
<code>#!/bin/bash
#SBATCH --nodes=[nnodes]                     #number of nodes
#SBATCH --ntasks-per-node=[ntasks per node]  #number of cores per node
#SBATCH --gres=gpu:[ngpu]                    #number of GPUs per node
NPROC=[nprocesses]                           #number of codes to run in parallel

tmpstring=tmp                                #prefix of the temporary files generated
count=0
for rep in a1 a2 a3                          #values of the variable rep
do
    sed "s/REP/${rep}/" example.py > ${tmpstring}_${rep}.py  #one temporary copy per value
    python ${tmpstring}_${rep}.py &          #run the copy in the background
    count=$(( count + 1 ))
    [[ $(( count % NPROC )) -eq 0 ]] && wait #wait for the parallel programs to finish
done
wait                                         #wait for any remaining background runs
rm ${tmpstring}*                             #optionally remove the temporary files once all runs have finished</code>
  * Parallel jobs can also be submitted as a job array, as sketched below. More information about job arrays can be found at [[https://slurm.schedmd.com/job_array.html]].
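As a sketch under illustrative assumptions (the index range and file names are not prescribed by this guide), the same "rep" sweep could be written as a job array, letting SLURM schedule one task per value:

<code>#!/bin/bash
#SBATCH --array=0-9           #ten array tasks, one per value of rep
#SBATCH --ntasks=1            #one core per array task

#each array task substitutes REP with its own index and runs the resulting copy
sed "s/REP/${SLURM_ARRAY_TASK_ID}/" example.py > tmp_${SLURM_ARRAY_TASK_ID}.py
python tmp_${SLURM_ARRAY_TASK_ID}.py</code>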
  
  * Parallelization can also be implemented within the python code itself: for example, the evaluation of a function for different values of a variable can be done in parallel. Python offers many packages for parallelizing a given process; the most basic one is [[https://docs.python.org/3/library/multiprocessing.html|multiprocessing]], sketched below.
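A minimal sketch using ''multiprocessing'' (the function and its input values are illustrative):

<code>from multiprocessing import Pool

def f(rep):                                  #function to be evaluated for each value of rep
    return rep ** 2

if __name__ == "__main__":
    with Pool(processes=4) as pool:          #pool of four worker processes
        results = pool.map(f, [1, 2, 3, 4])  #evaluate f in parallel over the values
    print(results)                           #prints [1, 4, 9, 16]</code>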
  * Machine-learning frameworks such as Keras (distributed as part of TensorFlow) and PyTorch detect the available GPUs automatically.
  
  