Complete documentation is available at ''https://slurm.schedmd.com/''.
  
SLURM is an open-source software system for cluster management; it is highly scalable and integrates fault-tolerance and job-scheduling mechanisms.
  
==== SLURM basic concepts ====
  
<code>#!/bin/bash
#SBATCH --nodes=[nnodes]                     #number of nodes
#SBATCH --ntasks-per-node=[ntasks per node]  #number of cores per node
#SBATCH --gres=gpu:[ngpu]                    #number of GPUs per node</code>
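With the placeholders filled in, such a header might look like the following sketch; the values are purely illustrative, and the partitions and GPU counts actually available depend on the cluster:

```shell
#!/bin/bash
#SBATCH --nodes=1                  # number of nodes (illustrative value)
#SBATCH --ntasks-per-node=8        # number of cores per node (illustrative value)
#SBATCH --gres=gpu:2               # number of GPUs per node (illustrative value)
```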
  
=== Example of parallel job submission ===
Suppose a given Python code has to be executed for different values of a variable ''rep''. The runs can be executed in parallel during job submission by creating temporary files, one for each value rep=a1, a2, .... The Python code example.py can contain the line:
<code> rep=REP </code>
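One way to realize this substitution is with ''sed''; the following is a minimal sketch in which the file names and the values 1, 2, 3 are illustrative:

```shell
# Write an example.py containing the placeholder line rep=REP (illustrative content)
printf 'rep=REP\nprint(rep)\n' > example.py
# For each desired value, write a temporary copy with the placeholder REP replaced
for val in 1 2 3; do
    sed "s/REP/$val/" example.py > tmp_$val.py
done
head -n 1 tmp_2.py   # first line of tmp_2.py is now rep=2
```

Each ''tmp_*.py'' can then be launched as its own process inside the job script.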
  
<code>#!/bin/bash
#SBATCH --nodes=[nnodes]                     #number of nodes
#SBATCH --ntasks-per-node=[ntasks per node]  #number of cores per node
#SBATCH --gres=gpu:[ngpu]                    #number of GPUs per node
NPROC=[nprocesses]                           #number of processing units to be accessed
  
 tmpstring=tmp               #temporary files generated  tmpstring=tmp               #temporary files generated 
   * Parallelization can be implemented within the Python code itself. For example, the evaluation of a function for different variable values can be done in parallel. Python offers many packages for parallelizing a given process; the most basic is [[https://docs.python.org/3/library/multiprocessing.html|multiprocessing]].
  
   * The Keras API in TensorFlow and the PyTorch framework, both mainly used for machine learning, detect the available GPUs automatically.
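As a minimal sketch of the multiprocessing approach, the function ''f'' and the input values below are illustrative; ''nproc'' would typically be matched to the cores requested from SLURM:

```python
from multiprocessing import Pool

def f(rep):
    # placeholder computation for one value of the variable rep
    return rep * rep

def run_parallel(values, nproc=4):
    # evaluate f over all values using a pool of nproc worker processes
    with Pool(processes=nproc) as pool:
        return pool.map(f, values)

if __name__ == "__main__":
    print(run_parallel([1, 2, 3, 4]))  # -> [1, 4, 9, 16]
```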
  
    
wiki/user_guide.1651846640.txt.gz · Last modified: 2022/05/06 14:17 by phegde