Run the program IIb: as batch

The second part of this chapter deals with starting production jobs in a batch queue. A batch job is usually submitted with `qsub <Jobfile>'. Inside the job's shell, the parallel run is then started with `mpirun -np <number PEs> <binary>'. Example job cards named `muster.dsgsahga.*.job' are provided in the directory `qmc/job'. Jobs should be numbered sequentially with an index `mn.kl' of at most 4 digits, which replaces the placeholder `9.yy' (see below) in the job cards. The file `qmc/job/muster.dsgsahga.64pe.nl.job', a 64 PE job with a nonlocal pseudopotential, serves as the example discussed here. In principle the file should be named `DSGS.9.yy.job' so that its name matches the notation for the job number and the generation of chain jobs. The target directories for the log files of the QSUB job, given after `#QSUB -e' and `#QSUB -o' below, are user specific and must be replaced accordingly.
`[snip]' means that parts of minor importance have been omitted from the listing at that position. `# ...' denotes an actual comment in the job card to be processed, and `/* ...*/' refers to a comment for the user.

#QSUB -r qmc9.yy                                  /* job name for `qstat' */
#QSUB -e /home/supth012/qmc/TMP/out/dsgs.9.yy.err /* stderr file&job no. */
#QSUB -o /home/supth012/qmc/TMP/out/dsgs.9.yy.out /* stdout file&job no. */
#QSUB -l mpp_p=64                                 /* number PEs */
#QSUB -l mpp_t=28800                              /* max. job length [sec] */
#QSUB -l p_mpp_t=28000                            /* max. process length [sec] */
#QSUB -lm 15Mw                                    /* max. memory in MW */
#QSUB -s /bin/sh                                  /* shell */
#QSUB -me -mb                                     /* mail at start&end */
#!/bin/sh
# set -x

# Jobcard for Quanten-Monte-Carlo-MPP-MPI:
# thin film surface (dsgsahga): nonlocal pseudopotential

# sh-jobcard for 64 PE on one tmp$$ with/without apprentice-analysis
# chainjob-version

# Help: All areas between
###########################################################
#{ 

#}
###########################################################
# have generally to be adapted.

##########################################################
#{
job=9.yy		# Which job number?
pe=64			# How many PEs?
isys=0			# Which boundary conditions? bulk:isys=1,surface:isys=0
mv=020208		# Which system?
target=dsgsahga		# Which MAKE-target?
vpp=hrvkppnlgs.		# Which pseudopotential?	loc.: vpp=hrvkpplgs.
                                                 #	nonloc.: vpp=hrvkppnlgs.
surface=quad		# Which surface model? 	ideal: surface=quad 
                                           # 	Duke: surface=dsduke
app=			# Performance-analysis with Apprentice? yes:app=-eA,no:app=
bkette=0		# chainjob? $bkette!=0
lvl="1000"		# length of MC pre-run 
lhl="400"		# length of MC main run
thl="20"		# statistical divisor (lhl/thl has to be integer)
drv="3.1"		# twice the maximal step size of pre-random walk in bohr
drh="3.1"		# twice the maximal step size of main random walk in bohr
#}
###########################################################
[snip]
make VPP=$vpp IO=b SYS=$sys MV=$mv EXE=TMP/tmp$$ APP=$app JOB=$job $target
[snip]
###########################################################
#{
# $job
nvarindex=21						# number of parameters
vvarindex="0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0"	# Is the par. varied?  0->1 
                                            /* ^ 1D-Scan */
# order: EZWK,SLAP(1:16),JASP(1:1),VCFP(1:3) !!!

# for-loops: Edit the number of different values per parameter.
# The order is analogous to the input file where it is significant!!!
# The total product of all na* has to be equal to $pe !!!

nagk=1
# $dat
# $isys
# $azf see below
naj=1
# $gadat
nags=1
nagp=1
nagsdb=1
nagpdb=1
# $asdat
naas=1
naap=1
naasdb=1
naapdb=1
nagh=1
naah=1
nahv=1
nak0=1
nadba1=1
nadba2=1
nathdb=1
naphdb=1
navpt=$pe                                         /* 1D-Scan */
naspt=1
nalpt=1
# $drv
# $lvl
# $drh
# $lhl
# $thl
# $dta
#}
###########################################################
[snip]
###########################################################
#{ 
# the actual values of the parameters, DB is dangling bond 
# They are chosen uniformly distributed by the binary tparoutu, 
# which has the i/o parameters:
# random-number-input number-of-output-values left-interval \
# right-interval output-format
agk=`~/qmc/bin/tparoutu $$ $nagk 10.689 10.689 3` 	# lattice constant
aj=`~/qmc/bin/tparoutu $$ $naj 1.0 1.0 2` 		# jastrow parameter
ags=`~/qmc/bin/tparoutu $$ $nags 0.98 0.98 2` 		# contraction param. s-Ga
agp=`~/qmc/bin/tparoutu $$ $nagp 0.89 0.89 2`		# contraction param. p-Ga
agsdb=`~/qmc/bin/tparoutu $$ $nagsdb 0.98 0.98 2`	# contraction param. s-Ga DB(irrel.)
agpdb=`~/qmc/bin/tparoutu $$ $nagpdb 0.89 0.89 2`	# contraction param. p-Ga DB(irrel.)
aas=`~/qmc/bin/tparoutu $$ $naas 0.98 0.98 2` 		# contraction param. s-As
aap=`~/qmc/bin/tparoutu $$ $naap 0.91 0.91 2` 		# contraction param. p-As
aasdb=`~/qmc/bin/tparoutu $$ $naasdb 0.98 0.98 2` 	# contraction param. s-As DB
aapdb=`~/qmc/bin/tparoutu $$ $naapdb 0.91 0.91 2` 	# contraction param. p-As DB
agh=`~/qmc/bin/tparoutu $$ $nagh 2.5 2.5 2` 		# s/p weight Ga
aah=`~/qmc/bin/tparoutu $$ $naah 3.0 3.0 2` 		# s/p weight As
ahv=`~/qmc/bin/tparoutu $$ $nahv 0.60 0.60 2` 		# As/Ga weight
ak0=`~/qmc/bin/tparoutu $$ $nak0 0.0 0.0 2` 		# fourier_k=0 - term (constant)
adba1=`~/qmc/bin/tparoutu $$ $nadba1 1.0 1.0 2` 	# DB As param. 1 (irrel.)
adba2=`~/qmc/bin/tparoutu $$ $nadba2 1.0 1.0 2` 	# DB As param. 2
athdb=`~/qmc/bin/tparoutu $$ $nathdb 35.264389682754654 35.264389682754654 15`
                                                        # DB As Theta(azimuth)
aphdb=`~/qmc/bin/tparoutu $$ $naphdb 270.0 270. 2` 	# DB As Phi(polar)(fixed)
avpt=`~/qmc/bin/tparoutu $$ $navpt 14.27 16.27 2` 	# confinement potential height
                                /* ^^^^^^^^^^^ 1D-Scan */
aspt=`~/qmc/bin/tparoutu $$ $naspt 0.139 0.139 3` 	# confinement potential stiffness
alpt=`~/qmc/bin/tparoutu $$ $nalpt 3.0 3.0 2` 		# confinement potential distance
#}
###########################################################
[snip]
mpirun -np $pe $target.${job}.exe
[snip]
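
The renaming convention described before the listing can be sketched as a small shell helper. This is only an illustration under stated assumptions: the function name `make_jobcard' is hypothetical and not part of the package, and the job number is assumed to contain only digits and a dot. The user-specific log directories after `#QSUB -e' and `#QSUB -o' still have to be adapted by hand.

```shell
#!/bin/sh
# Hypothetical helper (not part of the QMC-MPP package):
# derive a job card for job number $1 from the template file $2.
# Every occurrence of the placeholder "9.yy" is replaced by the job
# number, and the result is written to DSGS.<job>.job in the current
# directory. The name of the new job card is printed on stdout.
make_jobcard() {
    mj_job=$1
    mj_template=$2
    mj_out="DSGS.${mj_job}.job"
    # The dot is escaped so that sed matches the literal text "9.yy".
    sed "s/9\.yy/${mj_job}/g" "$mj_template" > "$mj_out" || return 1
    echo "$mj_out"
}
```

A possible use would be `card=$(make_jobcard 12.03 qmc/job/muster.dsgsahga.64pe.nl.job)' followed by `qsub "$card"', where `12.03' is an arbitrary example job number.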

For massive production runs in which the system is kept fixed, the parameter scans can be controlled automatically. Taking the confinement potential `avpt' as an example, the user supplies the following information. The number of values the parameter should take is set in `navpt'. The index array `vvarindex' is set to `1' only at the position belonging to the chosen parameter. The interval to be scanned is given by `left interval' and `right interval', here `14.27 16.27', which already covers a neighborhood of the optimal value. The job card distributes the resulting parameter sets into input files, each of which is assigned to one PE via its PE number. The value of the job variable `pe' must equal the number given in the QSUB option `mpp_p'.
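
The job card requires that the product of all `na*' counts equals `pe'. A minimal sketch of such a consistency check is given below. The function name `check_pe_product' is an assumption made for illustration; the package's job cards may perform this check differently or not at all.

```shell
#!/bin/sh
# Hypothetical check (not part of the QMC-MPP package):
# $1 is the expected number of PEs, all further arguments are the
# na* counts. Returns 0 if their product equals $1, otherwise prints
# a message on stderr and returns 1.
check_pe_product() {
    cp_want=$1
    shift
    cp_prod=1
    for cp_n in "$@"; do
        cp_prod=$((cp_prod * cp_n))
    done
    if [ "$cp_prod" -ne "$cp_want" ]; then
        echo "product of na* is $cp_prod, but pe is $cp_want" >&2
        return 1
    fi
    return 0
}
```

Inside a job card the call could look like `check_pe_product "$pe" $nagk $naj $nags ... $navpt $naspt $nalpt || exit 1', with the full list of `na*' variables written out in place of the dots. For the 1D scan shown above, all counts are 1 except `navpt=$pe', so the check passes.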

The CPU time depends on the following quantities:

System size `MV':: The time of course increases with the number of nuclei and electrons. The input files currently included in this package contain at most 512 electrons.
Pseudopotential:: The job variable `vpp' selects a local or a nonlocal pseudopotential (PP). Job card examples exist for both choices, marked `l' and `nl', because the optimal parameters of the wave function differ significantly. For testing, the local PP is sufficient; for highly accurate results the nonlocal version must be used, which needs significantly more CPU time.
Length of Monte-Carlo runs:: CPU time depends linearly on the number of QMC sweeps. `lvl' denotes this number for thermalization and should be of the order of 1000, whereas `lhl' gives the number of actual sampling sweeps and should be as high as possible to reach the best statistical accuracy. The time per sweep, i.e. the unit `1' at `lhl', can be measured from the difference in run time between two runs that differ only in `lhl', divided by the difference in `lhl'. It yields the actual speed of the program and the CPU.
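
The measurement of the time per sweep from two runs can be written as a short calculation. The sketch below assumes that the two run times have been taken by hand, for example from the job log files; the function name `time_per_sweep' is hypothetical, and the numbers in the usage note are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper (not part of the QMC-MPP package):
# time_per_sweep T1 LHL1 T2 LHL2
# T1, T2     run times in seconds of two runs that differ only in lhl
# LHL1, LHL2 the corresponding values of lhl
# Prints (T2 - T1) / (LHL2 - LHL1), the time per main-run sweep in
# seconds. Returns 1 if both runs used the same lhl.
time_per_sweep() {
    awk -v t1="$1" -v n1="$2" -v t2="$3" -v n2="$4" \
        'BEGIN {
            if (n2 == n1) exit 1
            printf "%.3f\n", (t2 - t1) / (n2 - n1)
        }'
}
```

With invented run times of 1000 s for `lhl=400' and 1900 s for `lhl=800', the call `time_per_sweep 1000 400 1900 800' prints `2.250'. Taking the difference removes the constant cost of start-up and of the pre-run, since both runs use the same `lvl'.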

Robert Bahnsen
1/28/2002