FAQ
HPC and Data for Lattice QCD
You get the message "qpace-exec: command not found"?
You get the message "mpiexec: command not found"?
qpace-sub refuses to run my job.
Please check that you specified a valid queue for your job and that the job meets the submission requirements. It is also possible that your job cannot be placed in the "pro" queue because you are not eligible to run production jobs.
How can I pass environment variables to my job?
Passing environment variables to a job via Torque/Maui involves several steps:
- Add the variables you want to export to your job file via
  #PBS -v MYVAR=myvalue
- Later in the job script, pass this value from Torque to Parastation via
  export PSI_EXPORTS=MYVAR
  If you need to export more than one variable, add the names of all these variables to PSI_EXPORTS, delimited by commas (like PSI_EXPORTS=VAR1,VAR2).
For an example, see the question about node numbers below.
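When several variables have to reach the nodes, the comma-separated list can be assembled in the job script itself. The following is a minimal sketch, not part of the QPACE tooling: the helper join_exports and the variable names VAR1 and VAR2 are hypothetical and serve only as an illustration.

```shell
# join_exports is a hypothetical helper, not a QPACE or Parastation command.
# It joins its arguments with commas; the subshell body keeps the IFS change local.
join_exports() (
    IFS=,
    echo "$*"
)

# VAR1 and VAR2 are placeholder names; in a real job they would be set via "#PBS -v".
export VAR1=alpha VAR2=beta

# Tell Parastation which variables to forward to the nodes.
export PSI_EXPORTS="$(join_exports VAR1 VAR2)"
echo "PSI_EXPORTS=$PSI_EXPORTS"
```

Keeping the names in one place like this avoids a mismatch between the variables that are set and the ones listed in PSI_EXPORTS.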
In my job, how can I distinguish between nodes (apart from hostname)?
When using qpace-exec, Parastation automatically sets the
environment variables PMI_RANK and PMI_SIZE, which hold the number of
the current node and the total number of nodes assigned to the job.
Here is a short example that demonstrates the use of node numbers and
environment variables:
PBS job script myexp.job:
#!/bin/bash
#
# the job's name
#PBS -N myexp
#
# combine stderr and stdout
#PBS -jeo
#
# use the "dev" queue
#PBS -q dev
#
# export the variable MYVAR
#PBS -v MYVAR=myTestVar
# change to working dir
cd "$PBS_O_WORKDIR"
# export variables
export PSI_EXPORTS=MYVAR
# start actual job on assigned nodes
qpace-exec --sourceprintf myexp.sh
The job script myexp.sh:
#!/bin/bash
cd "$PBS_O_WORKDIR"
echo "Test job on $(hostname)"
echo "I am node $PMI_RANK of $PMI_SIZE nodes."
echo "Current dir: $(pwd)"
echo "Content of MYVAR: $MYVAR"
If the job was started via e.g.
qpace-sub --topo=1x1x2 myexp.job
the output would be similar to:
[pro] Thu Dec 9 17:13:12 CET 2010 Starting prologue as root on host nc-25-29
[pro] Current Ramdisk: usr.sites.v08.rc01.20101130
[pro] Uptime: 17:13:12 up 6 days, 5:12, 0 users, load average: 0.00, 0.00, 0.00
[pro] Callout to master...
[qmasterW pro] Thu Dec 9 17:13:12 CET 2010 Starting prologue as pbs
[qmasterW pro] About to run: /opt/qroot/bin/setupTnw --nc0=25:28 --nc1=25:29
[qmasterW pro] About to run: /opt/qroot/bin/setupGs --nc0=25:28 --nc1=25:29
[qmasterW pro] About to run: /opt/qroot/bin/setupNodes --nc0=25:28 --nc1=25:29
[qmasterW pro] Thu Dec 9 17:13:23 CET 2010 Done.
[pro] Thu Dec 9 17:13:23 CET 2010 Done...
[0]: Test job on nc-25-29
[0]: I am node 0 of 2 nodes.
[1]: Test job on nc-25-28
[1]: I am node 1 of 2 nodes.
[0]: Current dir: /home/huesken
[0]: Content of MYVAR: myTestVar
[1]: Current dir: /home/huesken
[1]: Content of MYVAR: myTestVar
[epi] Thu Dec 9 17:13:23 CET 2010 Starting epilogue as root on host nc-25-29
[epi] Callout to master...
[qmasterW epi] Thu Dec 9 17:13:24 CET 2010 Starting epilogue as pbs
[qmasterW epi] Thu Dec 9 17:13:24 CET 2010 Done...
[epi] Thu Dec 9 17:13:24 CET 2010 Done...
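Building on the example above, a node script could use PMI_RANK and PMI_SIZE to give each node its own work, for instance a separate log file per node. This is only a sketch: the helper node_logfile, the file name pattern, and the fallback values used when the script runs outside qpace-exec are assumptions made for illustration.

```shell
# Fall back to a single-node setup when PMI_RANK/PMI_SIZE are not set,
# e.g. when testing the script outside qpace-exec (assumption for local runs).
rank="${PMI_RANK:-0}"
size="${PMI_SIZE:-1}"

# node_logfile is a hypothetical helper that builds a per-node file name.
node_logfile() {
    printf 'myexp.node%sof%s.log\n' "$1" "$2"
}

# Only the first node performs one-time work.
if [ "$rank" -eq 0 ]; then
    echo "Node 0: doing one-time setup"
fi

echo "Node $rank of $size would write to $(node_logfile "$rank" "$size")"
```

Because every node runs the same script, branching on the rank is the simplest way to keep nodes from overwriting each other's files.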