HPC and Data for Lattice QCD

FAQ

You get the message "qpace-exec: command not found"?
Make sure /usr/local/bin is in your PATH.
You get the message "mpiexec: command not found"?
Make sure /opt/parastation/bin is in your PATH.
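
Both cases come down to adding the respective directory to PATH. A minimal sketch for a bash login shell (putting the line into ~/.bashrc is an assumption about your setup):

# make qpace-exec and mpiexec visible in every new shell
export PATH=/usr/local/bin:/opt/parastation/bin:$PATH
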
qpace-sub refuses to run my job.

Please check that you specified a valid queue for your job and that your job meets the submission requirements. It is also possible that your job cannot be placed in the "pro" queue because you are not eligible to run production jobs.
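
To see which queues exist and which limits they enforce, you can query Torque directly with the standard client tools (assuming they are available on the front end; qpace-sub itself is a site-specific wrapper):

# list the configured queues with their limits and states
qstat -Q -f

# show the state of your own jobs
qstat -u $USER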

How can I pass environment variables to my job?

Passing environment variables to a job via Torque/Maui involves two steps:

  1. Add the variables you want to export to your job file via #PBS -v MYVAR=myvalue
  2. Later in the job script, pass this value from Torque to Parastation via export PSI_EXPORTS=MYVAR. If you need to export more than one variable, add the names of all these variables to PSI_EXPORTS, delimited by commas (like PSI_EXPORTS=VAR1,VAR2).

For an example, see the question about node numbers below.
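
Condensed to the essential lines, and with placeholder variable names VAR1 and VAR2, the two steps look like this:

# in the job file: make the variables known to Torque
#PBS -v VAR1=value1,VAR2=value2

# later in the job script: hand them on to Parastation
export PSI_EXPORTS=VAR1,VAR2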

In my job, how can I distinguish between nodes (apart from hostname)?

When using qpace-exec, Parastation automatically sets the environment variables PMI_RANK and PMI_SIZE, which give the rank of the node (starting at 0) and the total number of nodes assigned to the job. Here is a short example that demonstrates the use of node ranks and environment variables:

PBS job script myexp.job:

#!/bin/bash
#
# the job's name
#PBS -N myexp
#
# combine stderr and stdout
#PBS -j eo
#
# use the "dev" queue
#PBS -q dev
#
# export the variable MYVAR
#PBS -v MYVAR=myTestVar

# change to working dir
cd $PBS_O_WORKDIR

# export variables
export PSI_EXPORTS=MYVAR

# start actual job on assigned nodes
qpace-exec --sourceprintf myexp.sh

The job script myexp.sh:

#!/bin/bash

cd $PBS_O_WORKDIR
echo "Test job on $(hostname)"
echo "I am node $PMI_RANK of $PMI_SIZE nodes."
echo "Current dir: $(pwd)"
echo "Content of MYVAR: $MYVAR"

If the job was started with, e.g.,

qpace-sub --topo=1x1x2 myexp.job

the output would be similar to:

[pro] Thu Dec  9 17:13:12 CET 2010 Starting prologue as root on host nc-25-29
[pro] Current Ramdisk: usr.sites.v08.rc01.20101130
[pro] Uptime:  17:13:12 up 6 days,  5:12,  0 users,  load average: 0.00, 0.00, 0.00
[pro] Callout to master...
[qmasterW pro] Thu Dec  9 17:13:12 CET 2010 Starting prologue as pbs
[qmasterW pro] About to run: /opt/qroot/bin/setupTnw --nc0=25:28 --nc1=25:29
[qmasterW pro] About to run: /opt/qroot/bin/setupGs --nc0=25:28 --nc1=25:29
[qmasterW pro] About to run: /opt/qroot/bin/setupNodes --nc0=25:28 --nc1=25:29
[qmasterW pro] Thu Dec  9 17:13:23 CET 2010 Done.
[pro] Thu Dec  9 17:13:23 CET 2010 Done...
[0]: Test job on nc-25-29
[0]: I am node 0 of 2 nodes.
[1]: Test job on nc-25-28
[1]: I am node 1 of 2 nodes.
[0]: Current dir: /home/huesken
[0]: Content of MYVAR: myTestVar
[1]: Current dir: /home/huesken
[1]: Content of MYVAR: myTestVar
[epi] Thu Dec  9 17:13:23 CET 2010 Starting epilogue as root on host nc-25-29
[epi] Callout to master...
[qmasterW epi] Thu Dec  9 17:13:24 CET 2010 Starting epilogue as pbs
[qmasterW epi] Thu Dec  9 17:13:24 CET 2010 Done...
[epi] Thu Dec  9 17:13:24 CET 2010 Done...
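
Since PMI_RANK is unique per node, the same mechanism can be used to branch inside the job script, for example to let only one node perform a setup task. A minimal sketch (the marker file name rank0.log is a placeholder):

# only rank 0 writes a marker file; all other ranks just report
if [ "$PMI_RANK" -eq 0 ]; then
    echo "rank 0 started at $(date)" > rank0.log
else
    echo "rank $PMI_RANK leaves the setup to rank 0"
fi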