HPC

HPC and Data for Lattice QCD

single-error

# ----------------------------------------------------------
# Setting single error counters to (maxcount -1) for testing
# ----------------------------------------------------------
Tz:
    single error counters:
    ----------------------
                           0x02  [0x0eeeeeee] value set to (maxcount -1)
    call krun with option:

                        -c,--set-tz-serc=VAL   
                        sets Tarzan single error counter before running the
                        program; VAL is a dec/hex(0xX) 32 bits value
                        --set-tz-serc=0x0eeeeeee
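
A minimal sketch of how VAL could be checked before it is handed to krun.
The option text above only says that VAL is a decimal or hex (0xX) 32-bit
value; the helper name is_valid_serc is hypothetical and not part of krun.

```shell
# Hypothetical helper, not part of krun: succeeds only if $1 is a decimal
# or 0x-prefixed hex number that fits into 32 bits.
is_valid_serc() {
    v=$1
    case $v in
        0[xX]*)
            d=${v#0[xX]}
            case $d in
                ''|*[!0-9a-fA-F]*) return 1 ;;
            esac
            # drop leading zeros; at most 8 hex digits fit into 32 bits
            while [ ${#d} -gt 1 ] && [ "${d#0}" != "$d" ]; do
                d=${d#0}
            done
            [ ${#d} -le 8 ]
            ;;
        ''|*[!0-9]*)
            return 1
            ;;
        *)
            d=$v
            while [ ${#d} -gt 1 ] && [ "${d#0}" != "$d" ]; do
                d=${d#0}
            done
            # at most 10 decimal digits, and not above 2^32 - 1
            [ ${#d} -le 10 ] && [ "$d" -le 4294967295 ]
            ;;
    esac
}

# Usage idea (the remaining krun arguments are site specific and omitted):
#   is_valid_serc "$VAL" || { echo "bad VAL: $VAL" >&2; exit 1; }
```

The check is purely syntactic: it does not know which counter values are
meaningful for the hardware, only that the number fits the stated width.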
Jn:
    single error counters:
    ----------------------
                           0x01  [0xeeeeeeee] value set to (maxcount -1)
                           0x02  [0xfefefefe] value set to (maxcount -1)
                           0x03  [0x......fe] value set to (maxcount -1)

actions: (unit)

   (1) prior to execution, run this script (on the unit):

                                    /zroot/tools/set-serr-unit
   (2) start krun WITHOUT hard reset (-H) 

files:

  /zroot/tools/jsercnt.tac
# ----------------------------------------------------------
# SETTING and READING SERR-COUNTERS WITH CAOS
# ----------------------------------------------------------
Take the following CODINE script as an example.
#-----------------------------------------------------------
#!/usr/bin/zsh
#$ -q unit0_q
#$ -V -cwd
#$ -S /usr/bin/zsh
#$ -j y
#$ -N ubik-0
UNIT=00
APECAOS=APErun
PRG=u1-v3-unit-xtc-psk.jex
echo -n ".start : "
date
echo
# Step 1 - just initialize machine
$APECAOS -caos -unit $UNIT -- /zroot/tools/jinit0.jex
# Step 2 - set single error counters to (max-1)
$APECAOS -caos -unit $UNIT -- -p /zroot/tools/caos-serr.tac
# Step 3 - run your application *WITHOUT* hard reset
$APECAOS -caos -unit $UNIT -- -o z -j 0x88 $PRG >! out.u0.cmp.${JOB_ID}
echo
# Step 4 - read the single error counters (serc) after the application has finished.
# The counters are read board by board, so you have to loop over all
# boards involved (here: the boards of one unit).
#
# The script "serr-grep" expects "-coords ${CID}${UID}${BID}".
# In this example the status is collected in the file "Serr.${JOB_ID}".
#
touch Serr.${JOB_ID}
for BOARD in 0 1 2 3
do
    /zroot/tools/serr-grep -coords ${UNIT}${BOARD} >> Serr.${JOB_ID}
done
echo -n ".end : "
date
#-----------------------------------------------------------
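
The board loop of step 4 can also be wrapped in a small function, sketched
here under some assumptions: the function name collect_serr and the
"# coords ..." header lines are hypothetical additions, and the reader
command is a parameter only so that the loop can be tried without the
machine. By default it calls /zroot/tools/serr-grep as in the script above.

```shell
# Sketch: append the counter readout of boards 0..3 of one unit to a file.
#   $1  unit coordinates, e.g. 00 (as UNIT in the script above)
#   $2  output file, e.g. Serr.${JOB_ID}
#   $3  reader command (optional, defaults to the tool used in step 4)
collect_serr() {
    cs_unit=$1
    cs_out=$2
    cs_reader=${3:-/zroot/tools/serr-grep}
    for cs_board in 0 1 2 3; do
        # hypothetical header line, so each readout can be attributed to its board
        printf '# coords %s%s\n' "$cs_unit" "$cs_board" >> "$cs_out"
        "$cs_reader" -coords "${cs_unit}${cs_board}" >> "$cs_out"
    done
}

# Dry run without hardware, using echo in place of serr-grep:
#   collect_serr 00 Serr.test echo
```

Nothing is assumed here about the output format or exit status of
serr-grep; its output is appended unchanged.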

to do:


        (1) integrate /zroot/tools/serr-grep into the .epilog files of
            CODINE
        (2) create a global "single error count database"
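
One possible shape for item (2), given only as a sketch: a flat text file
to which each job's Serr file is appended, with the job ID in front of
every line. The function name serr_db_append, the tab-separated layout, and
the database path are assumptions for illustration, not an existing tool.

```shell
# Sketch: append one job's Serr file to a global flat-file "database".
#   $1  job ID (JOB_ID in the CODINE script)
#   $2  Serr file written by the job
#   $3  database file (hypothetical location, chosen by the site)
serr_db_append() {
    # prefix every line with the job ID, separated by a tab
    awk -v job="$1" '{ print job "\t" $0 }' "$2" >> "$3"
}

# Possible call at the end of a job or from an epilog script:
#   serr_db_append "$JOB_ID" "Serr.${JOB_ID}" /path/to/serr-db.txt
```

A flat file keeps the entries greppable by job ID; concurrent writers
would need locking, which this sketch leaves out.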