HPC

HPC and Data for Lattice QCD

CAOS

HOWTO CAOS

In the following we assume you have downloaded all the APEmille software under your KROOT directory.

Installation

To install CAOS, perform the following steps:
  1. driver compilation: go to the directory KROOT/lib/caosdrv and run make.
    WARNING: the machine where you compile the drivers must run the same kernel version as your unit PCs.

  2. extra perl module compilation: go to the directory KROOT/lib/caolib and run make.
After these steps your software is installed.
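
The two build steps above can be collected in a small dry-run helper. This is only a sketch under stated assumptions: KROOT is taken to be a shell variable holding your APEmille directory, the default path and the function names caos_build_plan and same_kernel are hypothetical, and the helper prints the commands instead of running them.

```shell
#!/bin/sh
# Dry-run sketch: print the build commands instead of executing them.
# Assumption: KROOT is a shell variable pointing at the APEmille software tree.
KROOT="${KROOT:-/opt/apemille}"   # hypothetical default path, adjust to your site

caos_build_plan() {
    # Step 1: driver compilation (kernel must match the one on the unit PCs).
    echo "cd $KROOT/lib/caosdrv && make"
    # Step 2: extra perl module compilation.
    echo "cd $KROOT/lib/caolib && make"
}

# Compare two kernel release strings, for example the output of `uname -r`
# collected on the build machine and on a unit PC.
same_kernel() {
    [ "$1" = "$2" ]
}

caos_build_plan
```

Comparing the `uname -r` output of the build machine with that of a unit PC before compiling is one simple way to respect the kernel-version warning.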

Configuration

Create the unix devices on the unit PCs (steps 1 and 2), then set the environment variables on the master machine (steps 3 and 4):
  1. create the unix devices for the PB-BOARD on each unit PC:
    mknod /dev/ampb0 c 120 0
    mknod /dev/ampb1 c 120 1
    mknod /dev/ampb2 c 120 2
    mknod /dev/ampb3 c 120 3
  2. create the unix device on each unit PC where a ROOT board is installed:
    mknod /dev/amrb0 c 121 0
  3. PCSLAVENAMES as the list of names of your unit PCs.
    Example:
    • setenv SLAVENAMES pcam0:pcam1:pcam2:pcam3 (if you use tcsh)
    • export SLAVENAMES=pcam0:pcam1:pcam2:pcam3 (if you use bash)

  4. ROOTNAMES as the list of names of the PCs where you have a ROOT-BOARD.
    Example:
    • setenv ROOTNAMES pcam0:pcam4 (if you use tcsh)
    • export ROOTNAMES=pcam0:pcam4 (if you use bash)
After these steps your software is ready to run.
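
The device numbers and the colon-separated name lists used above can be illustrated with a short dry-run script. It is a sketch, not part of CAOS: the function names pb_mknod_cmds, rb_mknod_cmd and split_names are hypothetical, and the mknod commands are only printed, since running them requires root on the unit PC.

```shell
#!/bin/sh
# Dry-run sketch: print the device-creation commands for one unit PC.

# Four PB-BOARD devices: major number 120, minor numbers 0 to 3.
pb_mknod_cmds() {
    for i in 0 1 2 3; do
        echo "mknod /dev/ampb$i c 120 $i"
    done
}

# One ROOT board device: major number 121, minor number 0.
rb_mknod_cmd() {
    echo "mknod /dev/amrb0 c 121 0"
}

# Split a colon-separated name list (the format of the SLAVENAMES and
# ROOTNAMES examples) into one name per line.
split_names() {
    echo "$1" | tr ':' '\n'
}

pb_mknod_cmds
rb_mknod_cmd
split_names "pcam0:pcam1:pcam2:pcam3"
```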

Running CAOS

To run CAOS:
  1. Insert the ampbdrv.o driver on each unit PC where a PB-BOARD is installed. Log in as root on each unit PC and type: insmod KROOT/lib/caosdrv/ampbdrv.o
  2. Insert the amrbdrv.o driver on each unit PC where an RB-BOARD is installed. Log in as root and type: insmod KROOT/lib/caosdrv/amrbdrv.o
  3. Start the slave process on each unit PC: KROOT/bin/slave.pl [slave_name], where the optional argument slave_name is the name of the slave.

  4. Start the root process on each unit PC where a ROOT-BOARD is installed: KROOT/bin/root.pl [root_name], where the optional argument root_name is the name of the root.

  5. From the master machine run KROOT/bin/caos -C configuration filename.jex, where configuration can be one of the following:
    1. board bid to run a program on a single board. bid specifies the board number.
    2. unit uid to run a program on a unit (4 boards). uid specifies the unit number.
    3. crate cid to run a program on a crate (16 boards). cid specifies the crate number.
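
The start-up sequence above can be summarised as a per-host plan. The following sketch only prints what has to be run where; it assumes that KROOT is a shell variable, that every PC in the first list hosts a PB-BOARD, and that host lists are colon-separated as in the configuration examples. The default path and the function names unit_plan, root_plan and caos_cmd are hypothetical. The output is grouped by host, so the driver and the process for one PC appear together.

```shell
#!/bin/sh
# Dry-run sketch: list what has to be started where, without running anything.
# Assumptions: KROOT is a shell variable; name lists are colon-separated.
KROOT="${KROOT:-/opt/apemille}"   # hypothetical default path

# Print "host: command" lines for every unit PC in the given list.
unit_plan() {
    for host in $(echo "$1" | tr ':' ' '); do
        echo "$host: insmod $KROOT/lib/caosdrv/ampbdrv.o"
        echo "$host: $KROOT/bin/slave.pl"
    done
}

# Same for the PCs that host a ROOT-BOARD.
root_plan() {
    for host in $(echo "$1" | tr ':' ' '); do
        echo "$host: insmod $KROOT/lib/caosdrv/amrbdrv.o"
        echo "$host: $KROOT/bin/root.pl"
    done
}

# Build the master command line: caos_cmd <board|unit|crate> <id> <file.jex>
caos_cmd() {
    case "$1" in
        board|unit|crate) echo "$KROOT/bin/caos -C $1 $2 $3" ;;
        *) echo "unknown configuration: $1" >&2; return 1 ;;
    esac
}

unit_plan "pcam0:pcam1"
root_plan "pcam0"
caos_cmd unit 1 pippo.jex
```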

CAOS options

CAOS also accepts the following options:

-j mask : set jmille mask register (REG[0])
-t mask : set tmille mask register (REG[1])
-f value : set memory refresh value register (REG[0x200])
-F : set PLD fast modality
-r value : set master root mask register (REG[9])
-n x y z : set node [x,y,z] as default node
-p filename : load a script filename
-i : interactive mode
-H : exec an hard reset at start
-R : exec an hard reset before exit
-o string : open the machine along X Y Z dimension, "string" may be one of: x, y, z, xy, xz, or xyz
-s : skip program loading except system variables and data
-T : tower|crate|unit|pb tid cid aid pid
-V : show release version
-v : verbose flag
-h : show this help
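
As a small illustration of the -o option, a wrapper script could check the argument against the values listed in the help text before calling caos. This is a sketch only: valid_open_dims is a hypothetical helper, and it accepts exactly the strings listed above and nothing else.

```shell
#!/bin/sh
# Sketch: accept only the -o values listed in the help text.
valid_open_dims() {
    case "$1" in
        x|y|z|xy|xz|xyz) return 0 ;;
        *) return 1 ;;
    esac
}

for d in x xz xyz zz; do
    if valid_open_dims "$d"; then
        echo "-o $d: accepted"
    else
        echo "-o $d: not in the documented list"
    fi
done
```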

CAOS interactive

CAOS commands can be executed either interactively, as caos -C configuration -i, or from a command file, as caos -C configuration -p commandfile. The commands are specified using the so-called TACO-like syntax.

TACO-like syntax

CAOS also supports the TACO-like syntax to write and read each device.
  • Write access to devices: w device nodes addr num : data
  • Read access to devices: r device nodes addr num
where device can be:
  • ar Altera Register
  • tr Tz Register
  • td Tz Data Memory
  • tp Tz Program Memory
  • jr Jn Register
  • jd Jn Data Memory
  • jp Jn Program Memory

and nodes are specified as

  • all all nodes
  • n node specified by node_abs_id
  • [x,y,z] node specified by triple of node_abs_x, node_abs_y, node_abs_z
  • [x1,y1,z1][x2,y2,z2] slice specified by two triples of node_abs_x, node_abs_y, node_abs_z
  • def default node
addr is a decimal or 0x-prefixed hex value. For Jn Data Memory it refers to 32-bit words while for Program Memories it refers to 96-bit words. num is a decimal or 0x-prefixed hex value and always refers to the number of 32-bit words (in particular for Program Memories and Jn Data Memory). data can be a space- or newline-separated list of decimal or 0x-prefixed hex values. For Program Memories, the LSW comes before the MSW (in the spirit of a 32-bit word sequence).
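
The syntax rules above can be made concrete with a few helper functions that assemble command strings. This is a minimal sketch: taco_device_ok, taco_read, taco_write and pm_num are hypothetical names, and the addresses and data values in the calls are illustrative only. The pm_num helper reflects the rule that num always counts 32-bit words, so k program-memory words of 96 bits each correspond to num = 3k.

```shell
#!/bin/sh
# Sketch: build TACO-like command strings for the CAOS prompt or a command file.

# Succeed only for the device codes listed above.
taco_device_ok() {
    case "$1" in
        ar|tr|td|tp|jr|jd|jp) return 0 ;;
        *) return 1 ;;
    esac
}

# taco_read <device> <nodes> <addr> <num>
taco_read() {
    taco_device_ok "$1" || { echo "unknown device: $1" >&2; return 1; }
    echo "r $1 $2 $3 $4"
}

# taco_write <device> <nodes> <addr> <num> <data...>
taco_write() {
    taco_device_ok "$1" || { echo "unknown device: $1" >&2; return 1; }
    dev=$1; nodes=$2; addr=$3; num=$4
    shift 4
    echo "w $dev $nodes $addr $num : $*"
}

# num counts 32-bit words, so k program-memory words (96 bits each) need 3*k.
pm_num() {
    echo $(( $1 * 3 ))
}

taco_read jr all 6 1
taco_write jd "[0,0,0]" 0x10 2 0x1 0x2
taco_read jp def 0 "$(pm_num 4)"
```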

CAOS Daemon

If you would like to use the CAOS daemon resource manager:
  1. start the caosd daemon
  2. start any caos session from a shell where the USECAOSD variable is set.
caosd tracks which resources are in use and prevents a user from accessing resources that are in use by another user. The CGI script kw.cgi, which you can install in your cgi-bin directory, outputs an HTML page showing which machines are used by which user.
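
Setting the variable in a Bourne-type shell could look like the sketch below. The value 1 is an assumption: the text only requires that USECAOSD be set and does not say which value, if any, CAOS expects.

```shell
#!/bin/sh
# Sketch: mark the current shell so that caos sessions go through caosd.
# Assumption: any non-empty value counts as "set"; the value 1 is arbitrary.
USECAOSD=1
export USECAOSD

# Check that the variable is non-empty in this shell.
if [ -n "${USECAOSD:-}" ]; then
    echo "USECAOSD is set: caos sessions started from this shell see it"
fi
```

With tcsh, the equivalent would be setenv USECAOSD 1, following the same pattern as the SLAVENAMES examples.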

Example

Suppose you want to run the program pippo.jex with a jmille mask value of 0x88 on unit 1: caos -H -C unit 1 -j 0x88 pippo.jex. The -H option executes a hardware reset before loading and running your program.

If you would like to read the status register of each jmille after running a program:

caos -C unit 1 -i
APE_master --> r jr all 6 1
APE_master --> quit
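
The same read can also be prepared as a command file for the -p option. The sketch below assumes that a command file holds one command per line, which the text does not state explicitly; the file name is hypothetical, and the caos command line is printed rather than executed.

```shell
#!/bin/sh
# Sketch: put the interactive command in a file and pass it with -p.
# Assumption: the command file holds one command per line.
cmdfile="${TMPDIR:-/tmp}/read_status.caos"   # hypothetical file name

cat > "$cmdfile" <<'EOF'
r jr all 6 1
EOF

# Print the command line rather than running it (dry run).
echo "caos -C unit 1 -p $cmdfile"
```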