HPC

HPC and Data for Lattice QCD

Boot Process

apemaster

For more complete documentation, see also here.

The new apemaster (for apeNEXT) does not set up the bonding interface automatically (for technical reasons), so it has to be brought up manually:
[apemaster5] ~ # ifup bond0
Enslaving eth1 to bond0
Enslaving eth2 to bond0
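
To confirm that both interfaces were really enslaved, the status file of the Linux bonding driver can be inspected. The following is a minimal sketch, assuming the standard bonding driver is in use and exposes /proc/net/bonding/bond0; the function names bond_slaves and check_bond are hypothetical helpers, not part of the apemaster setup.

```shell
#!/bin/sh
# Hypothetical helpers; they only read the bonding status file.

# Print the slave interfaces recorded in a bonding status file.
# Usage: bond_slaves [status-file]  (defaults to the file for bond0)
bond_slaves() {
    status_file="${1:-/proc/net/bonding/bond0}"
    # Each enslaved interface appears as "Slave Interface: ethN".
    sed -n 's/^Slave Interface: //p' "$status_file"
}

# Succeed only if every expected slave is listed in the status file.
# Usage: check_bond status-file slave...
check_bond() {
    status_file="$1"
    shift
    present="$(bond_slaves "$status_file")" || return 1
    for slave in "$@"; do
        printf '%s\n' "$present" | grep -qxF "$slave" || {
            echo "missing slave: $slave" >&2
            return 1
        }
    done
    echo "bond ok: $*"
}

# Example, after "ifup bond0":
#   check_bond /proc/net/bonding/bond0 eth1 eth2
```

The status file is passed as an argument so that the check can also be tried against a saved copy of the file.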

Start the queuing system (make sure that the parallel environments were disabled before SGE was stopped, otherwise jobs might start immediately):
[apemaster5] ~ # /usr/SGE/sge start
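
After the start script returns, it can take a moment until the queue master answers. The sketch below retries a command until it succeeds; retry is a hypothetical helper, and the use of qstat -f as the probe command is an assumption about how this installation is checked.

```shell
#!/bin/sh
# Hypothetical helper: run a command until it succeeds or attempts run out.
# Usage: retry attempts delay-seconds command [args...]
retry() {
    attempts="$1"
    delay="$2"
    shift 2
    i=1
    while :; do
        # Output is discarded; only the exit status matters here.
        "$@" >/dev/null 2>&1 && return 0
        [ "$i" -ge "$attempts" ] && return 1
        i=$((i + 1))
        sleep "$delay"
    done
}

# Example (assumes qstat is in the PATH of the SGE installation):
#   retry 10 3 qstat -f || echo "SGE is not answering" >&2
```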

blades

The power button for the case holding the blades is at the back. Open the rack's back door and briefly push the '1' button, located on the right of the case (a longer push is interpreted as "turn off").
Make sure that the first blade case to be started is the one connected to the APEmille switch (currently in rack 2), otherwise booting might not work. When booting from NFS, apemaster4 has to be available (currently connected to rack 2).
The blades are started with the bottom button. When a blade is running, its LEDs are lit; often the power button has to be pushed several times until the LEDs stay on.

Currently, the dnose slave has to be started manually. There should be one screen session on apemaster5; ssh from there to the host. Make sure the name of the screen window carries both the logical and the physical hostname. Then run:
exec bash
source /nroot/nlogin.sh
DNOSE=/nroot/opt/dnose-1.7 /nroot/opt/dnoses-1.7/bin/dnoses -cfg /nroot/etc/dnose.conf
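
The window naming convention can be kept consistent with a small helper. This is only a sketch: window_title and open_slave_window are hypothetical names, the title format "logical(physical)" is an assumption, and the hostnames in the example are placeholders. It relies on the fact that calling screen -t from inside a running screen session opens a new window with the given title.

```shell
#!/bin/sh
# Hypothetical helpers for naming screen windows.

# Build a window title that carries both hostnames.
# Usage: window_title logical-name physical-name
window_title() {
    printf '%s(%s)\n' "$1" "$2"
}

# Open a new window in the current screen session and ssh to the host.
# Must be called from inside the screen session on apemaster5.
# Usage: open_slave_window logical-name physical-name
open_slave_window() {
    logical="$1"
    physical="$2"
    screen -t "$(window_title "$logical" "$physical")" ssh "$physical"
}

# Example with placeholder hostnames:
#   open_slave_window slave01 blade07
```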

apemille

To power up the apemille racks, first switch on the main fuse (large switch), then, after some delay, the small fuse switches from left to right. After that, push the button on the Slow Control. Watch that the LEDs on the PSUs do not turn red. If they do, restart them manually with the small fuse buttons (again turn them on from left to right with a short delay, i.e. first the upper crate controlling the clock, then the lower crate).