HPC and Data for Lattice QCD

apemaster + blades


This page gives a quick overview of how the apemaster, unitPCs (APEmille) and blades (apeNEXT) work.

The following private networks exist:
  • 192.168.0.* eth0 on unitPCs and blades, eth1 on APEmille master, doesn't exist on apeNEXT master
  • 192.168.1.* eth1 on apeNEXT master and bladePCs, doesn't exist on APEmille master. Should not be used since eth1 is used for bonding.
  • 192.168.2.* eth2 on apeNEXT master and bladePCs, doesn't exist on APEmille master. Should not be used since eth2 is used for bonding.
  • 192.168.3.* bond0 on apeNEXT master and bladePCs, doesn't exist on APEmille master.
UnitPC IP addresses end in two digits, matching the hostname host??. BladePC IP addresses end in three digits (the first is a '1'), matching the hostname host???. Apemaster IP addresses end in 200+N, where N is the number of the apemaster (apemaster{N}).
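
The addressing convention above can be written down as a small shell function. This is only a sketch: the function name last_octet is made up for illustration, it handles single-digit apemaster numbers only, and it assumes that a unitPC number with a leading zero (for example host07) maps to the plain number 7 in the last octet.

```shell
#!/bin/sh
# Sketch: print the last octet of the private IP address for a host name.
# last_octet is an illustrative helper, not an existing tool on the apemaster.
last_octet() {
    name=$1
    case "$name" in
        apemaster[0-9])
            # apemaster{N} ends in 200+N
            n=${name#apemaster}
            echo $((200 + n))
            ;;
        host1[0-9][0-9])
            # bladePC: three digits, the first is a 1
            echo "${name#host}"
            ;;
        host[0-9][0-9])
            # unitPC: two digits; dropping a leading zero is an assumption
            digits=${name#host}
            echo "${digits#0}"
            ;;
        *)
            echo "unknown host name: $name" >&2
            return 1
            ;;
    esac
}
```

For example, `last_octet apemaster5` prints 205, which corresponds to 192.168.3.205 on the bond0 network.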

apemaster

The installation/update is mostly handled by the feature apemaster (see /afs/.ifh.de/common/installation/test/feature/apemaster). This takes care of creating directories like /apeboot and /apeshare, installing SGE and nroot, and a lot more.

BUT: After the installation, the bonding interface on the new apemaster (for apeNEXT) has to be enabled manually.
(In the files below, replace N with the number of the master.)

Configuration files for the interfaces:

/etc/sysconfig/network-scripts/ifcfg-eth1
DEVICE=eth1
BOOTPROTO=static
IPADDR=192.168.1.20N
NETMASK=255.255.255.0
ONBOOT=yes
TYPE=Ethernet
MASTER=bond0
SLAVE=yes

/etc/sysconfig/network-scripts/ifcfg-eth2
DEVICE=eth2
BOOTPROTO=static
IPADDR=192.168.2.20N
NETMASK=255.255.255.0
ONBOOT=yes
TYPE=Ethernet
MASTER=bond0
SLAVE=yes

/etc/sysconfig/network-scripts/ifcfg-bond0
DEVICE=bond0
BOOTPROTO=static
IPADDR=192.168.3.20N
NETMASK=255.255.255.0
ONBOOT=no
TYPE=Ethernet
MTU=4000
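
Since the three files differ only in the device name and the master number N, they can be generated rather than typed. The following is a minimal sketch, not part of the apemaster feature: the function name gen_bond_cfg and the output directory argument are assumptions made for illustration. It writes into a directory of your choice so the files can be reviewed before they are copied to /etc/sysconfig/network-scripts/.

```shell
#!/bin/sh
# Sketch: write ifcfg-eth1, ifcfg-eth2 and ifcfg-bond0 for apemaster N.
# gen_bond_cfg is an illustrative helper; the values are those listed above.
gen_bond_cfg() {
    n=$1        # number of the apemaster, e.g. 5
    outdir=$2   # existing directory to write the files to

    # eth1 and eth2 are the two slaves of bond0
    for i in 1 2; do
        cat > "$outdir/ifcfg-eth$i" <<EOF
DEVICE=eth$i
BOOTPROTO=static
IPADDR=192.168.$i.20$n
NETMASK=255.255.255.0
ONBOOT=yes
TYPE=Ethernet
MASTER=bond0
SLAVE=yes
EOF
    done

    # the bonded interface itself
    cat > "$outdir/ifcfg-bond0" <<EOF
DEVICE=bond0
BOOTPROTO=static
IPADDR=192.168.3.20$n
NETMASK=255.255.255.0
ONBOOT=no
TYPE=Ethernet
MTU=4000
EOF
}
```

A call such as `gen_bond_cfg 5 /tmp/ifcfg-test` (the directory must exist) produces the three files for apemaster5.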

For the bonding module to be loaded automatically, a module alias is needed. Add the following line to /etc/modules.conf:
alias bond0 bonding

After that, set up the interface:
[apemaster5] ~ # ifup bond0
Enslaving eth1 to bond0
Enslaving eth2 to bond0
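
To confirm that both interfaces were really enslaved, the status file of the bonding driver can be inspected. This is a minimal sketch under two assumptions: that the driver exposes its state in /proc/net/bonding/bond0 (the usual location for the Linux bonding driver, but older kernels may differ), and that each slave is reported on a line starting with "Slave Interface:". The function name bond_slaves is illustrative.

```shell
#!/bin/sh
# Sketch: list the slave interfaces of a bonding device.
# The default path is an assumption; pass another file as first argument.
bond_slaves() {
    status_file=${1:-/proc/net/bonding/bond0}
    # Print the interface name from each "Slave Interface: ethX" line.
    awk -F': *' '/^Slave Interface:/ { print $2 }' "$status_file"
}
```

On a correctly configured apemaster, `bond_slaves` would be expected to list eth1 and eth2.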

The new apemasters for apeNEXT have eth0 connected to the DESY network. eth1 and eth2 are bonded into one interface, bond0, which is connected to the internal switches of the blades. Due to the limited number of ports on these switches, not all blades are connected to an apemaster.

The most important "services" are (list not exhaustive):

  • local SGE, installed in /usr/SGE/
  • cronjob root: sync queuing system state, check temperature, change board queues over night (apemaster5)
  • cronjob aperun: check unit alive status, ...
  • provide boot files (PXE) and nfs root-filesystem for unit/blade PCs
  • provide dhcp for unit/blade PCs
  • provide local /zroot, /nroot.i586, /kroot for unit/blade PCs
  • provide scratch space for user data (/data)

nroot.i586

To update the nroot that is exported to the blades, run the following command (be aware that this might affect a running dnose):
/nroot/sbin/amsync -n nroot.i586

NOTE:
Many configuration files (like dhcpd.conf, pxe files, apehosts.conf, hosts) are created automatically from /afs/ifh.de/group/ape/etc/SRC/hostsystem.xml. See hostsystem-config for details.

blade PCs

WARNING: Don't use 'Discard changes and exit' in the blade's BIOS, since it restores factory defaults (or at least forgets boot order).

Booting

The blade PCs are either booted via PXE and use an NFS root filesystem mounted via the FastEthernet network (eth0, at present from apemaster4), or run from the flashdisk. In both cases, additional filesystems (/nroot, /apeshare; /apetmp via amd) are mounted from the apemaster which is responsible for handling the rack the blade is in.

The standard SL3.0.5 boot process has been modified to run from a read-only root filesystem. See especially /etc/rc.d/rc.ape (on apemaster: /apeboot/sl3.0.5_ro_vX.Y.Z/etc/rc.d/rc.ape).

The configuration files apehost.conf and hosts are downloaded from the apemaster at boot time. To update these, run
/usr/local/sbin/updateApehostNhosts

Information about the boot process is logged to
/var/log/apelog

The NFS root version is written in
/00NFSROOTINFO
(make sure you update this if you set up a new NFS root)

Kernel

Kernel source and configuration can be found at /afs/ifh.de/group/ape/anext/unitkernels.

HIB

The HIB module is handled by /etc/rc.d/rc.local. It is downloaded from the apemaster which is responsible for handling the rack the blade is in, taking the module as specified in /etc/apehost.conf . Actions are logged to /var/log/hiblog. To load a different driver than the default one, type
/etc/rc.d/rc.local restart modulename.o
where modulename.o has to exist in the hibDrivers/ directory (see below).

Before compiling the hib module, make sure the kernel source is cleaned and holds the correct kernel version:
setenv BLADEKERNREV linux-2.4.21-37.ELcustom
cd /afs/ifh.de/group/ape/anext/unitkernels/src/$BLADEKERNREV/
make distclean
cp /afs/ifh.de/group/ape/anext/unitkernels/kernel.config_2.4.21-37.ELsmp-6 .config
make oldconfig
make dep
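
One way to check that the source tree really holds the intended kernel version is to read the version fields from its top-level Makefile. The following is a sketch in POSIX sh (the commands above use csh syntax), assuming a 2.4-style Makefile that defines VERSION, PATCHLEVEL, SUBLEVEL and EXTRAVERSION at the top. The function name kernel_release is illustrative.

```shell
#!/bin/sh
# Sketch: print the kernel release encoded in a kernel top-level Makefile.
# Assumes "NAME = value" assignments as found in 2.4 kernel sources.
kernel_release() {
    awk -F'[ \t]*=[ \t]*' '
        $1 == "VERSION"      { v = $2 }
        $1 == "PATCHLEVEL"   { p = $2 }
        $1 == "SUBLEVEL"     { s = $2 }
        $1 == "EXTRAVERSION" { e = $2 }
        END { print v "." p "." s e }
    ' "$1"
}
```

Run in the source directory as `kernel_release Makefile`. The printed value, prefixed with "linux-", should then equal the value of BLADEKERNREV.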


The standard HIB module is included in the nose/dnose source (this one currently cannot set the clock generator). The module's Makefile needs to know where to find the kernel sources:
cd $NXTPROJECT/dnose/drv
make CINC=-I/afs/ifh.de/group/ape/anext/unitkernels/src/$BLADEKERNREV/include/


The version which is able to set the clock is in Davide's driver. Adjust KDIR in the Makefile and run make.

The HIB modules should be placed in /afs/ifh.de/group/ape/apeboot/hibDrivers/2.4.21-37.ELcustom (old modules are still stored in /afs/ifh.de/group/ape/anext/unitkernels/hib_drv/).

Flashdisk

The flashdisk is a copy of the NFS root. Startup scripts automatically detect whether the system booted from NFS root or flashdisk and take the appropriate actions.
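
How the startup scripts make this distinction is not described here. As an illustration only, one possible check looks at the filesystem type of the root mount in /proc/mounts. The sketch below assumes that any non-NFS root means the flashdisk; the function name root_source is hypothetical.

```shell
#!/bin/sh
# Sketch: report whether the root filesystem is NFS or local (flashdisk).
# This is NOT taken from the actual startup scripts.
root_source() {
    mounts=${1:-/proc/mounts}
    # The last entry mounted on "/" is the effective root filesystem.
    fstype=$(awk '$2 == "/" { t = $3 } END { print t }' "$mounts")
    case "$fstype" in
        nfs*) echo nfs ;;
        "")   echo unknown ;;
        *)    echo flashdisk ;;   # assumption: non-NFS root is the flashdisk
    esac
}
```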

When changes to the NFS root were made, these can be copied over to the flashdisk with the command
/usr/local/sbin/ape_syncToFlashdisk
This only works when booted from NFS root. Make sure that the root you sync from is up to date (it is taken from the apemaster serving the rack, not from apemaster4). This script also takes care to update the file
/01FLASHROOTINFO
which holds information about the last synchronisation.

To set up a fresh flashdisk, the following might be helpful (make sure you understand what you are doing):
parted -s /dev/hda rm 1
parted -s /dev/hda mkpart primary ext2 0.031 996.187
mke2fs -m 0 /dev/hda1
tune2fs -c 0 -i 0 /dev/hda1
/usr/local/sbin/ape_syncToFlashdisk
/sbin/grub --batch <<EOT
root (hd0,0)
setup (hd0)
quit
EOT


The tune2fs command is very important: it disables the periodic filesystem checks, which would otherwise prevent a proper startup some day. Be aware that the grub part takes a few minutes.
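
Whether the periodic checks are really switched off can be verified from the output of `tune2fs -l`. The following sketch assumes the usual field names "Maximum mount count" and "Check interval" in that output, and treats a mount count of 0 or -1 as disabled, since the displayed value may depend on the e2fsprogs version. The function name fsck_disabled is illustrative.

```shell
#!/bin/sh
# Sketch: read `tune2fs -l` output on stdin and succeed only if both the
# mount-count based and the time based filesystem checks are disabled.
fsck_disabled() {
    awk -F': *' '
        /^Maximum mount count:/ { count = $2 + 0; have_count = 1 }
        /^Check interval:/      { interval = $2 + 0; have_interval = 1 }
        END {
            ok = have_count && have_interval && count <= 0 && interval == 0
            exit !ok
        }
    '
}
```

Usage, for example: `tune2fs -l /dev/hda1 | fsck_disabled && echo "periodic checks are off"`.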