apemaster + blades
HPC and Data for Lattice QCD
The following private networks exist:
- 192.168.0.* eth0 on unitPCs and blades, eth1 on APEmille master, doesn't exist on apeNEXT master
- 192.168.1.* eth1 on apeNEXT master and bladePCs, doesn't exist on APEmille master. Should not be used since eth1 is used for bonding.
- 192.168.2.* eth2 on apeNEXT master and bladePCs, doesn't exist on APEmille master. Should not be used since eth2 is used for bonding.
- 192.168.3.* bond0 on apeNEXT master and bladePCs, doesn't exist on APEmille master.
apemaster
The installation/update is mostly handled by the feature apemaster (see /afs/.ifh.de/common/installation/test/feature/apemaster). This takes care of creating directories like /apeboot and /apeshare, installing SGE, nroot and a lot more.
BUT:
After the installation, the bonding interface on the new apemaster (for apeNEXT) has to be manually enabled:
(Replace the N with the number of the master)
Configuration files for the interfaces:
/etc/sysconfig/network-scripts/ifcfg-eth1
DEVICE=eth1
BOOTPROTO=static
IPADDR=192.168.1.20N
NETMASK=255.255.255.0
ONBOOT=yes
TYPE=Ethernet
MASTER=bond0
SLAVE=yes
/etc/sysconfig/network-scripts/ifcfg-eth2
DEVICE=eth2
BOOTPROTO=static
IPADDR=192.168.2.20N
NETMASK=255.255.255.0
ONBOOT=yes
TYPE=Ethernet
MASTER=bond0
SLAVE=yes
/etc/sysconfig/network-scripts/ifcfg-bond0
DEVICE=bond0
BOOTPROTO=static
IPADDR=192.168.3.20N
NETMASK=255.255.255.0
ONBOOT=no
TYPE=Ethernet
MTU=4000
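The substitution of N in the three files can be sketched as a small Bourne shell helper. This is only an illustration under the address scheme shown above: print_ifcfg is a hypothetical function, not part of the apemaster feature, and it writes to standard output instead of touching /etc/sysconfig/network-scripts.

```shell
#!/bin/sh
# Sketch: print the ifcfg content for one interface of apemaster N.
# print_ifcfg is a hypothetical helper; it only echoes the settings listed above.
print_ifcfg() {
    dev=$1      # eth1, eth2 or bond0
    n=$2        # number of the master, e.g. 5 for apemaster5
    case "$dev" in
        eth1)  net=1 ;;
        eth2)  net=2 ;;
        bond0) net=3 ;;
        *) echo "unknown device: $dev" >&2; return 1 ;;
    esac
    echo "DEVICE=$dev"
    echo "BOOTPROTO=static"
    echo "IPADDR=192.168.$net.20$n"
    echo "NETMASK=255.255.255.0"
    if [ "$dev" = bond0 ]; then
        # the bond itself is not started at boot and carries the larger MTU
        echo "ONBOOT=no"
        echo "TYPE=Ethernet"
        echo "MTU=4000"
    else
        # eth1 and eth2 are slaves of bond0
        echo "ONBOOT=yes"
        echo "TYPE=Ethernet"
        echo "MASTER=bond0"
        echo "SLAVE=yes"
    fi
}

# Example: show what ifcfg-eth1 would contain on apemaster5
print_ifcfg eth1 5
```

Redirecting the output into the matching ifcfg file is left to the administrator, so the result can be reviewed first.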
To have the bonding module loaded automatically, add the following module alias to /etc/modules.conf:
alias bond0 bonding
After that, set up the interface:
[apemaster5] ~ # ifup bond0
Enslaving eth1 to bond0
Enslaving eth2 to bond0
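To confirm afterwards that both interfaces were enslaved, the status file of the bonding driver can be inspected. The sketch below assumes the driver exposes /proc/net/bonding/bond0 with "Slave Interface:" lines, which may not hold for every driver version; list_bond_slaves is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: print the slave interfaces named in a bonding status file.
# Assumption: the file contains lines of the form "Slave Interface: ethX".
list_bond_slaves() {
    status=${1:-/proc/net/bonding/bond0}
    [ -r "$status" ] || { echo "cannot read $status" >&2; return 1; }
    sed -n 's/^Slave Interface: *//p' "$status"
}

# Only query the live system if the status file is actually there
if [ -r /proc/net/bonding/bond0 ]; then
    list_bond_slaves
fi
```

On an apemaster set up as described, the expected slaves are eth1 and eth2.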
The new apemasters for apeNEXT have eth0 connected to the DESY network; eth1 and eth2 are bonded into one interface, bond0, which is connected to the internal switches of the blades. Due to the limited number of ports on these switches, not all blades are connected to an apemaster.
The most important "services" are (list not exhaustive)
- local SGE, installed in /usr/SGE/
- cronjob root: sync queuing system state, check temperature, change board queues over night (apemaster5)
- cronjob aperun: check unit alive status, ...
- provide boot files (PXE) and NFS root filesystem for unit/blade PCs
- provide DHCP for unit/blade PCs
- provide local /zroot, /nroot.i586, /kroot for unit/blade PCs
- provide scratch space for user data (/data)
nroot.i586
To update the nroot that is exported to the blades, run (be aware that you might influence a running dnose):
/nroot/sbin/amsync -n nroot.i586
NOTE:
Many configuration files (like dhcpd.conf, pxe files, apehosts.conf, hosts) are created automatically from /afs/ifh.de/group/ape/etc/SRC/hostsystem.xml. See hostsystem-config for details.
blade PCs
WARNING: Don't use 'Discard changes and exit' in the blade's BIOS, since it restores factory defaults (or at least forgets the boot order).
Booting
The blade PCs are either booted via PXE and use an NFS root filesystem mounted via the FastEthernet network (eth0, at present from apemaster4), or run from the flashdisk. In both cases, additional filesystems (/nroot, /apeshare; /apetmp via amd) are mounted from the apemaster which is responsible for handling the rack the blade is in.
The standard SL3.0.5 boot process has been modified to run from a read-only root filesystem. See especially /etc/rc.d/rc.ape (on apemaster: /apeboot/sl3.0.5_ro_vX.Y.Z/etc/rc.d/rc.ape).
The configuration files apehost.conf and hosts are downloaded from the apemaster at boot time. To update these, run
/usr/local/sbin/updateApehostNhosts
Information about the boot process is logged to
/var/log/apelog
The NFS root version is written in
/00NFSROOTINFO
(make sure you update this if you set up a new NFS root)
Kernel
Kernel source and configuration can be found at /afs/ifh.de/group/ape/anext/unitkernels
HIB
The HIB module is handled by /etc/rc.d/rc.local. It is downloaded from the apemaster which is responsible for handling the rack the blade is in, taking the module as specified in /etc/apehost.conf. Actions are logged to /var/log/hiblog. To load a different driver than the default one, type
/etc/rc.d/rc.local restart modulename.o
where modulename.o has to exist in the hibDrivers/ directory (see below).
Before compiling the hib module, make sure the kernel source is cleaned and holds the correct kernel version:
setenv BLADEKERNREV linux-2.4.21-37.ELcustom
cd /afs/ifh.de/group/ape/anext/unitkernels/src/$BLADEKERNREV/
make distclean
cp /afs/ifh.de/group/ape/anext/unitkernels/kernel.config_2.4.21-37.ELsmp-6 .config
make oldconfig
make dep
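Whether the tree really holds the intended version can be checked against the top-level Makefile. The following Bourne shell sketch assumes the usual VERSION, PATCHLEVEL, SUBLEVEL and EXTRAVERSION assignments at the top of a Linux kernel Makefile, and that the revision name is the version string prefixed with "linux-"; kernel_tree_version and check_kernel_tree are hypothetical helper names.

```shell
#!/bin/sh
# mk_var NAME FILE: print the value assigned to NAME in a Makefile (first match)
mk_var() {
    sed -n "s/^$1[ ]*=[ ]*//p" "$2" | sed 's/[ ]*$//' | head -n 1
}

# kernel_tree_version DIR: print the version string defined in DIR/Makefile
kernel_tree_version() {
    mk="$1/Makefile"
    [ -r "$mk" ] || { echo "no Makefile in $1" >&2; return 1; }
    echo "$(mk_var VERSION "$mk").$(mk_var PATCHLEVEL "$mk").$(mk_var SUBLEVEL "$mk")$(mk_var EXTRAVERSION "$mk")"
}

# check_kernel_tree DIR REV: succeed only if the tree matches REV,
# where REV looks like linux-2.4.21-37.ELcustom
check_kernel_tree() {
    want=${2#linux-}
    have=$(kernel_tree_version "$1") || return 1
    [ "$have" = "$want" ]
}
```

A possible call from a Bourne shell, with the directory from the steps above as first argument, is check_kernel_tree . linux-2.4.21-37.ELcustom; a non-zero exit status means the tree does not match.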
The standard HIB module is included in the nose/dnose source (this one currently cannot set the clock generator). The module's Makefile needs to know where to find the kernel sources:
cd $NXTPROJECT/dnose/drv
make CINC=-I/afs/ifh.de/group/ape/anext/unitkernels/src/$BLADEKERNREV/include/
The version which is able to set the clock is in Davide's driver. Adjust KDIR in the Makefile and run make.
The HIB modules should be placed in /afs/ifh.de/group/ape/apeboot/hibDrivers/2.4.21-37.ELcustom (old modules are still stored in /afs/ifh.de/group/ape/anext/unitkernels/hib_drv/).
Flashdisk
The flashdisk is a copy of the NFS root. Startup scripts automatically detect whether the system booted from NFS root or flashdisk and take the appropriate actions.
When changes to the NFS root were made, these can be copied over to the flashdisk with the command
/usr/local/sbin/ape_syncToFlashdisk
This only works when booted from NFS root. Make sure that the root you sync from is up to date (it is taken from the apemaster serving the rack, not from apemaster4). This script also takes care to update the file
/01FLASHROOTINFO
which holds information about the last synchronisation.
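Since the synchronisation only works from an NFS root, it can help to check the boot source first. The sketch below is one possible way to do that by looking at the filesystem type of / in /proc/mounts; it is not necessarily how the startup scripts decide, and root_source is a hypothetical helper that treats every non-NFS root as the flashdisk.

```shell
#!/bin/sh
# root_source [MOUNTS]: print "nfs" if / is mounted via NFS, "flashdisk" otherwise.
# Assumption: any non-NFS root filesystem on a blade is the flashdisk.
root_source() {
    mounts=${1:-/proc/mounts}
    # the last entry for "/" is the one in effect; field 3 is the filesystem type
    fstype=$(awk '$2 == "/" { t = $3 } END { print t }' "$mounts")
    case "$fstype" in
        nfs*) echo nfs ;;
        "")   echo "no root entry in $mounts" >&2; return 1 ;;
        *)    echo flashdisk ;;
    esac
}
```

A guarded call would then be: [ "$(root_source)" = nfs ] && /usr/local/sbin/ape_syncToFlashdisk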
To set up a fresh flashdisk, the following might be helpful (make sure you understand what you are doing):
parted -s /dev/hda rm 1
parted -s /dev/hda mkpart primary ext2 0.031 996.187
mke2fs -m 0 /dev/hda1
tune2fs -c 0 -i 0 /dev/hda1
/usr/local/sbin/ape_syncToFlashdisk
/sbin/grub --batch <<EOT
root (hd0,0)
setup (hd0)
quit
EOT
The tune2fs command is very important; otherwise a filesystem check will prevent a proper startup some day. Be aware that the grub part takes a few minutes.