HPC and Data for Lattice QCD

Architecture

An APE machine can be viewed as a 3-D grid of Processing Nodes with periodic boundaries. Each Processing Node is directly connected to its 6 nearest neighbours through synchronous data communication channels. The Processing Nodes are optimised for complex floating-point arithmetic, with support for integer and ordinary floating-point operations. The APE100 and APEmille architectures follow the SIMD paradigm (groups of nodes execute the same instruction on different data), which APEmille complements with a local addressing feature: each Processing Node may access its own local memory using a locally computed address. This new feature is the most important extension to the APE100 architecture, opening a path to coding algorithms that could not be implemented efficiently on APE100 machines.
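
To make the local-addressing idea concrete, here is a minimal C sketch (not APE/TAO code) that emulates one SIMD step in which every node executes the same load instruction but supplies its own locally computed address. The node structure, grid size and memory size are invented for this illustration.

    /* Illustrative C sketch (not APE/TAO code): emulating a SIMD step in which
     * every node executes the same "load" instruction, but each node supplies
     * its own locally computed address -- the APEmille local-addressing feature.
     * Names (NODE, N_NODES, LOCAL_MEM) are invented for this example. */
    #include <stdio.h>

    #define N_NODES   8      /* a small 1-D stand-in for the 3-D grid */
    #define LOCAL_MEM 16     /* words of local memory per node        */

    typedef struct {
        double mem[LOCAL_MEM];  /* this node's private memory           */
        int    local_addr;      /* address computed locally by the node */
        double reg;             /* a register loaded by the SIMD step   */
    } NODE;

    int main(void) {
        NODE node[N_NODES];

        /* Fill each node's memory and let each node derive a different
         * address from its own data (here simply a function of the node id). */
        for (int n = 0; n < N_NODES; n++) {
            for (int i = 0; i < LOCAL_MEM; i++)
                node[n].mem[i] = 100.0 * n + i;
            node[n].local_addr = (n * 3) % LOCAL_MEM;
        }

        /* One SIMD instruction, broadcast to all nodes: "load mem[addr] into reg".
         * On APE100 'addr' would have to be the same on every node; with local
         * addressing each node may use its own value. */
        for (int n = 0; n < N_NODES; n++)
            node[n].reg = node[n].mem[node[n].local_addr];

        for (int n = 0; n < N_NODES; n++)
            printf("node %d loaded %.1f from local address %d\n",
                   n, node[n].reg, node[n].local_addr);
        return 0;
    }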

The local addressing capability complements the local conditional statements, already present in the APE100 architecture,

    Where(local condition)... Endwhere

and the global conditions derived from the set of local conditions, e.g.

    If(All(local condition)) ... Endif
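
As a rough illustration of how these constructs behave, the following C sketch (again not APE/TAO code) emulates a masked Where/Endwhere update and a global condition built by combining the local conditions with All; the data and thresholds are made up for the example.

    /* Illustrative C sketch (not APE/TAO code): a host-side emulation of the
     * Where/Endwhere masked update and of a global All(...) condition built
     * from local conditions. Data layout and values are invented here. */
    #include <stdbool.h>
    #include <stdio.h>

    #define N_NODES 8

    int main(void) {
        double x[N_NODES] = {0.5, 1.5, 2.5, 0.1, 3.0, 0.2, 4.0, 0.9};
        bool   mask[N_NODES];

        /* Where(x < 1.0) ... Endwhere : every node evaluates the same
         * condition on its own data; nodes where it is false sit idle. */
        for (int n = 0; n < N_NODES; n++) mask[n] = (x[n] < 1.0);
        for (int n = 0; n < N_NODES; n++)
            if (mask[n]) x[n] *= 2.0;          /* masked SIMD body */

        /* If(All(x < 5.0)) ... Endif : the local conditions are combined
         * into a single global flag that steers the whole machine. */
        bool all = true;
        for (int n = 0; n < N_NODES; n++) all = all && (x[n] < 5.0);
        if (all)
            printf("global branch taken: every node satisfies x < 5.0\n");
        return 0;
    }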

A further enhancement over the APE100 architecture is a more general data routing capability among non-first-neighbour nodes.
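
For intuition, the sketch below (plain C, not APE code) shows how a transfer to a non-first-neighbour node on a periodic 3-D grid can be decomposed into nearest-neighbour hops, taking the shorter way around each dimension; the grid sizes and the hop-counting helper are illustrative assumptions, not part of the actual routing hardware.

    /* Illustrative C sketch (not APE/TAO code): how a transfer to a
     * non-first-neighbour node on a periodic (toroidal) 3-D grid can be
     * decomposed into nearest-neighbour hops along each axis. Grid sizes
     * and the hop counter are invented for this example. */
    #include <stdio.h>

    #define NX 4
    #define NY 4
    #define NZ 4

    /* Number of nearest-neighbour hops needed to cover displacement d
     * along one axis of size 'size', taking the shorter way around. */
    static int wrap_hops(int d, int size) {
        int m = ((d % size) + size) % size;   /* fold into 0 .. size-1 */
        return (m <= size - m) ? m : size - m;
    }

    int main(void) {
        int dx = 3, dy = -2, dz = 5;          /* a generic (non-neighbour) offset */
        int hops = wrap_hops(dx, NX) + wrap_hops(dy, NY) + wrap_hops(dz, NZ);
        printf("offset (%d,%d,%d) reachable in %d nearest-neighbour hops\n",
               dx, dy, dz, hops);
        return 0;
    }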

While the computational kernel of APE100 is built from replicas of three kinds of boards (Controller Boards, Processing Boards and Communication Boards), an APEmille computational engine is based on multiple instances of a single board type, the Processing Board (PB). This design brings scalability and engineering advantages over the APE100 arrangement.

Each PB integrates all system functionality: flow control, data processing, internode communication and host<->APEmille I/O. A Root Board provides global synchronisation of the Processing Boards.

The host consists of one or more networked workstations, each controlling a group of PBs.

An APEmille machine can be partitioned into smaller SIMD machines, each comprising one or more Processing Boards and executing its own instruction stream.

Each host in the network, using a high-performance communication channel, maps the memories of a portion of APEmille (up to 32 nodes) onto its own bus. The close integration with a network of host workstations allows a high input/output bandwidth with disks and peripherals, in the range of 100 MByte per second per I/O device. Close integration of APEmille with standard workstations also adds the flexibility needed to customise the I/O system to the requirements of specific applications.