hw-debug
HPC and Data for Lattice QCD
Problems and Trouble-Shooting for HW Debugging hs 3/2000
See also: HOWTO-rootboard HOWTO-power HOWTO-refresh HOWTO-equal-check HOWTO-single-error
Note (on the problem "Global Conditions (if ANY/ALL) are overrun"): Also incorrect programs, which perform APE100-like …
Problem: PC gives no video signal (screen remains dark)
Origin: Bad insertion (contact) of the PB (!) or PC on the PCI bus
Fix: Re-insert the PB or PC
Problem: PB not seen on PCI after power-up
Origin: Wrong loading of the Altera because of an interplay with a marginal ramp-up of power (see also HOWTO-power)
Fix:
- Leave the power running for a few minutes to thermalize the PSU, then try again
- Re-program the Altera(s)
- Otherwise, exchange the board with the marginal Altera
Problem: On some boards the exception LED remains "on" after power-up
Origin: Marginal ramp-up of the PSU, causing problems on 1) the Altera, 2) the Jn chips
Fix: 1) see problem "PB not seen on PCI"; 2) see HOWTO-power
Problem: LED on the PSU remains red (and 5 V is not up) after power-up
Origin: The PSU goes into overload protection due to 1) a wrong or marginal setting of the PSU, 2) a defective PSU
Fix: 1) see HOWTO-power; 2) contact ASTEC
Problem:
- Jn PM MER (on pairs of Jn), disappearing at lower clock
- Jn Glb Addr MER, disappearing at lower clock
Origin: Unstable GTL termination voltage:
- marginal noise at the lower limit seems to cause Jn PM MER
- marginal noise at the upper limit seems to cause Glb Addr MER
Fix: Stabilize and increase the GTL reference voltage:
- Cooling fingers on the npn transistors
- R115 and R116: 100 Ohm instead of 140 Ohm
- C168 and C169: 1 nF instead of 100 pF
Remark: Increasing the reference voltage (by exchanging the resistors) WITHOUT also exchanging the capacitors seems to make the upper limit marginal and may cause Glb Addr MER.
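The remark suggests that the capacitors filter noise on the GTL reference node. As a hedged illustration only, assume (hypothetically, since the schematic is not reproduced here) that each of R115/R116 forms a simple first-order RC low-pass with C168/C169. The corner frequency f_c = 1/(2πRC) then shows why lowering the resistors alone could let more noise through:

```python
import math

def rc_corner_hz(r_ohm: float, c_farad: float) -> float:
    """-3 dB corner frequency of a first-order RC low-pass: f_c = 1 / (2*pi*R*C)."""
    return 1.0 / (2.0 * math.pi * r_ohm * c_farad)

# Component values are taken from the fix above. Treating each R/C pair as a
# first-order low-pass on the reference node is an ASSUMPTION for illustration.
configs = {
    "original (140 Ohm, 100 pF)": (140.0, 100e-12),
    "resistors only (100 Ohm, 100 pF)": (100.0, 100e-12),
    "full fix (100 Ohm, 1 nF)": (100.0, 1e-9),
}

for name, (r, c) in configs.items():
    tau_ns = r * c * 1e9                   # time constant R*C in ns
    fc_mhz = rc_corner_hz(r, c) / 1e6      # corner frequency in MHz
    print(f"{name}: tau = {tau_ns:.1f} ns, f_c = {fc_mhz:.2f} MHz")
```

Under this assumption the corner frequencies come out at roughly 11.4 MHz (original), 15.9 MHz (resistors only) and 1.6 MHz (full fix). Exchanging only the resistors would therefore widen the noise bandwidth, which would fit the remark.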
Problem: Jn PM MER when reading correct jex files in SYS-mode
Origin: Correct jex files must have microcode bits 70/71 flipped to compensate for the swapped lines on the Jn piggy-back. Consequently:
(Diagram of Tz, PB, Jn, JPM and CC, with the swapped lines marked X)
==> correct RD arrives at CC
==> correct MC+EDAC arrives at Jn
==> incorrect EDAC arrives at Tz when reading JnPM in SYS-mode
Fix: Mask the Tz exception in software when reading JnPM
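The compensation described above amounts to exchanging two bit positions in each microcode word. Here is a minimal sketch, assuming that "flipped" means bits 70 and 71 trade places and that a microcode word can be held as a Python integer. The jex file format and the word width are not given here, so parsing and writing jex files is left out.

```python
def swap_bits(word: int, i: int, j: int) -> int:
    """Return `word` with the bits at positions i and j exchanged."""
    bit_i = (word >> i) & 1
    bit_j = (word >> j) & 1
    if bit_i != bit_j:
        # XOR with a mask that has both positions set toggles them, which
        # exchanges the two bits exactly when they differ.
        word ^= (1 << i) | (1 << j)
    return word

# Microcode bits named in the text; the example word itself is hypothetical.
MC_BIT_A, MC_BIT_B = 70, 71
word = 1 << 70                       # bit 70 set, bit 71 clear
fixed = swap_bits(word, MC_BIT_A, MC_BIT_B)
print(hex(word), "->", hex(fixed))
```

Applying the swap twice restores the original word, so the same function converts in both directions.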
Problem: TzBus SER on Comm in SYS-mode
Origin: Comm increments the SER count for each command
Fix: Mask the exception in the OS
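Both the Tz exception when reading JnPM and the TzBus SER on Comm are handled by masking a known-benign exception in software. Here is a minimal sketch of that masking, assuming a single exception-status word with one bit per source. The bit assignments and names below are hypothetical; the real Tz/Comm register layout is not documented here.

```python
# HYPOTHETICAL exception-status bit assignments, for illustration only.
EXC_TZBUS_SER = 1 << 0
EXC_TZ_JNPM_EDAC = 1 << 1
EXC_JN_PM_MER = 1 << 2

def reportable(status: int, mask: int) -> int:
    """Clear the masked exception bits and keep all others."""
    return status & ~mask

# Mask the benign SER that Comm raises for every command in SYS-mode, plus
# the spurious Tz EDAC exception seen when reading JnPM.
SYS_MODE_MASK = EXC_TZBUS_SER | EXC_TZ_JNPM_EDAC

status = EXC_TZBUS_SER | EXC_JN_PM_MER
print(bin(reportable(status, SYS_MODE_MASK)))
```

The mask should only be applied in the situations listed above (SYS-mode, JnPM reads). Otherwise genuine SER or EDAC exceptions would be hidden as well.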
Problem: (Jn data) memory EDAC exceptions during/after (long) IO
Origin:
- Refresh died or was inoperative for some period (can be seen by monitoring APEmode bit 3 with a scope at test point #xxx)
- Some interplay with the GlobalNotLocal bit on the PB [Fabio]
- Bug in the Altera (CC and/or PCI interface, see HOWTO-Altera)
Fix:
- Use the correct PLD version (>= 18.5.2000)
- Use the correct access sequence in the OS [Fabio, Davide]
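A dead refresh can be spotted by monitoring APEmode bit 3 with a scope. If the captured edges are exported as timestamps, a short script can flag stretches with no refresh activity. This is a minimal sketch, assuming (hypothetically) that bit 3 shows one edge per refresh and that the maximum allowed interval between refreshes is known. The trace and the threshold below are illustrative values, not measurements from the hardware.

```python
from typing import List, Tuple

def refresh_gaps(edge_times_us: List[float], max_interval_us: float) -> List[Tuple[float, float]]:
    """Return (start, end) pairs where consecutive refresh edges are too far apart."""
    gaps = []
    for prev, cur in zip(edge_times_us, edge_times_us[1:]):
        if cur - prev > max_interval_us:
            gaps.append((prev, cur))
    return gaps

# HYPOTHETICAL capture: edge times (microseconds) of APEmode bit 3.
edges = [0.0, 15.6, 31.2, 46.8, 250.0, 265.6]
print(refresh_gaps(edges, max_interval_us=20.0))
```

Any reported gap that overlaps a (long) IO transfer would point to the refresh origin rather than the other two.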
Problem: Data wrong or lost during IO
Origin: Bug with DataValid in PLDA's PCI core 2.12
Fix: Use a PLD version with a work-around for this bug (see HOWTO-Altera)
Problem: Global Conditions (if ANY/ALL) are overrun
Origin: 1) wrong timing of the microcode; 2) wrong clock skew between the PBs and the Root Board
Fix: 1) correct the microcode; 2) correct the length or time delay of the clock distribution cables
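Cable length translates directly into clock skew via the propagation delay t = L / (v_f · c). This is a minimal sketch for estimating the skew between PBs from the cable lengths, assuming a velocity factor of 0.66 (typical for solid-polyethylene coax, not a measured property of the cables actually used). The lengths below are hypothetical.

```python
C_M_PER_S = 299_792_458.0  # speed of light in vacuum

def cable_delay_ns(length_m: float, velocity_factor: float) -> float:
    """One-way propagation delay of a cable: t = L / (v_f * c), in ns."""
    return length_m / (velocity_factor * C_M_PER_S) * 1e9

def skew_ns(lengths_m: dict, velocity_factor: float) -> dict:
    """Clock arrival time at each PB relative to the earliest one, in ns."""
    delays = {pb: cable_delay_ns(l, velocity_factor) for pb, l in lengths_m.items()}
    t0 = min(delays.values())
    return {pb: d - t0 for pb, d in delays.items()}

# HYPOTHETICAL cable lengths (m) from the Root Board to each PB; the velocity
# factor 0.66 is an ASSUMPTION, not a measured value.
lengths = {"PB0": 1.0, "PB1": 1.0, "PB2": 1.5}
for pb, s in skew_ns(lengths, velocity_factor=0.66).items():
    print(f"{pb}: {s:.2f} ns")
```

With these numbers the extra 0.5 m to PB2 gives about 2.5 ns of skew. The largest skew would then be compared with the timing margin that the microcode leaves between setting and sampling the ANY/ALL condition (not given here).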
Problem: Fake BANK_PARITY exceptions
Remote JTOMEM accesses may generate BankParity exceptions.
Origin:
0) Jn "feature": the minimal latencies of the exception state machine are larger than those of the memory controller [Fabio, Hubert]
1) PB "feature" (probably a delicate interplay with PLD/CC); exception on ALL Jn [A. Menchikov]
2) Incorrect/inconsistent values of the GTL termination voltages; exception on SOME Jn only [G. Tecchiolli]
Fix:
Case 0) should disappear when the distances between memory accesses are increased to respect the minimal values (see distances.html)
Case 1) has never been seen when running WITH the Root Board
Case 2) has been seen where the GTL termination voltages were incorrect and inconsistent (2.2 V vs. 1.5 V on SOME Jn); in this case the BPE and other exceptions, such as APEmode Parity, were seen on the Jn with the CORRECT voltage!
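Case 0 can be checked in a schedule before running it. This is a minimal sketch, assuming the relevant "distance" is the number of cycles between consecutive memory accesses. The schedule and the minimum distance below are hypothetical placeholders; the real minimal values are listed in distances.html.

```python
from typing import List, Tuple

def distance_violations(access_cycles: List[int], min_distance: int) -> List[Tuple[int, int]]:
    """Return pairs of consecutive memory accesses closer than min_distance cycles."""
    cycles = sorted(access_cycles)
    return [(a, b) for a, b in zip(cycles, cycles[1:]) if b - a < min_distance]

# HYPOTHETICAL schedule of remote JTOMEM accesses (cycle numbers) and a
# placeholder minimum distance.
schedule = [10, 12, 20, 30, 31]
print(distance_violations(schedule, min_distance=4))
```

Each reported pair marks a place where an access would have to be moved (or padded with idle cycles) to respect the minimal distance.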