HPC and Data for Lattice QCD

hw-debug

Problems and Trouble-Shooting for HW Debugging		hs 3/2000

See also:
	HOWTO-rootboard
	HOWTO-power
	HOWTO-refresh
	HOWTO-equal-check
	HOWTO-single-error

Problem:

	PC gives no Video Signal (screen remains dark)
Origin:
	Bad insertion (contact) of PB (!) or PC on PCI-bus
Fix:
	Re-insert PB or PC

Problem:

	PB not seen on PCI after power-up
Origin:
	Wrong loading of Altera because of interplay with marginal
	ramp-up of power (see also HOWTO-power)
Fix:

	- Leave power running for some minutes to thermalize PSU,
	  then try again
	- Re-program Altera(s)
	- Otherwise exchange board with marginal Altera

Problem:

	On some boards exception LED remains "on" after power-up
Origin:
	Marginal ramp-up of PSU creating problems on 
	1) Altera
	2) Jn Chips
Fix:
	1) see problem "PB not seen on PCI"
	2) see HOWTO-power

Problem:

	LED on PSU remains red (and 5V is not up) after power-up
Origin:
	PSU goes into overload protection due to
	1) wrong or marginal setting of the PSU
	2) defective PSU
Fix:
	1) see HOWTO-power
	2) contact ASTEC

Problem:

	Jn PM MER (on pairs of Jn), disappears at lower clock
	Jn Glb Addr MER, disappears at lower clock
Origin:
	Unstable GTL termination voltage
	Marginal noise at lower limit seems to cause Jn PM MER
	Marginal noise at upper limit seems to cause GlbAddr MER
Fix:
	Stabilize and increase GTL reference voltage
	- Cooling Fingers on npn-transistors
	- R115 and R116, 100 Ohm instead of 140 Ohm
	- C168 and C169, 1 nF instead of 100 pF
Remark:
	Increasing reference voltage (exchange of resistors)
	WITHOUT exchange of capacitors seems to render upper limit
	marginal and may cause Glb Addr MER

Problem:

	Jn PM MER when reading correct jex in SYS-Mode
Origin:
	Correct jex files must have flipped microcode bits 70/71
	to compensate for swapped lines on Jn Piggy-Back.
	         Tz           PB             Jn
	   JPM __||______ ____________ __ __
	       __|_______ ____._______ __X__
	                      |
	                     CC

	==> correct RD arrives at CC
	==> correct MC+EDAC arrives at Jn
	==> incorrect EDAC arrives at Tz when reading JnPM in SYS-mode
Fix:
	Mask Tz exception by software when reading JnPM
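
The bit compensation described under Origin above can be illustrated with a
short Python sketch. It assumes that "flipped" means bits 70 and 71 are
exchanged with each other (matching the swapped lines), that bit 0 is the
least significant bit, and that a microcode word is held as a plain integer.
None of this is confirmed by the jex format itself; the function name is
hypothetical.

```python
def swap_bits(word: int, i: int = 70, j: int = 71) -> int:
    """Return word with bits i and j exchanged (bit 0 = least significant).

    Assumption: the compensation is an exchange of the two bits, not an
    inversion of each bit.
    """
    bit_i = (word >> i) & 1
    bit_j = (word >> j) & 1
    if bit_i != bit_j:
        # The bits differ, so toggling both of them exchanges their values.
        word ^= (1 << i) | (1 << j)
    return word


# A word with only bit 70 set has only bit 71 set after the exchange.
w = 1 << 70
assert swap_bits(w) == 1 << 71
# Applying the exchange twice restores the original word.
assert swap_bits(swap_bits(w)) == w
```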

Problem:

	TzBus SER on Comm in SYS-Mode
Origin:
	Comm increments SER count for each command
Fix:
	Mask exception by OS
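
Both this fix and the JnPM fix above amount to masking an exception only
for the duration of one operation and restoring the previous mask afterwards.
A minimal Python sketch of that pattern follows. The register class, the bit
value and the convention "bit set = exception masked" are all hypothetical
stand-ins and do not describe the real OS interface.

```python
from contextlib import contextmanager


class ExceptionMask:
    """Stand-in for an exception-mask register (hypothetical)."""

    def __init__(self) -> None:
        self.value = 0


TZ_BUS_SER = 0x1  # hypothetical mask bit for the TzBus SER exception


@contextmanager
def masked(reg: ExceptionMask, bits: int):
    """Mask the given exception bits, then restore the previous mask."""
    saved = reg.value
    reg.value = saved | bits      # assumption: a set bit masks the exception
    try:
        yield reg
    finally:
        reg.value = saved         # restored even if the operation fails


reg = ExceptionMask()
with masked(reg, TZ_BUS_SER):
    during = reg.value            # the SYS-mode access would happen here
after = reg.value
```

Restoring the saved value instead of clearing the bit keeps any mask that
was already in place before the access.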

Problem:

	(Jn Data) Memory EDAC Exceptions during/after (long) IO
Origin:
	Refresh died or was inoperative for some period
	(can be seen with scope monitoring APEmode Bit 3
	at Test Point #xxx)
	Some interplay with GlobalNotLocal bit on PB [Fabio]
	Bug in Altera (CC and/or PCI interface, see HOWTO-Altera)
Fix:
	Correct version of PLD (>= 18.5.2000)
	Correct access sequence in OS [Fabio, Davide]

Problem:

	Data wrong or lost during IO
Origin:
	Bug with DataValid in PCI core 2.12 of PLDA
Fix:
	Use PLD version with work-around for above problem
	(see HOWTO-Altera)

Problem:

	Global Conditions (if ANY/ALL) are overrun
Origin:
	1) Wrong timing of Microcode
	2) Wrong clock skew between PB's and Root Board
Fix:

	1) Correct Microcode
	2) Correct length or time delay of clock distribution cables
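
For fix 2), the relation between cable length and delay can be estimated
with a short Python sketch. The velocity factor of 0.66 is an assumption
(a typical value for coaxial cable with solid polyethylene dielectric); the
value for the actual clock distribution cables has to be taken from their
data sheet. The function names are hypothetical.

```python
C_M_PER_NS = 0.299792458  # speed of light in vacuum, in metres per nanosecond


def cable_delay_ns(length_m: float, velocity_factor: float = 0.66) -> float:
    """Propagation delay of a cable of the given length."""
    return length_m / (velocity_factor * C_M_PER_NS)


def length_for_delay_m(delay_ns: float, velocity_factor: float = 0.66) -> float:
    """Cable length that produces the given propagation delay."""
    return delay_ns * velocity_factor * C_M_PER_NS


# With the assumed velocity factor, one metre of cable adds about 5 ns.
delay_per_metre = cable_delay_ns(1.0)
# Length change needed to shift the clock edge by 1 ns (about 0.2 m).
length_per_ns = length_for_delay_m(1.0)
```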

Problem:

	Fake BANK_PARITY Exceptions
Note: Incorrect programs that perform APE100-like remote JTOMEM
      access may also generate BankParity exceptions
Origin:
	0) Jn "Feature" (minimal latencies of exception
	   state machine are larger than those of memory
	   controller)			[Fabio,Hubert]

	1) PB "Feature" (probably delicate interplay with PLD/CC)
	   Exception on ALL Jn		[A.Menchikov]
	2) Incorrect/Inconsistent value of GTL termination voltages
	   Exception on SOME Jn only	[G.Tecchiolli]
Fix:
	Case 0) should disappear when distances of memory access
	are increased to respect minimal values (see distances.html)
	Case 1) has never been seen when running WITH the Root Board

	Case 2) has been seen in cases where GTL Termination Voltages
	were incorrect and inconsistent (2.2 V vs. 1.5 V on SOME Jn;
	in this case the BPE and other exceptions, like APEmode Parity,
	were seen on the Jn with CORRECT Voltage!)