HPC

HPC and Data for Lattice QCD

Failure FAQ

What has to be done when a PSU failure occurs?

When a PSU fails, please do the following:

  • Log in to either qmasterJ or qmasterW
  • Make sure you can use "qc"
  • Execute "/srv/qroot/scripts/psuRevive.sh <BP> <PSU>", replacing <BP> and <PSU> with the affected BP and the failed PSU

The script retrieves the status of all PSUs on the affected BP, shuts down the PSU, sleeps for 30 seconds, turns the PSU back on, sleeps again, and retrieves the status once more. The output is saved to a file named "psu-{j,w}-<BP>-<PSU>.<DATE>.log". Please save the file and forward it to an admin.
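
To help locate the log file afterwards, here is a minimal sketch of a shell helper that builds a matching file name pattern. It is not part of the scripts in /srv/qroot/scripts: the function name psu_log_pattern and the example values 3 and 1 are assumptions made for illustration. The letter j or w presumably corresponds to qmasterJ or qmasterW, and since the format of <DATE> is not specified here, it is matched with a wildcard.

```shell
# Hypothetical helper, NOT part of /srv/qroot/scripts.
# Prints a glob pattern for the log file written by psuRevive.sh.
# Usage: psu_log_pattern <j|w> <BP> <PSU>
psu_log_pattern() {
  # First argument: j or w (assumed to match qmasterJ / qmasterW).
  case "${1:-}" in
    j|w) ;;
    *) echo "first argument must be j or w" >&2; return 1 ;;
  esac
  # BP and PSU must both be given.
  if [ -z "${2:-}" ] || [ -z "${3:-}" ]; then
    echo "usage: psu_log_pattern <j|w> <BP> <PSU>" >&2
    return 1
  fi
  # The <DATE> format is not documented here, so it is matched with '*'.
  printf 'psu-%s-%s-%s.*.log\n' "$1" "$2" "$3"
}

# Example with placeholder values, run in the directory that holds the log:
#   ls -- $(psu_log_pattern j 3 1)
```

The helper only prints a pattern and does not touch any PSU. The unquoted command substitution in the example is deliberate, so that the shell expands the wildcard.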