Failure FAQ
HPC and Data for Lattice QCD
What has to be done when a PSU failure occurs?
When a PSU fails, please do the following:
- Log in to either qmasterJ or qmasterW
- Make sure you can use "qc"
- Execute "/srv/qroot/scripts/psuRevive.sh BP PSU", replacing BP and PSU with the affected BP and the failed PSU
The script retrieves the status of all PSUs on the affected BP, shuts down the PSU, sleeps for 30 seconds, turns the PSU back on, sleeps again, and retrieves the status once more. The output is saved to a file named "psu-{j,w}-<BP>-<PSU>.<DATE>.log". Please save the file and forward it to an admin.
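
To find the log file to forward, something like the following sketch may help. It is not part of psuRevive.sh: the helper name latest_psu_log is hypothetical, and the default of searching the current directory is an assumption, since this FAQ does not state where the script writes its log.

```shell
#!/bin/sh
# Hypothetical helper (not part of psuRevive.sh): print the newest log file
# for a given BP and PSU.
# Assumption: the logs are in the directory given as the third argument,
# defaulting to the current directory; the real location is not documented here.
latest_psu_log() {
    bp=$1
    psu=$2
    log_dir=${3:-.}
    # The pattern covers both psu-j-<BP>-<PSU>.<DATE>.log and
    # psu-w-<BP>-<PSU>.<DATE>.log; "ls -t" lists the newest file first.
    # If nothing matches, the error is suppressed and nothing is printed.
    ls -t "$log_dir"/psu-[jw]-"$bp"-"$psu".*.log 2>/dev/null | head -n 1
}
```

For example, "latest_psu_log 3 1" would print the most recently modified log for BP 3 and PSU 1 in the current directory; the values 3 and 1 are placeholders only.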