[HECnet] More clustering fun

Peter Coghlan HECNET at beyondthepale.ie
Fri Sep 16 10:47:35 PDT 2011


I've now refreshed the VAX satellite's system drive and installed it in the
ALPHA server. The one problem I have remaining is that the number of VOTES
the satellite contributes to the cluster is 1. I believe for a proper
satellite this should be 0.


The number of votes a node has determines what happens to it when it
loses contact with other members of the cluster. If each node in a two
node cluster has one vote, then the cluster quorum is two votes. If
something happens to either node, the other notices that quorum has been
lost and will hang until quorum is reestablished. The reason for the hang
is that each node knows only that it can't see the other node. It doesn't
know whether the other has shut down or is still running and might become
visible again shortly, in which case everything can resume.
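
The arithmetic behind this can be sketched in a few lines of Python. This is
only an illustration, assuming the usual OpenVMS rule that quorum is
(EXPECTED_VOTES + 2) / 2 with the fraction discarded. The node names SERVER
and SATELL and the helper functions are made up for the example.

```python
def quorum(expected_votes: int) -> int:
    """Quorum derived from expected votes, using integer division."""
    return (expected_votes + 2) // 2


def has_quorum(visible_votes: int, expected_votes: int) -> bool:
    """A node keeps running only while the votes it can see reach quorum."""
    return visible_votes >= quorum(expected_votes)


# Hypothetical two-node cluster in which each node has one vote.
votes = {"SERVER": 1, "SATELL": 1}
expected = sum(votes.values())

print(quorum(expected))                       # 2
print(has_quorum(sum(votes.values()), expected))  # True: both nodes visible
print(has_quorum(votes["SERVER"], expected))  # False: SERVER alone hangs
```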

If your diskless satellite should die unexpectedly, it is somewhat
irritating if it also ends up hanging the other node, for no good reason.
If the satellite has lost contact with the node with the disks, then it can
do nothing anyway. That is why zero votes are recommended for diskless
satellites. No other ill effects will result from leaving votes set to one.
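
To make the difference concrete, here is a small Python sketch comparing the
two settings when the nodes lose sight of each other. It assumes quorum is
(EXPECTED_VOTES + 2) / 2 with the fraction discarded and that EXPECTED_VOTES
equals the sum of all votes. The node names and the survivor_runs helper are
invented for illustration.

```python
def quorum(expected_votes: int) -> int:
    """Quorum derived from expected votes, using integer division."""
    return (expected_votes + 2) // 2


def survivor_runs(votes: dict, survivor: str) -> bool:
    """True if `survivor`, seeing only its own votes, still has quorum."""
    expected = sum(votes.values())
    return votes[survivor] >= quorum(expected)


for satellite_votes in (1, 0):
    votes = {"SERVER": 1, "SATELL": satellite_votes}
    print(satellite_votes,
          survivor_runs(votes, "SERVER"),
          survivor_runs(votes, "SATELL"))
# Prints:
# 1 False False
# 0 True False
```

With one vote on the satellite, neither node can continue alone. With zero
votes, the node with the disks carries on and only the satellite waits.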

Whatever the number of votes each node has, when shutting down a voting
member of the cluster, the REMOVE_NODE shutdown option should be specified
in order to avoid hanging the rest of the cluster after the shutdown
completes. This option tells the remaining cluster nodes to recompute quorum
taking into account the loss of the node being shut down.
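
The effect of REMOVE_NODE can be modelled roughly as follows. This is a
simplified sketch, not how the cluster software actually does it: it assumes
quorum is (EXPECTED_VOTES + 2) / 2 with the fraction discarded, and that
with REMOVE_NODE the quorum is simply recomputed from the votes of the nodes
that remain. The node names and function names are hypothetical.

```python
def quorum(expected_votes: int) -> int:
    """Quorum derived from expected votes, using integer division."""
    return (expected_votes + 2) // 2


def cluster_continues(votes: dict, leaving: str, remove_node: bool) -> bool:
    """True if the remaining nodes still have quorum after `leaving` shuts down."""
    remaining = sum(v for name, v in votes.items() if name != leaving)
    if remove_node:
        # Quorum is recomputed without the departing node's votes.
        needed = quorum(remaining)
    else:
        # Quorum stays at the value computed for the full cluster.
        needed = quorum(sum(votes.values()))
    return remaining >= needed


votes = {"SERVER": 1, "SATELL": 1}
print(cluster_continues(votes, "SATELL", remove_node=True))   # True
print(cluster_continues(votes, "SATELL", remove_node=False))  # False
```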

Regards.
Peter Coghlan.


