Fwd: Re: [HECnet] More clustering fun

Mark Wickens mark at wickensonline.co.uk
Sat Sep 17 18:30:27 PDT 2011


On 17/09/11 11:28, Peter Coghlan wrote:
My apologies for the confusion, Peter. But you're right to assume I meant the
cluster group number (cl.gr.nr.).
And yes, there is obviously something wrong here. If an AUTOGEN was done, I'd
say have a look at MODPARAMS.DAT:
- alloclass must be unique for each node
- same for tapealloclass

The alloclass is used in forming device names and lock resource names. If it
is not set correctly, there will be problems in these areas but they should not
prevent a node from joining a cluster. I would tend to leave it at zero unless
there were good reasons for changing it. There are myriad little rules about
how various SYSGEN parameters should be set, but most of them can be ignored
for the moment. This is partly because VMS is not so fragile that getting any
one of a vast number of parameters wrong will prevent a system from booting,
and partly because the system picks sensible defaults for SYSGEN parameters
that work in typical cases.
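To show what "used in forming device names" means in practice, here is a minimal Python sketch of the usual VMS convention for the cluster-visible name of a local disk: with ALLOCLASS 0 the node name is the prefix, otherwise `$<alloclass>$` is. The function name `cluster_device_name` is hypothetical, and the sketch does not model lock resource names at all.

```python
def cluster_device_name(node: str, alloclass: int, device: str) -> str:
    """Hypothetical helper: cluster-visible name of a local device.

    Assumes the usual VMS convention: ALLOCLASS 0 gives NODE$DEV,
    a nonzero ALLOCLASS n gives $n$DEV.
    """
    # SYSGEN ALLOCLASS is assumed to be limited to 0-255.
    if not 0 <= alloclass <= 255:
        raise ValueError("ALLOCLASS is expected to be in the range 0-255")
    if alloclass == 0:
        # No allocation class: the owning node's SCSNODE name is the prefix.
        return f"{node.upper()}${device.upper()}"
    # Nonzero allocation class: the node name is replaced by $n$.
    return f"${alloclass}${device.upper()}"


print(cluster_device_name("bigvax", 0, "dua0"))  # BIGVAX$DUA0
print(cluster_device_name("bigvax", 1, "dua0"))  # $1$DUA0
```

This is why a wrong ALLOCLASS shows up as odd or clashing device names rather than as a failure to join the cluster.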

More important SYSGEN parameters to check are SCSNODE and particularly
SCSSYSTEMID. If SCSSYSTEMID was inadvertently changed, this may well cause
difficulties. SCSSYSTEMID must be the DECnet area number * 1024 plus the
DECnet node number. This number is used to calculate the Ethernet address
used for cluster communications. It is also used to uniquely identify cluster
nodes to other cluster nodes.
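The arithmetic is easy to check by hand, but here is a small Python sketch. It assumes DECnet Phase IV ranges (area 1-63, node 1-1023) and the standard Phase IV convention that the Ethernet address is `AA-00-04-00` followed by the 16-bit address in low-byte-first order. The helper names are hypothetical.

```python
def parse_decnet_address(addr: str) -> tuple[int, int]:
    """Hypothetical helper: split a DECnet address such as '1.10'."""
    area_str, node_str = addr.split(".")
    return int(area_str), int(node_str)


def scssystemid(area: int, node: int) -> int:
    """SCSSYSTEMID = area * 1024 + node (Phase IV ranges assumed)."""
    if not 1 <= area <= 63:
        raise ValueError("DECnet area must be 1-63")
    if not 1 <= node <= 1023:
        raise ValueError("DECnet node number must be 1-1023")
    return area * 1024 + node


def decnet_mac(area: int, node: int) -> str:
    """Phase IV Ethernet address: AA-00-04-00 plus the address, low byte first."""
    value = scssystemid(area, node)
    low, high = value & 0xFF, value >> 8
    return f"AA-00-04-00-{low:02X}-{high:02X}"


area, node = parse_decnet_address("1.10")
print(scssystemid(area, node))  # 1034
print(decnet_mac(area, node))   # AA-00-04-00-0A-04
```

A node whose SCSSYSTEMID does not match its DECnet address by this formula is a good first suspect.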

If SCSNODE is changed without changing SCSSYSTEMID or vice versa, that node
will have difficulties joining a cluster as the other nodes in the cluster
will remember the previous values and complain that the new ones are not
consistent. The solution here is to shut down all nodes in the cluster so that
they are all down at the same time and then reboot each.
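The "other nodes remember the previous values" rule, and why a full-cluster shutdown clears it, can be pictured with a toy Python model. This is purely illustrative and makes no claim about how VMS stores or checks these values internally. `ClusterMemory` and its methods are hypothetical names.

```python
class ClusterMemory:
    """Toy model (not VMS internals): running members remember which
    SCSNODE name goes with which SCSSYSTEMID and refuse a mismatch."""

    def __init__(self) -> None:
        self.known: dict[int, str] = {}  # SCSSYSTEMID -> SCSNODE

    def try_join(self, scsnode: str, scssystemid: int) -> bool:
        name = scsnode.upper()
        # Same SCSSYSTEMID seen before under a different name?
        if self.known.get(scssystemid, name) != name:
            return False
        # Same name seen before under a different SCSSYSTEMID?
        if any(n == name and s != scssystemid for s, n in self.known.items()):
            return False
        self.known[scssystemid] = name
        return True

    def shut_down_whole_cluster(self) -> None:
        # With every node down at the same time, nothing is remembered.
        self.known.clear()


cluster = ClusterMemory()
print(cluster.try_join("NODEA", 1034))  # True
print(cluster.try_join("NODEB", 1034))  # False: SCSNODE changed, ID did not
cluster.shut_down_whole_cluster()
print(cluster.try_join("NODEB", 1034))  # True after the full shutdown
```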

And check the cluster license. AFAIK each node must have its own cluster
license. Alternatively, use one license with 0 units, and in that case make
sure all node names are mentioned in its /INCLUDE list, again on all nodes
where the license is loaded.

I am not 100% sure on this, but as far as I know, cluster licenses are not
checked when a node is attempting to join a cluster, because the system is
operating at a very low level and may not yet be able to access the disk
where its licenses are held. I think the response to a missing cluster
license is a whinge about it in the operator log rather than a refusal to let
the node join. If it were to prevent a node from joining, I would expect
to see a prominent error message mentioning a license problem.

Regards,
Peter Coghlan.

It's all working nicely, but this is before an AUTOGEN. If I may, I'll post the contents of MODPARAMS.DAT and PARAMS.DAT to see if anyone recognises something bad.

Regards, Mark.


