[HECnet] Effects of Rogue Duplicate HECnet Node?

Tue Mar 3 13:45:12 PST 2020

As I mentioned, seeing the same address on two circuits is perfectly normal and expected if that node is a router.  

For an endnode, it would be possible to create some hacks that would help.  They aren't in PyDECnet because that implements, just about exactly, the letter of the DNA specs, and no such checks are in those specs.

If a given router were to see the same address as a neighbor ("adjacency") on multiple circuits, and the hello messages indicate it's an endnode, then that would be an error.  
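
A rough sketch of what such a check could look like, in Python.  This is not PyDECnet code; the function name, the (circuit, address, node type) tuples, and the circuit names are made up for illustration.  It reports only addresses that show up as an endnode adjacency on more than one circuit, since a router on several circuits is normal.

```python
from collections import defaultdict


def duplicate_endnodes(adjacencies):
    """Return {address: [circuits]} for endnodes adjacent on 2+ circuits.

    adjacencies is an iterable of (circuit, address, node_type) tuples,
    where node_type is "endnode" or "router" as learned from hellos.
    """
    circuits = defaultdict(set)
    for circuit, address, node_type in adjacencies:
        if node_type == "endnode":  # routers on several circuits are fine
            circuits[address].add(circuit)
    return {addr: sorted(c) for addr, c in circuits.items() if len(c) > 1}


# Hypothetical adjacency list: 31.5 is an endnode heard on two circuits.
example = [
    ("ETH-0", "31.5", "endnode"),
    ("ETH-1", "31.5", "endnode"),
    ("ETH-0", "31.1", "router"),
    ("ETH-1", "31.1", "router"),
]
print(duplicate_endnodes(example))
```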

But the picture isn't so clear if an endnode appears as an adjacency while that address is also shown as reachable (but not adjacent, i.e., hop count > 1) in the routing tables.  The reason is the "counting up" behavior of Phase III/IV routing that is characteristic of "distance vector" algorithms.  When a node goes down and its adjacency times out, that node is still known as reachable by others in the area.  The former neighbor now thinks the downed node is 3 hops away.  It reports that change, which causes other routers to revise their distance, and the process counts up by one or two hops at a time until you hit "max hops" and the node is now declared unreachable.  If you have events enabled, you can see this clearly: an adjacency down results in a node unreachable, but the unreachable event is delayed by 10-20 seconds -- or longer, possibly significantly longer, if there are lots of routers.  Keep in mind that routers only send updates at most once a second, to prevent floods of change messages.  That reduces overhead but also puts a time bound on how fast a router can know a node has gone away.

The result: if I see an endnode come up, it might be that it was recently up, either on this node or elsewhere, and it hasn't counted up to unreachable yet.  Given the protocol rules this is unavoidable.
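
To make the count-up concrete, here is a toy Python simulation with just two routers, A and B, and an endnode E that used to be adjacent to A.  It is a simplified illustration, not the actual routing code: it tracks hops only (no cost), assumes the routers take turns reacting to each other's updates, and uses a made-up max hops value of 16.  After A's adjacency times out, A falls back to the route B is still advertising and moves to 3 hops, B then moves to 4, and so on until max hops is exceeded.

```python
MAX_HOPS = 16  # hypothetical "max hops" setting for this sketch


def count_up(max_hops=MAX_HOPS):
    """Replay the count-up after endnode E, formerly adjacent to A, goes down.

    Returns a list of (router, hops) changes in order; hops is None once the
    router declares E unreachable.
    """
    # Before the failure: E is 1 hop from A, so B sees it at 2 hops via A.
    hops = {"A": 1, "B": 2}
    history = []
    current, other = "A", "B"  # A notices first: its adjacency timed out
    while True:
        candidate = hops[other] + 1  # route through the other router
        if candidate > max_hops:
            # Past max hops: both routers give up on E.
            for router in (current, other):
                hops[router] = None
                history.append((router, None))
            return history
        hops[current] = candidate
        history.append((current, candidate))
        current, other = other, current  # the change propagates back


for router, h in count_up():
    print(router, "unreachable" if h is None else f"{h} hops")
```

Each step in the history needs at least one routing update to propagate, which is why the unreachable event trails the adjacency down event.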

It would be possible to have a configuration setting that holds off acting on a new adjacency until the count-up process has had a chance to work.  If after a suitable delay the address still appears reachable, that would be an error.  This means, of course, that a node whose circuit bounced will be unavailable for a while longer.
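
A minimal sketch of that hold-off idea in Python.  No such setting exists in PyDECnet as far as this message says; the class name, the method names, the 30 second delay, and the "wait" / "accept" / "duplicate" results are all hypothetical.  The caller supplies the current time and whether the routing table still shows the address reachable by some other path.  One design choice here: the adjacency is accepted as soon as the address stops looking reachable elsewhere, rather than always waiting out the full delay.

```python
class AdjacencyHoldOff:
    """Delay acting on a new endnode adjacency while count-up may be running."""

    def __init__(self, delay_seconds):
        self.delay = delay_seconds
        self.pending = {}  # address -> time the first hello was seen

    def hello_seen(self, address, now):
        # Remember only the first hello; later ones don't restart the timer.
        self.pending.setdefault(address, now)

    def check(self, address, now, reachable_elsewhere):
        """Return "accept", "wait", or "duplicate" for a pending address."""
        started = self.pending[address]
        if not reachable_elsewhere:
            # Stale route has counted up to unreachable (or never existed).
            del self.pending[address]
            return "accept"
        if now - started < self.delay:
            return "wait"  # could still be the old route counting up
        del self.pending[address]
        return "duplicate"  # still reachable after the delay: flag it


# Hypothetical usage with times in seconds.
hold = AdjacencyHoldOff(delay_seconds=30)
hold.hello_seen("31.5", now=0)
print(hold.check("31.5", now=10, reachable_elsewhere=True))
print(hold.check("31.5", now=20, reachable_elsewhere=False))
```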

Phase V avoids this issue entirely by using link state routing, essentially a map algorithm, so there you can see duplicate addresses directly.

I'm puzzled by the bit about MacOS PATHworks detecting a duplicate address.  How would it do that?  As an end node, it doesn't see routing messages at all.  It could see a duplicate address on its own Ethernet, by detecting the receipt of a message from its Ethernet address that it didn't send.  But I can see no way it could know about an off-LAN duplicate.
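
For illustration, a small Python sketch of the on-LAN case.  It relies on the Phase IV convention that the Ethernet address is AA-00-04-00 followed by the 16-bit node address (6 bits of area, 10 bits of node number) in low-byte-first order.  The function names are made up, and how an implementation would actually know whether it sent a given frame is left as an assumed input; this is a guess at the kind of check involved, not a description of what PATHworks does.

```python
def decnet_address(area, node):
    """Pack area (1-63) and node (1-1023) into the 16-bit DECnet address."""
    if not (1 <= area <= 63 and 1 <= node <= 1023):
        raise ValueError("area must be 1-63 and node 1-1023")
    return (area << 10) | node


def decnet_mac(area, node):
    """Phase IV Ethernet address: AA-00-04-00, then the address low byte first."""
    addr = decnet_address(area, node)
    return bytes([0xAA, 0x00, 0x04, 0x00, addr & 0xFF, addr >> 8])


def looks_like_lan_duplicate(my_mac, frame_source, sent_by_me):
    """True if a frame carries our source address but we did not send it."""
    return frame_source == my_mac and not sent_by_me


mine = decnet_mac(31, 5)  # hypothetical node 31.5
print("-".join(f"{b:02X}" for b in mine))
print(looks_like_lan_duplicate(mine, decnet_mac(31, 5), sent_by_me=False))
```

Nothing in this check helps with a duplicate on some other LAN, since those frames never arrive here with that source address.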

	paul

> On Mar 3, 2020, at 3:33 PM, Supratim Sanyal <supratim at riseup.net> wrote:
> 
> Thinking more ... it is difficult to run Wireshark 24/7 trying to trap the rogue node's source. But PYDNET - a DECnet/python rev 486 node - does L1 routing for area 31. Wouldn't PYDNET know that the address for MACOS9 is taken and that node is up, and therefore that another node trying to come up with the same address is a situation that should, at the very least, be logged? Knowing Paul, it probably already does; it had not occurred to me to look at the PYDNET logs before.
> 
> 
> ---
> Supratim Sanyal, W1XMT
> 39.19151 N, 77.23432 W
> QCOCAL::SANYAL via HECnet <http://www.update.uu.se/~bqt/hecnet.html>
> 
> 
> On Mar 3, 2020, at 3:13 PM, Supratim Sanyal <supratim at riseup.net> wrote:
> 
>> I have had someone on area 31 bring up a node that conflicts with my MACOS9 node running Pathworks for Macintosh. The Mac handles it by turning its executor off and throwing an error popup saying "someone grabbed my address".
>> 
>> Since I am not sitting there staring at the Mac screen all the time, the situation always has gone away by the time I get to investigating why MACOS9 dropped off.
>> 
>> This makes me wonder if whoever has that other node gets a similar message and disconnects. But that still leaves MACOS9 in limbo.
>> 
>> 
>> ---
>> Supratim Sanyal, W1XMT
>> 39.19151 N, 77.23432 W
>> QCOCAL::SANYAL via HECnet <http://www.update.uu.se/~bqt/hecnet.html>
>> 
>> 
>> On Mar 3, 2020, at 2:16 PM, Paul Koning <paulkoning at comcast.net> wrote:
>> 
>>> 
>>> 
>>>> On Mar 3, 2020, at 1:13 PM, Thomas DeBellis <tommytimesharing at gmail.com> wrote:
>>>> 
>>>> You may be talking about a number of things here.  DECnet node numbers are something (very) vaguely like IP tuples, except with half the bits and fixed fields.  The upper 6 bits constitute the area, the lower 10 bits constitute the number within area.  This is what I recall:
>>>> 
>>>>    • If the node number's name is not defined to other systems, then many user level programs will not be able to see it.  Tops-20 won't be able to build a connection.
>>> 
>>> Interesting.  It depends on the application API.  For example, in RSTS the "node name" argument can contain a number in string form, which lets you connect by address.  But in some places in NCP things don't work if there isn't a name for the node.  I would call that a bug.
>>> 
>>>>        • Phase II DECnet used node names directly, I think.
>>> 
>>> Yes, though it also mentions node addresses.  The spec requires a value between 2 and 240, with no explanation why.  The address appears in the Node Init message, but nowhere else that I can see.
>>> 
>>>>    • If the number is the same as another system in different area, then everything is fine except for 1.
>>> 
>>> That isn't a duplicate address.  The address is a 16 bit value, not a 6 or 10 bit value.  If some of the bits are the same but others aren't, you have two different addresses.
>>> 
>>>>    • If the number is the same as another system in the same area, then somebody will become 'unhappy'.
>>>>        • I don't remember how the adjacency is reported for point-to-point.
>>>>    • If you think of MAC address clash on the same Ethernet segment as opposed to different segments, you may appreciate a similarity.
>>> 
>>> Two nodes with the same address on the same Ethernet (not just the same segment, but bridged segments too) will result in a duplicate Ethernet address.  DECnet doesn't define anything that checks for this.  Depending on the implementation, you might see it as an "adjacency" to your own node address on a circuit.  The same issue appears if you have a router with multiple Ethernet interfaces and you attach those to the same Ethernet.  Phase V of course fixes this by not using a MAC address derived from the node address.  So does Phase IV Prime, but implementations of that are rare at best.
>>> 
>>> If the multiple nodes are connected to point to point links, or to disjoint Ethernets, then as far as DECnet is concerned that's just one node reachable via several paths.  That ability is of course intentional -- a router can be reached on any of its interfaces.  A duplicate address essentially looks like a partitioned node.  Other nodes would see one or the other of the two, depending on which is closer (by path cost).
>>> 
>>>>    • I don't remember the finer details of the differences between a level 1 and 2 router.
>>> 
>>> If one of the duplicate-address nodes is a level 2 router but the other isn't, then the one that is will show in entry 0 of the routing tables as a possible "nearest L2 router".  That will work just fine.
>>> 
>>> If both are level 2 routers, then for the out of area routing they just act as redundant L2 routers offering out of area service.  The usual rule that an area must not be partitioned applies, of course.
>>> 
>>> That brings up a particularly nasty case.  Suppose you have an L2 router that duplicates someone else's address, and in fact it isn't connected into the area its address says it belongs in.  The scenario of "I accidentally booted up an old node" could do this.  If the rogue node isn't connected to other L2 nodes, that's benign because its L2 services will be turned off -- an L2 router only offers out of area service if it has out of area circuits that are up.  But if the rogue happens to be connected to some other L2 router, then it would claim connectivity to its area in its L2 routing messages.  That would make the entire area (except the rogue node itself) effectively unreachable to anyone who has a lower cost L2 path to the rogue than to the real area.
>>> 
>>>    paul
>>> 
