[HECnet] Thousands of DECnet errors on Tops-20

Mon Jan 18 09:39:35 PST 2021

No, 1478 doesn't make any sense.

Looking at my code, I use a local buffer size on Ethernet of 591.  But then I track what buffer sizes are reported by the router neighbors in their hello messages, and limit the routing message size to the smallest of all these numbers.

Then I noticed I subtract 16 from that when calculating the size of the update messages.  I don't remember why.

So in any case, 1478 should never be a routing protocol message size coming out of PyDECnet.
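A minimal sketch of the sizing rule described above, written in Python since that is what PyDECnet uses. The function and constant names are hypothetical and not taken from PyDECnet's source. It encodes only what is stated here: take the smallest of the local Ethernet buffer size (591) and the blksize values neighbors report in their router hellos, then subtract 16. Under that rule the result can never exceed 575, which is why a 1478-byte routing message is unexpected.

```python
# Hypothetical sketch of the routing message sizing rule described above.
# Names are illustrative only; they are not taken from PyDECnet's source.
LOCAL_ETHERNET_BUFSIZE = 591  # local buffer size used on Ethernet
UPDATE_ADJUSTMENT = 16        # amount subtracted when sizing update messages


def routing_msg_limit(neighbor_blksizes, local=LOCAL_ETHERNET_BUFSIZE):
    """Return the largest routing update size allowed by the rule.

    Takes the smallest of the local buffer size and every blksize
    reported by router neighbors in their hello messages, then
    subtracts the fixed adjustment.
    """
    return min([local, *neighbor_blksizes]) - UPDATE_ADJUSTMENT


# The Tops-20 router hellos in the trace advertise blksize 1476.
limit = routing_msg_limit([1476, 1476])
print(limit)          # 575
print(limit < 1478)   # True
```

A smaller neighbor blksize only lowers the limit further, so no combination of neighbors can push it above the local 591 minus 16.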

I'd like to see message traces.  A trace-level log from A2RTR would do the job.  Something is very strange here.

	paul

> On Jan 18, 2021, at 1:54 AM, Johnny Billquist <bqt at softjar.se> wrote:
> 
> Thomas, this is pretty much exactly what I expected (and I suspect Paul expected as well).
> 
> The level 1 routing messages are (as we said) the ones that can grow big. And the advertised length is not used by the other side to limit what it sends. It essentially hints at how large the messages you send are.
> 
> And Paul also noted that on ethernet the Python code uses a larger buffer size (essentially the largest an ethernet frame can be) instead of putting any lower limit on it. While this is perfectly legal from a protocol point of view, both TOPS-20 and VMS, it would seem, can't really control the size of the low-layer buffer, and therefore fail if you use large packets without also having a large DECnet segment buffer size.
> 
> So Paul's PyDECnet works the same way I have managed to make RSX work here, and you get the same problem with some OSes.
> 
> The obvious and easy fix is to just lower the buffer size used over ethernet to more closely match the DECnet segment buffer size.
> 
> The sad thing with that is that, at least for RSX, it means you run the risk of hanging the ethernet when running TCP/IP. The best would be if all OSes could separate the two buffer sizes properly.
> But I just realized that I might just hack RSX DECnet here, to not use the large buffer size for the link messages... Hmm... Gotta look into this.
> 
> Meanwhile, the fix Paul mentioned he already has prepared should take care of this for you.
> 
> Alternatively, if you change that 1504-%RTEHS to say something like 1500 or 1504 instead, you should probably also be good. (My guess would be 1500.)
> 
>  Johnny
> 
> On 2021-01-18 04:45, Thomas DeBellis wrote:
>> I think I may have finally gotten to the bottom of this.  It's a level 1 routing message that I'm getting from 2.1023 (A2RTR) that does not appear to be respecting lengths, viz:
>> *22:04:30*.749823 aa:00:04:00:ff:0b > ab:00:00:03:00:00, ethertype DN (0x6003), length *1478*: lev-1-routing src 2.1023 {ids 0-726 cost 0 hops 0
>> This is two (2) bytes over the maximum that Tops-20 can accept.
>>    NCP>*SHOW LINE NI-0 CHARACTERISTICS *
>>    NCP>
>>    22:16:04     NCP
>>    Request # 23; Show Line Characteristics Completed
>>    Line = NI-0
>>       Receive Buffers = 6
>>       Controller = Normal
>>       Protocol = Ethernet
>>       Hardware Address = 00 1F 16 EC CE 47
>>       Receive buffer size = *1476*
>> It would appear that the 20s are advertising this length in their level 1 router hello messages:
>> 22:04:21.018507 aa:00:04:00:0a:0a > ab:00:00:03:00:00, ethertype DN (0x6003), length 60: router-hello l1rout vers 2 eco 0 ueco 0 src 2.522 blksize *1476* pri 5 hello 15
>> 22:04:21.082680 aa:00:04:00:08:0a > ab:00:00:03:00:00, ethertype DN (0x6003), length 60: router-hello l1rout vers 2 eco 0 ueco 0 src 2.520 blksize *1476* pri 5 hello 15
>> About two seconds after the message comes in from A2RTR, the following appears in the error log:
>>    ***********************************************
>>    DECNET ENTRY
>>      LOGGED ON 17-Jan-2021 *22:04:32*-EST MONITOR UPTIME WAS 1 day(s) 1:17:54
>>             DETECTED ON SYSTEM # 3691.
>>             RECORD SEQUENCE NUMBER: 70952.
>>    ***********************************************
>>    DECNET Event type 5.15, Receive failed
>>     From node 2.520 (TOMMYT), occurred 17-JAN-2021 22:04:08
>>       Line NI-0-0
>>       Failure reason = Frame too long
>>       Ethernet header = AB 00 00 03 00 00 / AA 00 04 00 0A 0A
>> So... there's no way I can get around this without some /serious/ hacking of DNADLL and ROUTER (see below), which would probably take me a few months to learn and debug.  Of course, then maybe I could put level 2 routing into Tops-20, which I've been daydreaming about...
>> Paul, what does this suggest to you?
>>> ------------------------------------------------------------------------
>>> On 1/17/21 7:39 PM, Johnny Billquist wrote:
>>>> ------------------------------------------------------------------------
>>>> On 2021-01-18 00:17, Thomas DeBellis wrote:
>>>> 
>>>> Well, the frames certainly won't be larger than 1,500 bytes, right?  So I'm guessing they'll be the maximum.  Problem is, all of that stuff is hidden under several layers of drivers, so I'm not sure how I'm going to get the overage passed back.  And I also need to put in some BUGINF logic to alert if I get more of these than whatever I decide the interval to be.
>>> That depends on what they count. Like I said, the ethernet payload is 1500 bytes. Then you have the ethernet header, which is 14 bytes, plus the CRC trailer, which is 4 bytes. If you count them, you end up at 1518 bytes.
>>> Depends on the hardware, I guess.   I have no idea what the NIA-20 exposes.
>> I meant the maximum frame size; I suspect this is 1500 for the NI, but I don't actually know.  My speculation is that DECnet is using part of the buffer to piggyback node and other information instead of holding this metadata separately.  I don't know what Multinet does, but there you can configure the NI to have a packet size of 1500.
>>>> If you are a DDP (LD.DDP), then you are not CPU dependent and you always go ahead; otherwise, you have to be on the CPU that owns the device (.CPCPN). So I'm not sure if it makes any difference, but DDP is not CPU dependent; I'm not sure if that is a synonym for 'shared'.  If I stumble over something more, I'll report it.
>>> It's actually the same in RSX. The DDCMP layer is sort of between the hardware driver and the higher level protocols, and it's not tied to any specific CPU.
>>> 
>>> But that code would suggest that LD.DDP is just an indication of whether something is CPU dependent or not, and wouldn't have anything to do with DDCMP.
>> From looking at the routing code, it seems LD.DDP is used when something is being handed to NSP to play with; I guess that would be going through some kind of layering.
> 
> -- 
> Johnny Billquist                  || "I'm on a bus
>                                  ||  on a psychedelic trip
> email: bqt at softjar.se             ||  Reading murder books
> pdp is alive!                     ||  tryin' to stay hip" - B. Idol
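The Ethernet size accounting discussed in this thread can be written out as a small Python sketch. The constants are the figures quoted above (1500-byte payload, 14-byte header, 4-byte CRC trailer); the helper name is hypothetical. Whether a given tool or the NIA-20 counts the header and CRC is exactly the open question, so the sketch shows both totals, plus the raw difference Thomas computed between the tcpdump length and the advertised blksize, which assumes both numbers measure the same thing.

```python
# Ethernet frame size arithmetic using the figures from the thread (bytes).
ETH_MAX_PAYLOAD = 1500  # maximum Ethernet payload
ETH_HEADER = 14         # destination (6) + source (6) + ethertype (2)
ETH_CRC = 4             # CRC (frame check sequence) trailer


def max_frame_size(count_crc=True):
    """Largest Ethernet frame, with or without the CRC trailer counted."""
    return ETH_MAX_PAYLOAD + ETH_HEADER + (ETH_CRC if count_crc else 0)


print(max_frame_size())                 # 1518
print(max_frame_size(count_crc=False))  # 1514

# Thomas's comparison, assuming the tcpdump length and the advertised
# blksize are measured the same way (not confirmed in the thread).
observed_len, advertised_blksize = 1478, 1476
print(observed_len - advertised_blksize)  # 2
```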