[HECnet] Thousands of DECnet errors on Tops-20

Peter Lothberg roll at stupi.com
Tue Jan 12 12:29:53 PST 2021


The DECnet segment size has to be the same "network wide". 

If I remember right, DECnet looks at the two end nodes and uses the smaller of their segment sizes,
so if any transit node in the path has a smaller segment size, things will not work, because
it will drop packets bigger than its own size.
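
A minimal sketch of that behaviour in Python. The function names and the 300-byte transit node are made up for illustration, and the "smaller of the two ends" rule is taken from the recollection above rather than checked against the specification.

```python
def negotiated_segment_size(local_size, remote_size):
    """The two end nodes settle on the smaller of their segment sizes."""
    return min(local_size, remote_size)


def first_dropping_hop(segment_size, transit_sizes):
    """Return the index of the first transit node whose limit is smaller
    than the segment, or None if every hop can forward it."""
    for index, size in enumerate(transit_sizes):
        if segment_size > size:
            return index
    return None


# Both ends allow 576, so they agree on 576.
agreed = negotiated_segment_size(576, 576)

# The transit nodes are never consulted, so a hop limited to a
# smaller size (300 is a hypothetical value) drops full-size segments.
blocked_at = first_dropping_hop(agreed, [576, 300, 576])
```

Here `blocked_at` identifies the second hop, which is why the size has to be the same everywhere and not just at the two ends.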

The only SW/HW combination I know of that uses anything other than 576 is MRC/Stu DECnet for
Tops20 4.x on DEC2020.

-P 

> From: "tommytimesharing" <tommytimesharing at gmail.com>
> To: "hecnet" <hecnet at Update.UU.SE>
> Sent: Monday, January 11, 2021 11:58:56 PM
> Subject: Re: [HECnet] Thousands of DECnet errors on Tops-20

> Yes, I had seen this and had wondered about it after I had reflected on the
> output of a SHOW EXECUTOR CHARACTERISTICS command (clipped):
>> Executor Node = 2.520 (TOMMYT)

>> Identification = Tommy Timesharing
>> Management Version = 4.0.0
>> CPU = DECSYSTEM1020
>> Software Identification = Tops-20 7.1 PANDA
>> .
>> .
>> .
>> Buffer Size = 576
>> Segment Buffer Size = 576

> So it would appear that the 20's implementation of NICE knows of this
> distinction. I can parse for both SET EXECUTOR SEGMENT BUFFER SIZE and SET
> EXECUTOR BUFFER SIZE. Both fail, of course; again, once DECnet is initialized,
> they are locked.

> However, when one looks at the DECnet initialization block (IBBLK), it only
> contains a field for buffer size (IBBSZ), nothing about segment size.
> Further, the NODE% JSYS's set DECnet initialization parameters function
> (.NDPRM) only contains a sub-function for buffer size (.NDBSZ), and SETSPD
> will only parse for DECNET BUFFER-SIZE. I hope to test that this weekend
> after I've looked further through the error log.

> The receive code in the low level NI driver (PHYKNI) only checks to see
> whether what was received will fit into the buffer specified. It returns a
> length error (UNLER%) to DNADLL, but not the actual difference.
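
To make concrete what information is lost here, a rough Python model of the check as described. This is not the monitor code: the function names are hypothetical, and "UNLER%" is used only as a label borrowed from the message.

```python
UNLER = "UNLER%"  # label only; the real value is a monitor symbol


def receive_check(frame_length, buffer_length):
    """Status only, as described: the caller learns that the frame
    did not fit, but not by how much."""
    return None if frame_length <= buffer_length else UNLER


def receive_check_with_overrun(frame_length, buffer_length):
    """Hypothetical variant that also reports the overrun in bytes,
    which is the detail missing from the error log."""
    overrun = max(0, frame_length - buffer_length)
    return (UNLER if overrun else None), overrun


status = receive_check(591, 576)
status_and_overrun = receive_check_with_overrun(591, 576)
```

With the second form, a 591-byte frame arriving in a 576-byte buffer would be logged together with its 15-byte overrun.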

> I have yet to puzzle out how the segment size is derived, but it is apparently
> set on a line basis.

>> On 1/11/21 8:24 PM, Johnny Billquist wrote:

>> Thomas, I wonder if you might be experiencing the effects of the ethernet packet
>> size being different from the DECnet segment buffer size.
>> This is a little hard to explain, as I don't have all the proper DECnet naming
>> correct.

>> But, based on RSX, there are two relevant sizes. One is the actual buffer size
>> the line is using. The other is the DECnet segment buffer size.

>> The DECnet segment buffer size is the maximum size of packets you can
>> expect DECnet itself to ever use.
>> However, at least with RSX, when it comes to the exchange of information at the
>> line level, which includes things like hello messages, RSX is actually using a
>> system buffer size setting, which might be very different from the DECnet
>> segment buffer size.

>> I found out that VMS has a problem here in that if the hello packets coming in
>> are much larger than the DECnet segment buffer size, you never even get
>> adjacency up, while RSX can deal with this just fine.

>> It sounds like you might be seeing something similar in Tops-20. In which case
>> you would need to tell the other end to reduce the size of these hello and
>> routing information packets for Tops-20 to be happy, or else find a way to
>> accept larger packets.

>> After all, ethernet packets can be up to 1500 bytes of payload.

>> And to explain it a bit more from an RSX point of view. RSX will use the system
>> buffer size when creating these hello messages. So, if that is set to 1500, you
>> will get hello packets up to 1500 bytes in size, which contain routing vectors
>> and so on.

>> But actual DECnet communication will be limited to what the DECnet segment
>> buffer size says, so once you have adjacency up, when a connection is
>> established between two programs, those packets will never be larger than the
>> DECnet segment buffer size, which is commonly 576 bytes.
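
The two sizes can be modelled roughly as follows. This is a simplified sketch: the `Node` class, its field names and the node names are hypothetical, and the sizes are the 1500 and 576 figures used in the explanation above.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    line_buffer_size: int     # bounds hello and routing messages on the line
    segment_buffer_size: int  # bounds DECnet data between programs


def max_hello_size(sender):
    """Line-level messages can be as large as the sender's line buffer."""
    return sender.line_buffer_size


def max_data_size(a, b):
    """Data between two programs never exceeds either segment buffer size."""
    return min(a.segment_buffer_size, b.segment_buffer_size)


def fits(receiver, size):
    """True if the receiver's line buffer can hold a message of this size."""
    return size <= receiver.line_buffer_size


sender = Node("RSXBOX", line_buffer_size=1500, segment_buffer_size=576)
receiver = Node("T20BOX", line_buffer_size=576, segment_buffer_size=576)

hello_ok = fits(receiver, max_hello_size(sender))
data_ok = fits(receiver, max_data_size(sender, receiver))
```

In this model the data traffic fits but the largest hello does not, so the trouble shows up at the line level, before any connection exists.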

>> Johnny

>>> On 2021-01-11 23:43, Thomas DeBellis wrote:

>>> Paul,

>>> Lots of good information. For right now, I did an experiment and went into MDDT
>>> and stubbed out the XWD UNLER%,^D5 entry in the NIEVTB: table in the running
>>> monitor on VENTI2. Since then (about an hour or so ago), TOMMYT's ERROR.SYS
>>> file has been increasing as usual (a couple of pages an hour) while VENTI2's
>>> hasn't changed at all. So that particular fire hose is plugged for the time
>>> being.

>>> I don't believe I have seen this particular error before, however, there are
>>> probably some great reasons for that. In the 1980's, CCnet may not have had
>>> Level-2 routers on it while Columbia's 20's were online. We did have a problem
>>> with the 20's complaining about long Ethernet frames from an early version of BSD
>>> 4.2 that was being run on some VAX 11/750's in the Computer Science
>>> department's research lab. They got taught how to not do that and all was well.

>>> Tops-20's multinet implementation was first done at BBN and then later imported.
>>> I am not sure that it will allow me to change the frame size. 576 was what was
>>> used for the Internet, so I don't know where that might be hardwired. I'll
>>> check.

>>> I think there are two forensics to perform here:

>>> 1. Investigate when the errors started happening; whether they predate
>>> Bob adopting PyDECnet
>>> 2. Investigate what the size difference is; I don't believe that is
>>> going into the error log, but I'll have to look more carefully with
>>> SPEAR.

>>> A *warning* for anyone also looking to track this down: if you do the retrieve
>>> in SPEAR on KLH10 and you don't have my timeout changes for DTESRV, you
>>> will probably crash your 20. This will happen both with a standard DEC monitor
>>> and PANDA.

>>>> On 1/11/21 4:41 PM, Paul Koning wrote:

>>>>> On Jan 11, 2021, at 4:22 PM, Thomas DeBellis <tommytimesharing at gmail.com>
>>>>> wrote:

>>>>> OK, I guess that's probably a level 2 router broadcast coming over the bridge.
>>>>> There is no way Tops-10 or Tops-20 could currently be generating that because
>>>>> there is no code to do so; they're level 1 only.

>>>> Yes, unfortunately originally both multicasts used the same address. That was
>>>> changed in Phase IV Plus, but Phase IV Plus still sends to the old address for
>>>> backwards compatibility, and it isn't universally implemented.

>>>>> I started looking at the error; it starts out in DNADLL when it is detected on a
>>>>> frame that has come back from NISRV (the Ethernet Interface driver). The error
>>>>> is then handed off to NTMAN where the actual logging is done. So, there are two
>>>>> quick hacks to stop all the errors:

>>>>> • I could stub out the length error entry (XWD UNLER%,^D5) in the NIEVTB: table
>>>>> in DNADLL.MAC.
>>>>> • I could put in a filter ($NOFIL) for event class 5 in the NMXFIL: table in
>>>>> NTMAN.MAC.

>>>>> That will stop the deluge for the moment. Meanwhile, I have to understand what's
>>>>> actually being detected; even the full SPEAR entry is short on details (like
>>>>> how long the frame was).

>>>> The thing to look for is the buffer size (frame size) setting of the stations on
>>>> the Ethernet. It should match; if not someone may send a frame small enough by
>>>> its settings but too large for someone else who has a smaller value. Routing
>>>> messages tend to cause that problem because they are variable length; the Phase
>>>> IV rules have the routers send them (the periodic ones) as large as the line
>>>> buffer size permits.
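
A quick way to spot such a mismatch, sketched in Python. The function name is hypothetical; the two settings are the ones reported in this thread for 2.1023 and VENTI2.

```python
def oversize_pairs(buffer_sizes):
    """List (sender, receiver, excess) for every pair of stations where the
    sender's largest frame is bigger than the receiver's buffer."""
    pairs = []
    for sender, send_size in sorted(buffer_sizes.items()):
        for receiver, recv_size in sorted(buffer_sizes.items()):
            if sender != receiver and send_size > recv_size:
                pairs.append((sender, receiver, send_size - recv_size))
    return pairs


# Buffer size settings as reported in this thread.
stations = {"2.1023": 591, "VENTI2": 576}
problems = oversize_pairs(stations)
```

An empty result means every station can receive the largest frame any other station will send; here the list holds the one offending pair.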

>>>> Note that DECnet by convention doesn't use the full max Ethernet frame size,
>>>> because DECnet has no fragmentation, so the normal settings are chosen
>>>> to make for consistent NSP packet sizes throughout the network. The router
>>>> sending the problematic messages is 2.1023 (not 63.whatever, Rob, remember that
>>>> addresses are little endian), which has its Ethernet buffer size set to 591.
>>>> That matches the VMS conventional default of 576 when accounting for the "long
>>>> header" used on Ethernet vs. the "short header" on point to point (DDCMP etc.)
>>>> links. But VENTI2 has its block size set to 576. If you change it to 591 it
>>>> should start working.
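
The arithmetic behind both remarks, as a small Python sketch. The header lengths (6 bytes short, 21 bytes long) are quoted from memory of the Phase IV routing packet formats and should be treated as an assumption; the function names are hypothetical. A Phase IV address is a 16-bit value, area in the top 6 bits and node in the low 10, stored low byte first.

```python
SHORT_HEADER_LEN = 6   # assumed: short data header on point to point links
LONG_HEADER_LEN = 21   # assumed: long data header on Ethernet


def ethernet_buffer_size(point_to_point_size):
    """Ethernet setting that carries the same payload as a point to point one."""
    return point_to_point_size + (LONG_HEADER_LEN - SHORT_HEADER_LEN)


def encode_address(area, node):
    """Return (low_byte, high_byte) in the order the address is stored."""
    value = (area << 10) | node
    return value & 0xFF, value >> 8


def decode_address(low_byte, high_byte):
    """Return (area, node) from two bytes given low byte first."""
    value = low_byte | (high_byte << 8)
    return value >> 10, value & 0x3FF


size = ethernet_buffer_size(576)
low, high = encode_address(2, 1023)
right = decode_address(low, high)
wrong = decode_address(high, low)  # bytes read in the wrong order
```

Reading the two bytes in the wrong order turns 2.1023 into an area 63 address, which is the mix-up mentioned above.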

>>>> Perhaps I should change PyDECnet to have a way to send shorter than max routing
>>>> messages.

>>>> paul