[HECnet] Tops-20 SETNOD Failure

Thomas DeBellis tommytimesharing at gmail.com
Tue May 4 19:31:36 PDT 2021


Personally, I don't see how it could /possibly/ have anything to do with 
the REACH:: node definition, but I have been known to occasionally 
overlook the utterly obvious, particularly when it's near night-night.  
Maybe not this time.

Right now, the way to figure it out is to get the minor error data and 
see where that takes things.  So I'm making a change to JNTMAN to have 
.NDINT return the lower-level code on an incomplete insert.  SCLINK 
also appears to be mangling return values, which I'm currently 
investigating.

You can't just blithely assume somebody got it wrong and 'fix' things; 
sometimes it's a certain way for a reason.
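For readers following along, here is a minimal Python sketch (not Tops-20 code) of the kind of reporting change described above: an insert loop that stops at the first failing entry and passes the lower-level code back up instead of dropping it. The names `insert_table`, `InsertResult` and `describe_failure`, and the convention that the per-entry routine returns 0 on success, are hypothetical stand-ins for JNTMAN's loop over the .NDINT table and its calls to SCTAND; the real monitor code is MACRO and surely differs in detail.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

# Hypothetical model of one table entry: (area, node, name).
Entry = Tuple[int, int, str]


@dataclass
class InsertResult:
    ok: bool
    failed_index: Optional[int] = None  # 0-based index of the entry that failed
    lower_code: Optional[int] = None    # code from the lower-level routine


def insert_table(table: List[Entry],
                 insert_one: Callable[[Entry], int]) -> InsertResult:
    """Insert entries in order; on the first failure, report where and why.

    insert_one stands in for the per-entry routine (SCTAND in the thread).
    Assumed convention: it returns 0 on success, else a nonzero error code.
    """
    for i, entry in enumerate(table):
        code = insert_one(entry)
        if code != 0:
            # Propagate the lower-level code instead of discarding it.
            return InsertResult(ok=False, failed_index=i, lower_code=code)
    return InsertResult(ok=True)


def describe_failure(table: List[Entry], result: InsertResult) -> str:
    """Format a message in the style of SETNOD's error, plus the code."""
    area, node, name = table[result.failed_index]
    return (f"?Failed at node {name} ({area}.{node}), "
            f"Item {result.failed_index} of {len(table)}, "
            f"code {result.lower_code}")


if __name__ == "__main__":
    table = [(2, 298, "REACH")]
    # Fake lower-level routine that rejects REACH with an arbitrary code 17.
    result = insert_table(table, lambda e: 17 if e[2] == "REACH" else 0)
    if not result.ok:
        print(describe_failure(table, result))
```

Run as a script, this prints `?Failed at node REACH (2.298), Item 0 of 1, code 17`; the 17 is made up purely for illustration. The point is only that the caller gets both the position and the underlying reason.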

On 5/4/21 8:46 PM, Johnny Billquist wrote:
> On 2021-05-05 00:54, Mike Kostersitz wrote:
>> Ouch, that is one of my nodes 😊 @Johnny Billquist 
>> <mailto:bqt at softjar.se>, is there anything you can think of, given 
>> that we just renamed my old RSX11M node to REACH?
>
> Well. It is something slightly broken in Tops-20, so there isn't 
> really anything we can do about it.
>
> Except hope that Thomas can figure it out and fix it.
>
>  Johnny
>
>>
>> Mike
>>
>>
>> *From: *Thomas DeBellis <mailto:tommytimesharing at gmail.com>
>> *Sent: *Tuesday, May 4, 2021 15:16
>> *To: *HECnet <mailto:hecnet at update.uu.se>
>> *Subject: *[HECnet] Re: Tops-20 SETNOD Failure
>>
>> I fixed a few things in SETNOD to get some more information about the 
>> error.  In particular,
>>
>>   * Allow listing of AREA 1 (this was specifically disallowed; I don't
>>     know why)
>>   * More consistent error reporting (via ESOUT%)
>>   * List more than one node when doing an area list (it would only list
>>     a single node)
>>   * List nodes with more than three digits in the node number when doing
>>     columnar output
>>
>> So now you get the expected results:
>>
>>     SETNOD>lis a 1
>>     [Area 1]
>>     A1RTR   1023    ATHENA   620    ATLE     605    AURORA   606    BANAI    770
>>     BANX25   771    BEA       19    BIZET    800    BJARNE     7    BLINKY   266
>>     CATWZL   302    CLYDE    269    COOPER   263    CRISPS   201    CYGNUS   259
>>     DAVROS   254    DBIT     351    DE1RSX   450    DE1RSY   452    DOCTOR   252
>>     ELIN     616    ELMER    617    ERNIE      2    ERSATZ   350    FLETCH   100
>>     FNATTE     3    FREJ     608    GAXP     730    GNAT      16    GNOME      6
>>     GOBLIN     4    GVAX     731    HAGMAN   262    HARPER   261    HORSE    150
>>     HUGIN    602    HYUNA    500    INKY     268    JIMIN    501    JOCKE     21
>>     JOSSE     17    KLIO     451    KRILLE     8    LOKE     607    MACARO   303
>>     MACRA    258    MAGICA     1    MASTER   251    MIM       13    MUNIN    603
>>     NIPPER   202    NOMAD    610    NOXBIT   720    ORACLE   301    PACMAN   265
>>     PAI      541    PALLAS   621    PAMINA    18    PIDP11   560    PINKY    267
>>     PISTON   520    PLINTH   200    PMAVS2   510    PONDUS    15    PONY      12
>>     PUFF      22    QEMUNT   151    REI      540    ROCKY     11    ROJIN    542
>>     RSX124   306    RSX145   304    RSX170   305    RSX184   307    RUTAN    255
>>     SHARPE   260    SIDRAT   253    SIGGE     10    SPEEDY    24    TARDIS   250
>>     TEMPO      9    THOROS   257    TINA      14    TIPSY    604    TONGUE   264
>>     TOPSY    601    VALAR    400    VAROS    256    WXP       20    WXP2      23
>>     YMER     609    ZEKE       5
>>     Total nodes in area 1: 92
>>     SETNOD>exit
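The columnar layout in the listing above can be reproduced with a short formatter. This is a Python sketch inferred from the output, not SETNOD's actual code: names left-justified in 8 columns, node numbers right-justified in 4, five entries per row with 4 spaces between them. The 4-character number field matters because DECnet Phase IV node numbers run from 1 to 1023 (a 10-bit field), so a 3-character field is too narrow for a node such as A1RTR (1023). `format_columns` and its parameters are hypothetical.

```python
from typing import List, Sequence, Tuple


def format_columns(nodes: Sequence[Tuple[str, int]],
                   per_row: int = 5,
                   name_width: int = 8,
                   num_width: int = 4,
                   gap: int = 4) -> List[str]:
    """Lay out (name, node_number) pairs in fixed-width columns.

    Widths are inferred from the SETNOD listing in this thread (assumption).
    num_width=4 is enough for any Phase IV node number (1..1023).
    """
    cells = [f"{name:<{name_width}}{num:>{num_width}}" for name, num in nodes]
    sep = " " * gap
    return [sep.join(cells[i:i + per_row])
            for i in range(0, len(cells), per_row)]


if __name__ == "__main__":
    sample = [("A1RTR", 1023), ("ATHENA", 620), ("ATLE", 605),
              ("AURORA", 606), ("BANAI", 770), ("BANX25", 771)]
    for row in format_columns(sample):
        print(row)
```

The first printed row matches the first row of the listing character for character. Passing `num_width=3` shows the failure mode in Python terms: Python does not truncate, so the four-digit number overflows its field and shifts the rest of that row one column to the right.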
>>
>> Regarding the error, I have reproduced it with a single entry, viz:
>>
>>     !setnod
>>     SETNOD>_set nod 2.298 name REACH_
>>     SETNOD>_insert_
>>     ?SETNOD: Failed at node REACH (2.298), Item 0 of 1
>>     SETNOD>
>>
>> The high-level code that does the insertion is in JNTMAN.  It loops 
>> through the table passed to it via .NDINT, calling a lower-level 
>> routine, SCTAND, in SCLINK.  An error there is passed up to JNTMAN, 
>> but it is not passed back to the user.  There are also some other 
>> problems in SCLINK pertaining to negative return values, so some 
>> minor work is necessary there, too.
>>
>> I'll make some changes to these two modules, generate a new monitor 
>> for VENTI2 and see what happens in a few days.
>>
>> Right now, if any Tops-20 user is using SETNOD to update DECnet 
>> tables, this appears to fail.  If anybody else is seeing it or can 
>> reproduce it, I'd like to hear about it.
>>
>>     On 5/4/21 11:15 AM, Thomas DeBellis wrote:
>>
>>     Has anybody ever seen SETNOD fail to insert the entire node list?  I
>>     just did.
>>
>>     Shortly after I put my 20s up on HECnet, I wrote a recurring
>>     batch job that fires once a week on Sundays to pull the latest node
>>     list (T20.FIX) from MIM::.  I use the highly venerable FILCOM
>>     program to diff it against the previous week's list.  I
>>     don't do anything in particular with the output except save it in
>>     case I feel like looking at it for some reason.
>>
>>     The batch job always inserts the entire list, rewriting whatever
>>     might be in the monitor's database.  I have always been unsatisfied
>>     with doing things that way because it seemed to me to be inefficient
>>     as the node list grew.   The HECnet node list count was 716 on
>>     9-Jun-19 and it's now up to 884 as of the latest version that I've
>>     pulled, 30-Apr-21.  The other problem is the microscopic possibility
>>     that a node is in Tops-20's monitor database (a hash table) that
>>     isn't in the HECnet node list.
>>
>>     Nodes can get removed, although I think that is infrequent.  Nodes
>>     could get inserted outside of the batch job, but I think that most
>>     unlikely in my situation.  Nodes can get renamed, as evidenced by
>>     2.299 below, which went from THEPIT to THEARK.  None of this should
>>     break anything, nor has it.
>>
>>     However, it's been in the back of my mind to do two enhancements,
>>     one to Tops-20 and one to SETNOD.  The NODE% JSYS should have an
>>     additional feature to return the current monitor database.  The
>>     SETNOD program should be enhanced to take that and compute the set
>>     difference with the new list.  This would show additions, renames
>>     and deletions.  That would bring the update operation down from
>>     hundreds of items to fewer than ten, on average.  This would
>>     obviously make more of a difference on huge DECnets in the tens of
>>     thousands of nodes.  Another NODE% feature should probably be to
>>     whack the entire monitor database except for the local node, which
>>     would be useful for troubleshooting.
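The proposed SETNOD enhancement is a plain set difference keyed on node address. Below is a minimal Python sketch, assuming (hypothetically) that NODE% could hand back the monitor's table, modeled here as a dict from (area, node) to name. `node_db_diff` is an illustrative name, not an existing SETNOD or NODE% function, and the sample data is a few entries taken from the FILCOM output in this thread.

```python
from typing import Dict, Tuple

Address = Tuple[int, int]  # (area, node)
NodeTable = Dict[Address, str]


def node_db_diff(current: NodeTable, new: NodeTable):
    """Compare the monitor's current table with a freshly pulled node list.

    Returns three dicts:
      additions: address -> name, present only in `new`
      renames:   address -> (old_name, new_name), in both, name changed
      deletions: address -> name, present only in `current`
    """
    additions = {a: new[a] for a in new.keys() - current.keys()}
    deletions = {a: current[a] for a in current.keys() - new.keys()}
    renames = {a: (current[a], new[a])
               for a in current.keys() & new.keys()
               if current[a] != new[a]}
    return additions, renames, deletions


if __name__ == "__main__":
    old = {(2, 298): "RSX11M", (2, 299): "THEPIT", (44, 9): "OSMIUM"}
    new = {(2, 298): "REACH", (2, 299): "THEARK",
           (2, 292): "OSIRIS", (44, 9): "OSMIUM"}
    additions, renames, deletions = node_db_diff(old, new)
    print("add:", additions)
    print("rename:", renames)
    print("delete:", deletions)
```

Unchanged entries such as OSMIUM appear in none of the three results, so only the handful of changed items would need to go to the monitor. Keying on address is a design choice: a node that keeps its name but moves to a new address shows up as one deletion plus one addition rather than as a rename.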
>>
>>     Last Sunday, the batch job failed with the following error:
>>
>>     18:33:40 USER   SETNOD>*TAKE SYSTEM:NODE-DATA.TXT.0
>>     18:33:40 USER
>>     18:33:40 USER   [Fork SETNOD opening <SYSTEM>NODE-DATA.TXT.1 for reading]
>>     18:33:41 USER   SETNOD>*SAVE
>>     18:33:41 USER
>>     18:33:41 USER   [Fork SETNOD opening <SYSTEM>NODE-DATA.BIN.74 for reading, writing]
>>     18:33:41 USER   SETNOD>*INSERT
>>     18:33:41 USER
>>     18:33:41 USER   *?SETNOD: Failed at node REACH*
>>     18:33:41 USER   SETNOD>
>>
>>     I had a look at the SETNOD source and the HECnet node list and have
>>     discovered and concluded a few things.  First, there doesn't seem to
>>     be anything syntactically wrong with REACH::'s definition: "set nod
>>     2.298 name REACH".  Second, there don't appear to be any semantic
>>     issues.  2.298 wasn't in use and it shouldn't matter if it was.
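To make the "nothing syntactically wrong" check concrete, here is a Python sketch that parses a `set nod` line and applies the DECnet Phase IV limits: area 1 to 63, node 1 to 1023, and a name of one to six alphanumeric characters including at least one letter. The accepted syntax is only what appears in this thread (SETNOD itself may accept more), and `parse_set_node` is a hypothetical helper. It also computes the 16-bit Phase IV address, area * 1024 + node.

```python
import re
from typing import Tuple

# Only the form seen in this thread: "set nod <area>.<node> name <name>".
# Keywords are matched case-insensitively (assumption about SETNOD).
_SET_NOD = re.compile(
    r"\s*set\s+nod\s+([0-9]+)\.([0-9]+)\s+name\s+(\S+)\s*", re.IGNORECASE)
_NAME = re.compile(r"[A-Z0-9]{1,6}")


def parse_set_node(line: str) -> Tuple[int, int, str, int]:
    """Return (area, node, name, address) or raise ValueError."""
    m = _SET_NOD.fullmatch(line)
    if m is None:
        raise ValueError(f"unrecognized line: {line!r}")
    area, node = int(m.group(1)), int(m.group(2))
    name = m.group(3).upper()
    if not 1 <= area <= 63:
        raise ValueError(f"area {area} outside 1..63")
    if not 1 <= node <= 1023:
        raise ValueError(f"node {node} outside 1..1023")
    # 1-6 of A-Z/0-9, and not all digits (i.e. at least one letter).
    if _NAME.fullmatch(name) is None or name.isdigit():
        raise ValueError(f"bad node name {name!r}")
    # Phase IV address: 6-bit area in the high bits, 10-bit node number low.
    return area, node, name, area * 1024 + node


if __name__ == "__main__":
    print(parse_set_node("set nod 2.298 name REACH"))
```

Running it prints `(2, 298, 'REACH', 2346)`: REACH's definition passes these checks, which agrees with the conclusion that the entry itself is well formed.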
>>
>>     In the case of INSERT, there are two kinds of errors from NODE%: a
>>     general failure of the JSYS and an incomplete insertion.  This error
>>     is from the second case.  Unfortunately, SETNOD isn't reporting
>>     enough information about the error, so I have to make some changes
>>     there.  It's also possible that SETNOD is building an inconsistent
>>     database for the monitor to swallow; at least the LIST command is
>>     giving me some odd results, viz:
>>
>>         SETNOD>list arEA 2
>>
>>         [AREA 2]
>>         A2RTR
>>
>>         TOTAL NODES FOUND: 1
>>
>>         SETNOD>
>>
>>     That's clearly wrong, viz:
>>
>>         !i dec
>>           Local DECNET node: VENTI2.  Nodes reachable: 7.
>>           Accessible DECNET nodes are:    A2RTR    BOINGO    LEGATO    TOMMYT    VENTI2    VENTI    ZITI
>>
>>     The Exec output should probably be changed to say, "Nodes reachable
>>     in local area" and "Online nodes in area are:"
>>
>>     Anybody have any ideas?  Hunches?  Clues?
>>
>> File 1) OLDF:[4,120]    created: 1241 15-Apr-21
>> File 2) NEWF:[1,1]      created: 0102 30-Apr-21
>>
>> 1)1     set nod 44.9 name OSMIUM
>> ****
>> 2)1     set nod 2.292 name OSIRIS
>> 2)      set nod 44.9 name OSMIUM
>> **************
>> 1)1     set nod 13.3 name RED
>> ****
>> 2)1     *set nod 2.298 name REACH*
>> 2)      set nod 13.3 name RED
>> **************
>> 1)1     set nod 2.298 name RSX11M
>> 1)      set nod 1.306 name RSX124
>> ****
>> 2)1     set nod 1.306 name RSX124
>> **************
>> 1)1     set nod 42.5 name SPARKY
>> ****
>> 2)1     set nod 2.291 name SPARK
>> 2)      set nod 42.5 name SPARKY
>> **************
>> 1)1     set nod 2.299 name THEPIT
>> 1)      set nod 35.70 name THOMAS
>> ****
>> 2)1     set nod 2.299 name THEARK
>> 2)      set nod 35.70 name THOMAS
>> **************
>>
>