[HECnet] Tops-20 SETNOD Failure

Thomas DeBellis tommytimesharing at gmail.com
Tue May 4 15:15:36 PDT 2021


I fixed a few things in SETNOD to get some more information about the 
error.  In particular,

  * Allow listing of AREA 1 (this was specifically disallowed, I don't
    know why)
  * More consistent error reporting (via ESOUT%)
  * List more than one node when doing an area list (it would only list
    a single node)
  * List nodes with more than three digits in the node number when doing
    columnar output

So now you get the expected results:

    SETNOD>lis a 1
    [Area 1]
    A1RTR   1023    ATHENA   620    ATLE     605    AURORA   606    BANAI    770
    BANX25   771    BEA       19    BIZET    800    BJARNE     7    BLINKY   266
    CATWZL   302    CLYDE    269    COOPER   263    CRISPS   201    CYGNUS   259
    DAVROS   254    DBIT     351    DE1RSX   450    DE1RSY   452    DOCTOR   252
    ELIN     616    ELMER    617    ERNIE      2    ERSATZ   350    FLETCH   100
    FNATTE     3    FREJ     608    GAXP     730    GNAT      16    GNOME      6
    GOBLIN     4    GVAX     731    HAGMAN   262    HARPER   261    HORSE    150
    HUGIN    602    HYUNA    500    INKY     268    JIMIN    501    JOCKE     21
    JOSSE     17    KLIO     451    KRILLE     8    LOKE     607    MACARO   303
    MACRA    258    MAGICA     1    MASTER   251    MIM       13    MUNIN    603
    NIPPER   202    NOMAD    610    NOXBIT   720    ORACLE   301    PACMAN   265
    PAI      541    PALLAS   621    PAMINA    18    PIDP11   560    PINKY    267
    PISTON   520    PLINTH   200    PMAVS2   510    PONDUS    15    PONY      12
    PUFF      22    QEMUNT   151    REI      540    ROCKY     11    ROJIN    542
    RSX124   306    RSX145   304    RSX170   305    RSX184   307    RUTAN    255
    SHARPE   260    SIDRAT   253    SIGGE     10    SPEEDY    24    TARDIS   250
    TEMPO      9    THOROS   257    TINA      14    TIPSY    604    TONGUE   264
    TOPSY    601    VALAR    400    VAROS    256    WXP       20    WXP2      23
    YMER     609    ZEKE       5
    Total nodes in area 1: 92
    SETNOD>exit
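
For illustration only, the columnar layout in the listing above can be approximated with a short Python sketch. This is not the SETNOD source (which is MACRO-20). The function name `format_columns` is made up for the example, and the widths (six characters for the name, four for the node number, five entries per row) are assumptions read off the listing. A four-character number field is enough because a DECnet Phase IV node number cannot exceed 1023.

```python
def format_columns(nodes, per_row=5, name_width=6, number_width=4):
    """Lay out (name, number) pairs in fixed-width columns.

    The widths are assumptions taken from the listing, not from SETNOD.
    """
    # One cell per node: name left-justified, number right-justified.
    cells = [
        f"{name:<{name_width}}  {number:>{number_width}}"
        for name, number in nodes
    ]
    # Group the cells into rows of per_row entries each.
    rows = [cells[i:i + per_row] for i in range(0, len(cells), per_row)]
    return ["    ".join(row) for row in rows]


sample = [
    ("A1RTR", 1023), ("ATHENA", 620), ("BEA", 19),
    ("BJARNE", 7), ("BLINKY", 266), ("MAGICA", 1),
]
for line in format_columns(sample):
    print(line)
```

Sizing the number field for the largest legal value, rather than for three digits, keeps a node such as A1RTR (1023) from pushing the later columns out of line.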

Regarding the error, I have reproduced it with a single entry, viz:

    !setnod
    SETNOD>set nod 2.298 name REACH
    SETNOD>insert
    ?SETNOD: Failed at node REACH (2.298), Item 0 of 1
    SETNOD>

The high-level code to do the entry is in JNTMAN.  It loops through the
table passed to it via .NDINT, calling a lower-level routine called
SCTAND in SCLINK.  An error there is passed up to JNTMAN, but JNTMAN does
not pass it back to the user.  There are some other problems in SCLINK
pertaining to negative return values, so some minor work is necessary
there, also.
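
As a rough sketch of the reporting being aimed for (in Python, not the actual JNTMAN or SCLINK code), the loop below stops at the first failing entry and hands the lower-level reason back to the caller instead of dropping it. Every name here (`InsertError`, `insert_all`, `fake_insert`) is hypothetical, and the convention that the low-level routine returns `None` on success or a reason string on failure is an assumption made for the example.

```python
class InsertError(Exception):
    """Raised when one entry fails; records which entry and why."""

    def __init__(self, index, total, name, address, reason):
        self.index = index
        self.total = total
        self.name = name
        self.address = address
        self.reason = reason
        super().__init__(
            f"Failed at node {name} ({address}), item {index} of {total}: {reason}"
        )


def insert_all(entries, insert_one):
    """Insert every (name, address) pair, stopping at the first failure.

    insert_one is a hypothetical stand-in for the lower-level routine: it
    returns None on success or a short reason string on failure.
    """
    total = len(entries)
    for index, (name, address) in enumerate(entries):
        reason = insert_one(name, address)
        if reason is not None:
            # Propagate the low-level reason instead of dropping it.
            raise InsertError(index, total, name, address, reason)
    return total


def fake_insert(name, address):
    # Stand-in only: rejects one name so the error path can be exercised.
    return "rejected by stand-in" if name == "REACH" else None


try:
    insert_all([("OSIRIS", "2.292"), ("REACH", "2.298")], fake_insert)
except InsertError as err:
    print(f"?SETNOD: {err}")
```

The stand-in's reason string is a placeholder. The real cause of the REACH failure is not known at this point.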

I'll make some changes to these two modules, generate a new monitor for 
VENTI2 and see what happens in a few days.

Right now, if any Tops-20 user is using SETNOD to update DECnet tables,
the update appears to fail.  If anybody else is seeing it or can
reproduce it, I'd like to hear about it.

> ------------------------------------------------------------------------
>
> On 5/4/21 11:15 AM, Thomas DeBellis wrote:
>
> Has anybody ever seen SETNOD fail to insert the entire node list?  I 
> just did.
>
> Shortly after I put my 20's up on HECnet, I wrote a recurring batch 
> job that fires once a week on Sundays to pull the latest node list 
> (T20.FIX) from MIM::.  I use the highly venerable FILCOM program to do 
> a difference of it with the previous week's list.  I don't do anything 
> in particular with the output except save it in case I feel like 
> looking at it for some reason.
>
> The batch job always inserts the entire list, rewriting whatever might 
> be in the monitor's data base.  I have always been unsatisfied with 
> doing things that way because it seemed to me to be inefficient as the 
> node list grew.   The HECnet node list count was 716 on 9-Jun-19 and 
> it's now up to 884 as of the latest version that I've pulled, 
> 30-Apr-21.  The other problem is the microscopic possibility that a 
> node is in Tops-20's monitor database (a hash table) that isn't in the 
> HECnet node list.
>
> Nodes can get removed, although I think that is infrequent.  Nodes
> could get inserted outside of the batch job, but I think that most
> unlikely in my situation.  Nodes can get renamed, as evidenced by 2.299
> below, which went from THEPIT to THEARK.  None of this should break
> anything, and none of it has.
>
> However, it's been in the back of my mind to do two enhancements, one
> to Tops-20 and one to SETNOD.  The NODE% JSYS should have an additional
> feature to return the current monitor database.  The SETNOD program
> should be enhanced to use that to compute the set difference with the
> new list.  This would show additions, renames and deletions.  That
> would bring the update operation down from several hundred items to
> fewer than ten, on average.  This would obviously make more of a
> difference on huge DECnets with tens of thousands of nodes.  Another
> NODE% feature should probably be to whack the entire monitor database
> except for the local node, which would be useful for troubleshooting.
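
The set difference proposed above can be sketched in Python. This is only an illustration of the idea, not SETNOD code. NODE% has no function today that returns the monitor's table, so the sketch assumes the current table is available in the same "set nod A.N name X" form as the node list (last week's list, for instance). The names `parse_node_list` and `diff_node_tables` are made up for the example, and keying on the node address, so that a changed name counts as a rename, is a design assumption.

```python
import re

# Matches lines such as "set nod 2.298 name REACH".
SET_NODE = re.compile(
    r"^\s*set\s+nod(?:e)?\s+(\d+)\.(\d+)\s+name\s+(\S+)\s*$", re.IGNORECASE
)


def parse_node_list(text):
    """Return a dict mapping (area, node) to the node name."""
    table = {}
    for line in text.splitlines():
        match = SET_NODE.match(line)
        if match:
            area, node, name = match.groups()
            table[(int(area), int(node))] = name.upper()
    return table


def diff_node_tables(current, new):
    """Compare two tables keyed by address.

    Returns (added, removed, renamed); renamed maps an address to an
    (old_name, new_name) pair.
    """
    added = {addr: name for addr, name in new.items() if addr not in current}
    removed = {addr: name for addr, name in current.items() if addr not in new}
    renamed = {
        addr: (current[addr], new[addr])
        for addr in current.keys() & new.keys()
        if current[addr] != new[addr]
    }
    return added, removed, renamed


# Only the lines that appear in the FILCOM excerpt at the end of this message.
old_text = """
set nod 44.9 name OSMIUM
set nod 13.3 name RED
set nod 2.298 name RSX11M
set nod 1.306 name RSX124
set nod 42.5 name SPARKY
set nod 2.299 name THEPIT
set nod 35.70 name THOMAS
"""
new_text = """
set nod 2.292 name OSIRIS
set nod 44.9 name OSMIUM
set nod 2.298 name REACH
set nod 13.3 name RED
set nod 1.306 name RSX124
set nod 2.291 name SPARK
set nod 42.5 name SPARKY
set nod 2.299 name THEARK
set nod 35.70 name THOMAS
"""

added, removed, renamed = diff_node_tables(
    parse_node_list(old_text), parse_node_list(new_text)
)
print("added:", added)
print("removed:", removed)
print("renamed:", renamed)
```

On those excerpt lines the sketch reports two additions (2.291 and 2.292), no removals, and two addresses whose names changed (2.298 and 2.299).
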
>
> Last Sunday, the batch job failed with the following error:
>
> 18:33:40 USER   SETNOD>*TAKE SYSTEM:NODE-DATA.TXT.0
> 18:33:40 USER
> 18:33:40 USER   [Fork SETNOD opening <SYSTEM>NODE-DATA.TXT.1 for reading]
> 18:33:41 USER   SETNOD>*SAVE
> 18:33:41 USER
> 18:33:41 USER   [Fork SETNOD opening <SYSTEM>NODE-DATA.BIN.74 for reading, writing]
> 18:33:41 USER   SETNOD>*INSERT
> 18:33:41 USER
> 18:33:41 USER   ?SETNOD: Failed at node REACH
> 18:33:41 USER   SETNOD>
>
> I had a look at the SETNOD source and the HECnet node list and have 
> discovered and concluded a few things.  First, there doesn't seem to 
> be anything syntactically wrong with REACH::'s definition: "set nod 
> 2.298 name REACH". Second, there don't appear to be any semantic 
> issues.  2.298 wasn't in use and it shouldn't matter if it was.
>
> In the case of INSERT, there are two kinds of errors from NODE%, a 
> general failure of the JSYS and an incomplete insertion.   The error 
> is from the second case.  Unfortunately, SETNOD isn't reporting enough 
> information about the error, so I have to make some changes there.  
> It's also possible that SETNOD is building an inconsistent database 
> for the monitor to swallow; at least the LIST command is giving me 
> some odd results, viz:
>
>     SETNOD>list arEA 2
>
>     [AREA 2]
>     A2RTR
>
>     TOTAL NODES FOUND: 1
>
>     SETNOD>
>
> That's clearly wrong, viz:
>
>     !i dec
>      Local DECNET node: VENTI2.  Nodes reachable: 7.
>      Accessible DECNET nodes are:    A2RTR    BOINGO    LEGATO    TOMMYT    VENTI2    VENTI    ZITI
>
> The Exec output should probably be changed to say, "Nodes reachable in 
> local area" and "Online nodes in area are:"
>
> Anybody have any ideas?  Hunches?  Clues?
>
> ------------------------------------------------------------------------
>
> File 1) OLDF:[4,120]    created: 1241 15-Apr-21
> File 2) NEWF:[1,1]      created: 0102 30-Apr-21
>
> 1)1     set nod 44.9 name OSMIUM
> ****
> 2)1     set nod 2.292 name OSIRIS
> 2)      set nod 44.9 name OSMIUM
> **************
> 1)1     set nod 13.3 name RED
> ****
> 2)1     set nod 2.298 name REACH
> 2)      set nod 13.3 name RED
> **************
> 1)1     set nod 2.298 name RSX11M
> 1)      set nod 1.306 name RSX124
> ****
> 2)1     set nod 1.306 name RSX124
> **************
> 1)1     set nod 42.5 name SPARKY
> ****
> 2)1     set nod 2.291 name SPARK
> 2)      set nod 42.5 name SPARKY
> **************
> 1)1     set nod 2.299 name THEPIT
> 1)      set nod 35.70 name THOMAS
> ****
> 2)1     set nod 2.299 name THEARK
> 2)      set nod 35.70 name THOMAS
> **************
>
>