[HECnet] Tops-20 SETNOD Failure

Johnny Billquist bqt at softjar.se
Wed May 5 14:03:21 PDT 2021


It might not have anything in particular to do with the node number
(2.298); it could instead be that it is the n:th entry being poked at.
Or possibly it is because it redefines an existing node.
Try clearing the node out and then defining it?
Adding a dummy extra entry to the file before this one is another idea.

   Johnny

On 2021-05-05 22:05, Thomas DeBellis wrote:
> I got annoyed at the thought of having to wait a few more months for the
> error condition to show up.  Instead of having the batch job run more
> frequently (and thus beating on poor MIM::), I wrote another batch job
> which took every single file that I have /ever/ downloaded from MIM::
> and inserted each one.  That's 75 files, and it failed on number 54.
> 
> 15:36:51 USER   SETNOD>*T OLDS:NODE-DATA.TXT.54
> 15:36:52 USER
> 15:36:52 USER   SETNOD>*List Total
> 15:36:52 USER
> 15:36:52 USER
> 15:36:52 USER   TOTAL NODES FOUND: 869
> 15:36:52 USER
> 15:36:52 USER   SETNOD>*Insert
> 15:36:52 USER
> 15:36:52 USER   ?SETNOD: Failed at node RSX11M (2.298), Item 650 of 869, Error: -11
> 15:36:52 USER   SETNOD>
> 
> It is interesting that it is failing on node 2.298, but this file is from
> before that number was reassigned to REACH::.  The error return of -11
> means "Component in Wrong State" (aka NF.CWS), which I didn't find
> immediately informative.  However, now I've got something to look
> around for.
> 
> I still can't imagine why there would be anything particularly 
> diabolical about the number 2.298.
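For what it's worth, a DECnet Phase IV address is a single 16-bit value, with the area in the top 6 bits and the node number in the low 10 bits, so 2.298 is simply 2 * 1024 + 298 = 2346 (octal 4452). A minimal Python sketch of that packing follows; the function names are illustrative only and are not taken from SETNOD or the monitor.

```python
def encode_address(area: int, node: int) -> int:
    """Pack a Phase IV area.node pair into one 16-bit address."""
    if not 1 <= area <= 63:
        raise ValueError(f"area {area} is outside 1..63")
    if not 1 <= node <= 1023:
        raise ValueError(f"node {node} is outside 1..1023")
    return (area << 10) | node


def decode_address(address: int) -> tuple[int, int]:
    """Split a 16-bit address back into (area, node)."""
    return address >> 10, address & 0x3FF


def parse_address(text: str) -> int:
    """Parse the "2.298" notation used in SETNOD's set nod commands."""
    area, node = (int(part) for part in text.split("."))
    return encode_address(area, node)


print(parse_address("2.298"))       # 2346
print(oct(parse_address("2.298")))  # 0o4452
print(decode_address(2346))         # (2, 298)
print(parse_address("1.1023"))      # 2047
```

Nothing about 2346 stands out as a boundary value, which fits the feeling that the address itself is not the problem.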
>> ------------------------------------------------------------------------
>>
>> On 5/5/21 12:38 AM, Thomas DeBellis wrote:
>>
>> I finished modifying SCLINK to properly return error values, which
>> are negative, and JNTMAN to return the error value in AC3 if .NDINT
>> doesn't succeed in inserting all the nodes.  Then I modified SETNOD
>> to get this extended error information and print it.  I put the new
>> monitor and SETNOD up, rebooted …AND…
>>
>>     SETNOD>set nod 2.298 name REACH
>>     SETNOD>ins
>>     SETNOD>
>>
>> It works perfectly because, of course, it does…
>>
>> So, as usual, Johnny's guess is pretty close to the mark, even if he 
>> isn't a 36 bit'er.  "Slightly broken"?  Yeah, 'slightly' enough so 
>> that it can't be easily reproduced…
>>
>> The only thing I can think of is that the system had been up over 15 
>> weeks when I saw this.  I had looked at the storage space utilization 
>> with SYSDPY and didn't notice anything maxing out.  I restarted the 
>> GETNOD batch job on VENTI2::.  Maybe in another 15 weeks, it will 
>> break again.
>>
>> /Annoyed/…
>>
>>> ------------------------------------------------------------------------
>>> On 5/4/21 10:31 PM, Thomas DeBellis wrote:
>>>
>>> Personally, I don't see how it could /possibly/ have anything to do
>>> with the REACH:: node definition, but I have been known to
>>> occasionally overlook the utterly obvious, particularly when it's
>>> near night-night.  Maybe not this time.
>>>
>>> Right now, the way to figure it out is to get the minor error data
>>> and see where that takes things.  So I'm making a change to JNTMAN to
>>> have .NDINT return the lower-level code on an incomplete insert.
>>> SCLINK appears to have a problem in that it mangles return values,
>>> which I'm currently investigating.
>>>
>>> You can't just blithely assume somebody got it wrong and 'fix'
>>> things; sometimes it's a certain way for a reason.
>>>
>>> On 5/4/21 8:46 PM, Johnny Billquist wrote:
>>>> On 2021-05-05 00:54, Mike Kostersitz wrote:
>>>>> Ouch, that is one of my nodes 😊 @Johnny Billquist
>>>>> <mailto:bqt at softjar.se>, anything you can think of?  We just
>>>>> renamed my old RSX11M node to REACH.
>>>>
>>>> Well. It is something slightly broken in Tops-20, so there isn't 
>>>> really anything we can do about it.
>>>>
>>>> Except hope that Thomas can figure it out and fix it.
>>>>
>>>>  Johnny
>>>>
>>>>>
>>>>> Mike
>>>>>
>>>>> From: Thomas DeBellis <mailto:tommytimesharing at gmail.com>
>>>>> Sent: Tuesday, May 4, 2021 15:16
>>>>> To: HECnet <mailto:hecnet at update.uu.se>
>>>>> Subject: [HECnet] Re: Tops-20 SETNOD Failure
>>>>>
>>>>> I fixed a few things in SETNOD to get some more information about 
>>>>> the error.  In particular,
>>>>>
>>>>>   * Allow listing of AREA 1 (this was specifically disallowed; I don't
>>>>>     know why)
>>>>>   * More consistent error reporting (via ESOUT%)
>>>>>   * List more than one node when doing an area list (it would only
>>>>>     list a single node)
>>>>>   * List nodes with more than three digits in the node number when
>>>>>     doing columnar output
>>>>>
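The last fix matters because Phase IV node numbers run up to 1023, so the number column needs four characters (A1RTR is 1.1023). Below is a rough Python sketch of fixed-width columnar output, assuming a 6-character name column (the DECnet node-name limit), a 4-character right-justified number and five entries per row as in the listing that follows. It only illustrates the layout; it is not SETNOD's actual MACRO code.

```python
def format_columns(nodes: list[tuple[str, int]], per_row: int = 5) -> list[str]:
    """Lay out (name, node-number) pairs in fixed-width columns.

    Names are left-justified in 6 characters and numbers right-justified
    in 4, which is wide enough for node numbers up to 1023.
    """
    cells = [f"{name:<6}  {number:>4}" for name, number in nodes]
    # Chunk the cells into rows and separate columns with four spaces.
    return ["    ".join(cells[i:i + per_row]) for i in range(0, len(cells), per_row)]


rows = format_columns([("A1RTR", 1023), ("ATHENA", 620), ("ERNIE", 2)], per_row=2)
for row in rows:
    print(row)
```

Because every cell is the same width, a four-digit number no longer pushes the rest of the row out of alignment.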
>>>>> So now you get the expected results:
>>>>>
>>>>>     SETNOD>lis a 1
>>>>>     [Area 1]
>>>>>     A1RTR   1023    ATHENA   620    ATLE     605    AURORA   606    BANAI    770
>>>>>     BANX25   771    BEA       19    BIZET    800    BJARNE     7    BLINKY   266
>>>>>     CATWZL   302    CLYDE    269    COOPER   263    CRISPS   201    CYGNUS   259
>>>>>     DAVROS   254    DBIT     351    DE1RSX   450    DE1RSY   452    DOCTOR   252
>>>>>     ELIN     616    ELMER    617    ERNIE      2    ERSATZ   350    FLETCH   100
>>>>>     FNATTE     3    FREJ     608    GAXP     730    GNAT      16    GNOME      6
>>>>>     GOBLIN     4    GVAX     731    HAGMAN   262    HARPER   261    HORSE    150
>>>>>     HUGIN    602    HYUNA    500    INKY     268    JIMIN    501    JOCKE     21
>>>>>     JOSSE     17    KLIO     451    KRILLE     8    LOKE     607    MACARO   303
>>>>>     MACRA    258    MAGICA     1    MASTER   251    MIM       13    MUNIN    603
>>>>>     NIPPER   202    NOMAD    610    NOXBIT   720    ORACLE   301    PACMAN   265
>>>>>     PAI      541    PALLAS   621    PAMINA    18    PIDP11   560    PINKY    267
>>>>>     PISTON   520    PLINTH   200    PMAVS2   510    PONDUS    15    PONY      12
>>>>>     PUFF      22    QEMUNT   151    REI      540    ROCKY     11    ROJIN    542
>>>>>     RSX124   306    RSX145   304    RSX170   305    RSX184   307    RUTAN    255
>>>>>     SHARPE   260    SIDRAT   253    SIGGE     10    SPEEDY    24    TARDIS   250
>>>>>     TEMPO      9    THOROS   257    TINA      14    TIPSY    604    TONGUE   264
>>>>>     TOPSY    601    VALAR    400    VAROS    256    WXP       20    WXP2      23
>>>>>     YMER     609    ZEKE       5
>>>>>     Total nodes in area 1: 92
>>>>>     SETNOD>exit
>>>>>
>>>>> Regarding the error, I have reproduced it with a single entry, viz:
>>>>>
>>>>>     !setnod
>>>>>     SETNOD>set nod 2.298 name REACH
>>>>>     SETNOD>insert
>>>>>     ?SETNOD: Failed at node REACH (2.298), Item 0 of 1
>>>>>     SETNOD>
>>>>>
>>>>> The high-level code that does the insertion is in JNTMAN.  It loops
>>>>> through the table passed to it via .NDINT, calling a lower-level
>>>>> routine called SCTAND in SCLINK.  An error there is passed up to
>>>>> JNTMAN, but it is not passed back to the user.  There are some other
>>>>> problems in SCLINK pertaining to negative return values, so some
>>>>> minor work is necessary there, also.
>>>>>
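The control flow described above, together with the fix of handing the lower-level code back to the caller, can be modelled roughly as follows. This is a hypothetical Python sketch, not the MACRO-20 source: `insert_one` stands in for SCTAND, it assumes the loop stops at the first failure, and the returned (index, error) pair stands in for the item number and AC3 value that the modified SETNOD prints. The zero-based index matches the "Item 0 of 1" seen in the single-entry case.

```python
from typing import Callable, Optional

NF_CWS = -11  # "Component in Wrong State", the code SETNOD eventually reported


def insert_all(table: list[tuple[str, str]],
               insert_one: Callable[[str, str], int]) -> tuple[int, Optional[int]]:
    """Insert every (address, name) entry, stopping at the first failure.

    insert_one is a hypothetical stand-in for SCTAND: it returns 0 on
    success or a negative error code.  The result is (index reached,
    error code); the error code is None when every entry went in.
    """
    for index, (address, name) in enumerate(table):
        status = insert_one(address, name)
        if status < 0:
            # Propagate the lower-level code instead of discarding it.
            return index, status
    return len(table), None


def fake_sctand(address: str, name: str) -> int:
    """Hypothetical stub that rejects 2.298, mimicking the observed failure."""
    return NF_CWS if address == "2.298" else 0


table = [("2.297", "A"), ("2.298", "RSX11M"), ("2.299", "THEPIT")]
done, err = insert_all(table, fake_sctand)
print(f"Failed at item {done} of {len(table)}, Error: {err}")  # Failed at item 1 of 3, Error: -11
```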
>>>>> I'll make some changes to these two modules, generate a new monitor 
>>>>> for VENTI2 and see what happens in a few days.
>>>>>
>>>>> Right now, if any Tops-20 user is using SETNOD to update DECnet
>>>>> tables, this appears to fail.  If anybody else is seeing it or can
>>>>> reproduce it, I'd like to hear about it.
>>>>>
>>>>>     On 5/4/21 11:15 AM, Thomas DeBellis wrote:
>>>>>
>>>>>     Has anybody ever seen SETNOD fail to insert the entire node
>>>>>     list?  I just did.
>>>>>
>>>>>     Shortly after I put my 20's up on HECnet, I wrote a recurring
>>>>>     batch job that fires once a week on Sundays to pull the latest
>>>>>     node list (T20.FIX) from MIM::.  I use the highly venerable FILCOM
>>>>>     program to do a difference of it with the previous week's list.  I
>>>>>     don't do anything in particular with the output except save it in
>>>>>     case I feel like looking at it for some reason.
>>>>>
>>>>>     The batch job always inserts the entire list, rewriting whatever
>>>>>     might be in the monitor's database.  I have always been
>>>>>     unsatisfied with doing things that way because it seemed to me to
>>>>>     be inefficient as the node list grew.  The HECnet node list count
>>>>>     was 716 on 9-Jun-19 and it's now up to 884 as of the latest
>>>>>     version that I've pulled, 30-Apr-21.  The other problem is the
>>>>>     microscopic possibility that a node is in Tops-20's monitor
>>>>>     database (a hash table) that isn't in the HECnet node list.
>>>>>
>>>>>     Nodes can get removed, although I think that is infrequent.  Nodes
>>>>>     could get inserted outside of the batch job, but I think that most
>>>>>     unlikely in my situation.  Nodes can get renamed, as evidenced by
>>>>>     2.299 below, which went from THEPIT to THEARK.  None of this
>>>>>     should break anything, nor has it.
>>>>>
>>>>>     However, it's been in the back of my mind to do two enhancements,
>>>>>     one to Tops-20 and one to SETNOD.  The NODE% JSYS should have an
>>>>>     additional feature to return the current monitor database.  The
>>>>>     SETNOD program should be enhanced to take that and compute the set
>>>>>     difference with the new list.  This would show additions, renames
>>>>>     and deletions.  That would bring the update operation down from
>>>>>     several hundred items to fewer than ten, on average.  This would
>>>>>     obviously make more of a difference on huge DECnets with tens of
>>>>>     thousands of nodes.  Another NODE% feature should probably be to
>>>>>     whack the entire monitor database except for the local node, which
>>>>>     would be useful for troubleshooting.
>>>>>
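A minimal Python sketch of the proposed set difference, keyed on node address so that a reused address shows up as a rename. It assumes both the monitor's table and the new list can be reduced to address-to-name maps; the NODE% feature for reading back the monitor's table does not exist yet, so the `current` map here is hypothetical. The sample data is the set of lines that differ in the FILCOM output at the end of this message.

```python
def node_delta(current: dict[str, str], wanted: dict[str, str]):
    """Compare two address -> name maps.

    Returns (additions, renames, deletions):
      additions: addresses only in wanted, mapped to their new names
      renames:   addresses in both whose name changed, as (old, new)
      deletions: addresses only in current, mapped to their old names
    Keys are sorted only to make the output stable.
    """
    additions = {a: wanted[a] for a in sorted(wanted.keys() - current.keys())}
    deletions = {a: current[a] for a in sorted(current.keys() - wanted.keys())}
    renames = {a: (current[a], wanted[a])
               for a in sorted(current.keys() & wanted.keys())
               if current[a] != wanted[a]}
    return additions, renames, deletions


old = {"44.9": "OSMIUM", "13.3": "RED", "2.298": "RSX11M", "1.306": "RSX124",
       "42.5": "SPARKY", "2.299": "THEPIT", "35.70": "THOMAS"}
new = {"2.292": "OSIRIS", "44.9": "OSMIUM", "2.298": "REACH", "13.3": "RED",
       "1.306": "RSX124", "2.291": "SPARK", "42.5": "SPARKY", "2.299": "THEARK",
       "35.70": "THOMAS"}

additions, renames, deletions = node_delta(old, new)
print(len(additions), len(renames), len(deletions))  # 2 2 0
print(renames)  # {'2.298': ('RSX11M', 'REACH'), '2.299': ('THEPIT', 'THEARK')}
```

For the 15-Apr-21 to 30-Apr-21 change this means four operations rather than a full reinsert, and the 2.298 reassignment from RSX11M to REACH surfaces explicitly as a rename.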
>>>>>     Last Sunday, the batch job failed with the following error:
>>>>>
>>>>>     18:33:40 USER   SETNOD>*TAKE SYSTEM:NODE-DATA.TXT.0
>>>>>     18:33:40 USER
>>>>>     18:33:40 USER   [Fork SETNOD opening <SYSTEM>NODE-DATA.TXT.1 for reading]
>>>>>     18:33:41 USER   SETNOD>*SAVE
>>>>>     18:33:41 USER
>>>>>     18:33:41 USER   [Fork SETNOD opening <SYSTEM>NODE-DATA.BIN.74 for reading, writing]
>>>>>     18:33:41 USER   SETNOD>*INSERT
>>>>>     18:33:41 USER
>>>>>     18:33:41 USER   ?SETNOD: Failed at node REACH
>>>>>     18:33:41 USER   SETNOD>
>>>>>
>>>>>     I had a look at the SETNOD source and the HECnet node list and
>>>>>     have discovered and concluded a few things.  First, there doesn't
>>>>>     seem to be anything syntactically wrong with REACH::'s definition:
>>>>>     "set nod 2.298 name REACH".  Second, there don't appear to be any
>>>>>     semantic issues.  2.298 wasn't in use and it shouldn't matter if
>>>>>     it was.
>>>>>
>>>>>     In the case of INSERT, there are two kinds of errors from NODE%: a
>>>>>     general failure of the JSYS and an incomplete insertion.  The
>>>>>     error is from the second case.  Unfortunately, SETNOD isn't
>>>>>     reporting enough information about the error, so I have to make
>>>>>     some changes there.  It's also possible that SETNOD is building an
>>>>>     inconsistent database for the monitor to swallow; at least the
>>>>>     LIST command is giving me some odd results, viz:
>>>>>
>>>>>         SETNOD>list arEA 2
>>>>>
>>>>>         [AREA 2]
>>>>>         A2RTR
>>>>>
>>>>>         TOTAL NODES FOUND: 1
>>>>>
>>>>>         SETNOD>
>>>>>
>>>>>     That's clearly wrong, viz:
>>>>>
>>>>>         !i dec
>>>>>           Local DECNET node: VENTI2.  Nodes reachable: 7.
>>>>>           Accessible DECNET nodes are:    A2RTR    BOINGO    LEGATO
>>>>>           TOMMYT    VENTI2    VENTI    ZITI
>>>>>
>>>>>     The Exec output should probably be changed to say "Nodes reachable
>>>>>     in local area" and "Online nodes in area are:".
>>>>>
>>>>>     Anybody have any ideas?  Hunches?  Clues?
>>>>>
>>>>> File 1) OLDF:[4,120]    created: 1241 15-Apr-21
>>>>> File 2) NEWF:[1,1]      created: 0102 30-Apr-21
>>>>>
>>>>> 1)1     set nod 44.9 name OSMIUM
>>>>> ****
>>>>> 2)1     set nod 2.292 name OSIRIS
>>>>> 2)      set nod 44.9 name OSMIUM
>>>>> **************
>>>>> 1)1     set nod 13.3 name RED
>>>>> ****
>>>>> 2)1     set nod 2.298 name REACH
>>>>> 2)      set nod 13.3 name RED
>>>>> **************
>>>>> 1)1     set nod 2.298 name RSX11M
>>>>> 1)      set nod 1.306 name RSX124
>>>>> ****
>>>>> 2)1     set nod 1.306 name RSX124
>>>>> **************
>>>>> 1)1     set nod 42.5 name SPARKY
>>>>> ****
>>>>> 2)1     set nod 2.291 name SPARK
>>>>> 2)      set nod 42.5 name SPARKY
>>>>> **************
>>>>> 1)1     set nod 2.299 name THEPIT
>>>>> 1)      set nod 35.70 name THOMAS
>>>>> ****
>>>>> 2)1     set nod 2.299 name THEARK
>>>>> 2)      set nod 35.70 name THOMAS
>>>>> **************
>>>>>
>>>>

-- 
Johnny Billquist                  || "I'm on a bus
                                   ||  on a psychedelic trip
email: bqt at softjar.se             ||  Reading murder books
pdp is alive!                     ||  tryin' to stay hip" - B. Idol

