<html>

  <head>

    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">

  </head>

  <body>

    <p>You have my sympathies; I saw the storm coming and it finally got

      me off my butt to plug in a UPS that I had been dawdling on for

      about eight months.  That saved about half of everything,

      including the 20's.  I got another UPS two days afterwards for the

      other half.</p>

    <blockquote>

      <p>Conditioned power is <i>Good</i>...<br>

      </p>

    </blockquote>

    <div class="moz-cite-prefix">While you're having all that fun, don't

      forget to update your DECnet hosts (<font size="+1"><tt>SETNODE/SYSTEM:NODE-DATA.TXT</tt></font>);

      the last time I checked, you were very out of date--you don't have

      definitions for my systems and I've been on HECnet since June of

      last year.<br>

    </div>

    <div class="moz-cite-prefix"><br>

    </div>

    <div class="moz-cite-prefix">I'm putting some code into DTESRV to

      wait longer than a second to declare a front-end down.  It's a

      little tricky because some of it is running in scheduler context,

      outside of fork context, which means that I can't use certain wait

      paradigms.  Hopefully then, I'll be able to use some of that data

      to see what is keeping KLH10 from updating the master DTE

      keep-alive counter (<font size="+1"><tt>KPALIV</tt></font>).<br>

    </div>

    <blockquote type="cite"

      cite="mid:746354b4-88a5-7987-d15a-403233d99031@riseup.net">


      <hr width="100%" size="2">

      <p><font face="Arial">On 9/4/20 12:08 AM, Supratim Sanyal wrote: </font></p>

      <p><font face="Arial">ok, will do. first I have to find and

          restore the KLH10 instance from backup, thanks to an

          unexpectedly violent storm that triggered tornado warnings and

          consecutive brown-outs. thanks.</font><br>

      </p>

      <blockquote type="cite"

        cite="mid:b4469f48-0532-8651-6893-9c48c8aa9798@gmail.com">


        <hr width="100%" size="2">On 9/3/20 7:39 PM, Thomas DeBellis

        wrote:<br>

        <br>

        Shortly after sending this, I wedged my development machine by

        mistakenly beating on the file system; this time by running

        SPEAR to pull out events around the DTEKPA BUGCHK.  There was

        too much activity (I have a very large ERROR.SYS, thanks to

        DECnet) and I got a DTEKPA.  Once this happens, the machine

        hangs shortly afterwards.  This finally caused me to have a look

        at DTESRV.<br>

        <p>KPALIV is a variable that is incremented by Tops-20 in a

          number of circumstances by SCHED, APRSRV and (oddly) CFSSRV. 

          It's a keep alive counter that both the front end and Tops-20

          pay attention to.  An examination of the live monitor shows

          that it is monotonically increasing:</p>

        <blockquote>

          <p><font size="+1"><tt>1,,COMBAS+5[   417,,424521</tt><tt><br>

              </tt><tt>1,,COMBAS+5[   417,,426524</tt><tt><br>

              </tt><tt>1,,COMBAS+5[   417,,510532</tt></font></p>

        </blockquote>
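        <p>A minimal sketch of how those three samples can be compared,
          assuming the usual DDT <tt>left,,right</tt> notation with two
          18-bit halves in octal.  The helper name
          <tt>parse_halfwords</tt> is illustrative and not part of any
          Tops-20 tool:</p>
<pre>
```python
def parse_halfwords(text):
    """Convert a DDT-style 'left,,right' octal pair into one 36-bit integer."""
    left, right = text.replace(" ", "").split(",,")
    # Each half is 18 bits, so the left half is scaled by 2**18 (octal 1000000).
    return int(left, 8) * 0o1000000 + int(right, 8)


# The three KPALIV samples quoted above, oldest first.
samples = ["417,,424521", "417,,426524", "417,,510532"]
values = [parse_halfwords(s) for s in samples]

# Differences between consecutive samples; all positive means the counter only grew.
deltas = [later - earlier for earlier, later in zip(values, values[1:])]
monotonic = all(d > 0 for d in deltas)

print(monotonic, [oct(d) for d in deltas])
```
</pre>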

        <p>It is updated approximately every 500 milliseconds; let's

          call that a keep-alive tick.  If it isn't updated in two

          ticks, the front end is declared down and reload action is

          initiated.  A number of things are done and it appears that

          KLH10 is not properly handling them.  Since the KLH10 DTE

          service is not running in a separate process (there are

          vestigial hooks to do this), it does not handle a ten-triggered
          reload.</p>

        <p>Tops-20 waits for the reload to complete, KLH10 does nothing

          and you're hung.</p>

        <p>Fortunately, there is some code for the master DTE which

          checks a variable called FEDBSW, Front End Debugging Switch. 

          If this is non-zero, then the keep-alive count is incremented,

          but it's never checked.  So I set it to -1 (it was zero) and

          then proceeded to beat on the file system with wild abandon.</p>
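        <p>The two-tick rule and the debugging switch can be modelled
          roughly as follows.  This is a toy sketch and not the DTESRV
          code: the class <tt>KeepAliveMonitor</tt> and its
          <tt>debug_switch</tt> field (standing in for FEDBSW) are
          hypothetical names, and the only behaviour assumed is what is
          described above.</p>
<pre>
```python
class KeepAliveMonitor:
    """Toy model of the two-tick rule; hypothetical, not the DTESRV code."""

    MAX_MISSED_TICKS = 2  # down after two ticks without an update

    def __init__(self, debug_switch=0):
        self.debug_switch = debug_switch  # stands in for FEDBSW
        self.last_count = None
        self.missed_ticks = 0

    def tick(self, keepalive_count):
        """Call once per keep-alive tick; True means a reload would start."""
        if self.debug_switch != 0:
            # Non-zero switch: the count is still maintained elsewhere,
            # but it is never checked, so nothing is declared down.
            return False
        if keepalive_count != self.last_count:
            self.last_count = keepalive_count
            self.missed_ticks = 0
            return False
        self.missed_ticks += 1
        return self.missed_ticks >= self.MAX_MISSED_TICKS


# A counter that stops advancing, seen with the switch off and then set to -1.
stalled = [5, 5, 5]
normal = KeepAliveMonitor()
normal_results = [normal.tick(c) for c in stalled]
debugging = KeepAliveMonitor(debug_switch=-1)
debug_results = [debugging.tick(c) for c in stalled]
print(normal_results, debug_results)
```
</pre>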

        <p>For periods of intense disk activity, the machine appeared to

          hang.  After about 10 to 20 seconds, it came right back as if

          nothing had ever happened.  Interesting...<br>

        </p>

        <div class="moz-cite-prefix">Right now, my working assumption is

          that the PI system is getting saturated so that the clock

          interrupt somehow isn't making it through.  For now, I'm

          thinking of rewriting the service routine so that instead of

          checking for two ticks, it checks elapsed time which can then

          be set to some 'reasonable' value.</div>
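        <p>A rough sketch of that elapsed-time idea, under stated
          assumptions: the class name <tt>ElapsedTimeMonitor</tt> and
          the 30 second grace period are made up for illustration, and
          the check is written to be non-blocking (it only compares
          timestamps handed to it).</p>
<pre>
```python
class ElapsedTimeMonitor:
    """Hypothetical model: peer is down only after grace_seconds with no change."""

    def __init__(self, grace_seconds, now):
        self.grace_seconds = grace_seconds  # the 'reasonable' value; an assumption
        self.last_count = None
        self.last_change = now

    def check(self, keepalive_count, now):
        """Non-blocking check: compares timestamps, never sleeps or waits."""
        if keepalive_count != self.last_count:
            self.last_count = keepalive_count
            self.last_change = now
            return False
        return (now - self.last_change) >= self.grace_seconds


# Counter stuck at 7; times are in seconds and passed in explicitly.
monitor = ElapsedTimeMonitor(grace_seconds=30.0, now=0.0)
results = [monitor.check(7, t) for t in (0.0, 1.0, 20.0, 30.5)]
print(results)
```
</pre>
        <p>With a grace period longer than the 10 to 20 second stalls
          seen under heavy disk activity, a stall of that length would
          not trigger a reload in this model.</p>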

        <div class="moz-cite-prefix">

          <p>If you think this may be what is hanging you, then you can

            try it.  For me, FEDBSW is at octal 1,,304544.  Thus far,

            I'm up 42:44:57 (1 Day, 18 Hours, 44 Minutes, 57 Seconds and

            615 Milliseconds).<br>

          </p>

        </div>

        <blockquote type="cite"

          cite="mid:32cda764-ddaa-fa40-5f94-01bea0450862@gmail.com">



          <hr width="100%" size="2">

          <p>On 8/31/20 9:03 PM, Thomas DeBellis wrote:</p>

          <p>Do you know what program is displaying those three lines?</p>

          <p>I'm unaware of a PANDA distribution that didn't announce

            itself as a PANDA distribution in the system banner.   The

            date and time display is odd.  Tops-20 native time output

            has been Y2K compliant since forever.  It's the Tops-10

            programs (MACRO, CREF, Etc.), plus Tops-10'ish programs

            (GLXLIB, Quasar, Etc.) that needed Y2K patches.</p>
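          <p>One common way a year like the <tt>120</tt> in
            <tt>29-Jul-120</tt> arises is printing a years-since-1900
            value directly.  The sketch below only illustrates that
            mechanism; it is a guess, not a diagnosis of whichever
            program produced the banner, and <tt>format_date</tt> is a
            hypothetical helper.</p>
<pre>
```python
import datetime

MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def format_date(day, legacy=False):
    """Format as DD-Mon-YYYY; legacy=True prints years since 1900 instead."""
    year = day.year - 1900 if legacy else day.year
    return f"{day.day}-{MONTHS[day.month - 1]}-{year}"


# The date shown in the halt message, assuming it refers to 2020.
halt_day = datetime.date(2020, 7, 29)
print(format_date(halt_day, legacy=True))
print(format_date(halt_day))
```
</pre>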

          <p>Tops-20 DAP needed a small modification to handle Y2K and

            to not break RSX.</p>

          <p>The Tops-10 system that I use has a number of non-Y2K

            times, which surprised me.  While I have had the freedom to

            remediate, I simply don't have the time.  But it's jarring.<br>

          </p>

          <div class="moz-cite-prefix">I also found it interesting that

            the banner says DEC10 Development; 20's were sometimes

            called DEC20's, but never DEC10's (well, 1031 might have

            been an exception).</div>

          <div class="moz-cite-prefix">

            <p>I could have sworn you were showing us something off of a

              Tops-10 CTY...<br>

            </p>

          </div>

          <blockquote type="cite"

            cite="mid:818D36A6-70A4-419F-89DE-2CFF63BEC76A@riseup.net">


            <hr width="100%" size="2">On 8/31/20 7:13 PM, Supratim

            Sanyal wrote:

            <div><br>

              <div dir="ltr">I will keep digging - but it is possibly
                interesting that this happens between approx 52 and an
                indeterminate number of hours of solid uptime<br>

              </div>

              <blockquote type="cite">

                <div dir="ltr">


                  <hr width="100%" size="2">

                  <p>On Aug 31, 2020, at 5:00 PM, Thomas DeBellis <<a

                      href="mailto:tommytimesharing@gmail.com"

                      moz-do-not-send="true">tommytimesharing@gmail.com</a>>

                    wrote:<br>

                    <br>

                    If you are running a standard PANDA distribution,

                    then DDT is in the monitor and you may fail to it. 

                    Did it come up?  Did you do an examine from the

                    KLH10 micro-engine to see what instruction it was

                    failing on?  Did you see what module it is failing

                    in?</p>

                  <p>My monitor is modified from the base PANDA

                    distribution to include several local enhancements,

                    so when I looked at that address, it showed up as in

                    the entry of CHKOPC, which is what is checking for

                    deferred closes on virtual circuits.  This is in

                    PHYKLP which is the KLIPA driver (a.k.a. the CI). 

                    Since KLH10 (sadly) does not implement the CI, there

                    is no way you should be executing in that module as

                    there is nothing for it to talk to.</p>

                  <p>Moreover, there is no JRST 4 there.  So probably

                    you have something else at that address.<br>

                  </p>

                  <p>I have been running KLH10 for a <i>very</i> long
                    time (since late December 2002) and have made
                    modifications there, too, to fix an issue with
                    locking memory and to better support Linux (recent
                    Ubuntu).  It is remarkably robust; despite intensive
                    development, I have stayed up well over a year at a
                    time (i.e., hit UP2LNG BUGHLTs).<br>

                  </p>

                  <div class="moz-cite-prefix">I have found one problem;

                    if you are running it on an <u>extremely</u> fast

                    machine with SSD storage (in other words, you're

                    basically never waiting for anything) and you

                    seriously beat on the file system, then the

                    keep-alive counter can get out of sync, with the 20

                    thinking the front end has died and the KLH10 DTE

                    simulator apparently not understanding what to do.</div>

                  <div class="moz-cite-prefix">

                    <p>The 20 types an initial BUGCHK and then, in the
                      middle of the second one, it hangs waiting for the
                      front end.</p>

                    <p>It's on my list of things to investigate.<br>

                    </p>

                  </div>

                  <blockquote type="cite"

                    cite="mid:3f6d7313-d8cb-11d1-cd1c-ac04924d9893@riseup.net">

                    <hr width="100%" size="2">On 8/31/20 4:15 PM,

                    Supratim Sanyal wrote:<br>

                    <br>

                    hi all - my panda distribution instance is halting

                    after a couple of days with the following message.

                    is this a known problem for which there is some

                    workaround? <br>

                    <br>

                    Monitor RF434E DEC10 Development <br>

                    System uptime 52:10:47 <br>

                    Current date/time Wednesday 29-Jul-120 6:01:04 <br>

                    <br>

                    [HALTED: Program Halt, PC = 22013] <br>

                    <br>

                    thanks <br>

                    <br>

                    Supratim <br>

                    <br>

                  </blockquote>

                </div>

              </blockquote>

            </div>

          </blockquote>

        </blockquote>

      </blockquote>

      <pre class="moz-signature" cols="72">-- 

Supratim Sanyal, W1XMT

39.19151 N, 77.23432 W

QCOCAL::SANYAL via HECnet</pre>

    </blockquote>

  </body>

</html>