<html>

  <head>

    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">

  </head>

  <body>

    <p>After working on the DTEKPA issue and coming up with a

      workaround, I went searching to see if the problem had ever been

      reported.  In fact, it had been.   By me.   ...Well over a decade

      ago...</p>

    <p>Basically, the default behavior for Tops-20 is to note that a

      front-end counter isn't incrementing and--if on the next check it

      still hasn't changed--to declare the associated PDP-11 front end

      down and to initiate a reboot.  This gives the 11 about a

      millisecond to get its act together before the KL whacks it.</p>

    <p>Of course, that will never work in KLH10 (see below and

      previous); you're hung because the KLH10 DTE emulator doesn't

      implement code to simulate a reboot action, so the KL loops

      forever looking for the response.  Then again, why would it?  There

      should be no reason for Tops-20 to ever think the front end is down.   Seeing

      as I like writing assembler more than C, I decided not to tweak

      the KLH10 code (I have tweaked it for other cases and might

      rethink this).</p>

    <p>The workaround is to define some additional functions for the <font

        size="+1"><tt>BOOT% JSYS</tt></font> to set some variables in

      resident storage in <font size="+1"><tt>STG</tt></font> and

      modify <font size="+1"><tt>DTESRV</tt></font> to use them.  You

      can now set an elapsed time to wait before declaring the PDP-11

      down.  I default to five minutes of non-incrementing keep-alive. 

      Depending on how hard I am beating on things, the front end

      'appears' to go away for anywhere from 5 to 15 seconds.</p>
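    <p>For illustration only, the grace-period check described above
      might look like the following C sketch.  The real change lives in
      the monitor's assembler sources, not in C; every name here
      (keepalive_watch, watch_init, watch_check) is hypothetical, and
      times are assumed to be in milliseconds.</p>

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical watchdog state; none of these names come from DTESRV or STG. */
typedef struct {
    uint32_t last_count;     /* keep-alive value seen when it last changed */
    uint64_t last_change_ms; /* time at which the counter last moved */
    uint64_t grace_ms;       /* how long a stalled counter is tolerated */
} keepalive_watch;

void watch_init(keepalive_watch *w, uint32_t count,
                uint64_t now_ms, uint64_t grace_ms)
{
    assert(w != NULL);
    w->last_count = count;
    w->last_change_ms = now_ms;
    w->grace_ms = grace_ms;
}

/* Returns true only when the counter has been stuck for the whole
 * grace period, i.e. when the front end should be declared down. */
bool watch_check(keepalive_watch *w, uint32_t count, uint64_t now_ms)
{
    if (count != w->last_count) {
        /* Counter moved: remember the new value and restart the clock. */
        w->last_count = count;
        w->last_change_ms = now_ms;
        return false;
    }
    return (now_ms - w->last_change_ms) >= w->grace_ms;
}
```

    <p>With a five-minute grace period (300000 ms), a stall of 5 to 15
      seconds leaves watch_check returning false, so no reboot is
      attempted.</p>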

    <p>I think probably the real fix is to not depend on an OS interrupt

      to increment the counter.  A thread should be spawned which uses <font

        size="+1"><tt>nanosleep</tt><tt>()</tt></font> to bump the

      counter every 500 microseconds, no matter what the rest of KLH10

      might be doing.  KLH10 is already using multiple forks for the

      disks, tape, NI, etc. (one reason I've preferred it over SimH), so

      maybe this won't be a big deal.<br>

    </p>
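    <p>A minimal sketch of that idea, assuming POSIX threads and C11
      atomics are available on the host.  Nothing here is taken from the
      KLH10 sources: keepalive_ticker, ticker_start and ticker_stop are
      hypothetical names, and the counter is a plain stand-in for
      whatever the DTE emulation actually exposes to the monitor.</p>

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical names throughout; not the KLH10 data structures. */
typedef struct {
    atomic_uint count;    /* stand-in for the keep-alive counter */
    atomic_bool running;  /* cleared to ask the thread to exit */
    pthread_t thread;
} keepalive_ticker;

static void *ticker_main(void *arg)
{
    keepalive_ticker *t = arg;
    /* 500 microseconds, expressed in nanoseconds. */
    const struct timespec period = { .tv_sec = 0, .tv_nsec = 500 * 1000 };

    while (atomic_load(&t->running)) {
        atomic_fetch_add(&t->count, 1);
        /* nanosleep may sleep longer than asked; if a signal cuts it
         * short, the next tick simply arrives early. */
        nanosleep(&period, NULL);
    }
    return NULL;
}

/* Starts the ticker thread; returns 0 on success, like pthread_create. */
int ticker_start(keepalive_ticker *t)
{
    assert(t != NULL);
    atomic_init(&t->count, 0);
    atomic_init(&t->running, true);
    return pthread_create(&t->thread, NULL, ticker_main, t);
}

/* Asks the thread to stop and waits for it to finish. */
int ticker_stop(keepalive_ticker *t)
{
    atomic_store(&t->running, false);
    return pthread_join(t->thread, NULL);
}
```

    <p>The counter keeps advancing as long as the host schedules the
      thread, independent of what the main emulation loop is doing; it
      is not a guarantee of exact 500 microsecond spacing.</p>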

    <blockquote type="cite">

      <hr width="100%" size="2"><b>From</b>: Mark Crispin

      <a class="moz-txt-link-rfc2396E" href="mailto:MRC@Lingling.Panda.COM"><MRC@Lingling.Panda.COM></a><br>

      <div class="moz-cite-prefix"><b>Subject</b>: Re: KLH10 front-end

        reload??<br>

        <b>To</b>: Thomas DeBellis <a class="moz-txt-link-rfc2396E" href="mailto:slogin@acedsl.com"><slogin@acedsl.com></a><br>

        <div class="moz-cite-prefix"><b>Date</b>: Sun, 29 Nov 2009

          11:20:46 -0800 (PST)

          <div class="moz-cite-prefix"><b>In-Reply-To</b>:

            <a class="moz-txt-link-rfc2396E" href="mailto:4B11CFEA.4020906@acedsl.com"><4B11CFEA.4020906@acedsl.com></a><br>

            <b>Message-ID</b>:

            <a class="moz-txt-link-rfc2396E" href="mailto:alpine.OSX.2.00.0911291100070.245@hsinghsing.panda.com"><alpine.OSX.2.00.0911291100070.245@hsinghsing.panda.com></a><br>

          </div>

          <div class="moz-cite-prefix"><br>

          </div>

          KLH10 implements enough for the front end DTE protocol for

          TOPS-20 to think that it is talking to a front end, albeit one

          with just a CTY (no KLINIK, DL11 lines, or DECnet).<br>

        </div>

      </div>

      <p>There is a keepalive timer in both TOPS-20 (to reboot the front

        end when the front end crashes) and in RSX-11F (to reboot

        TOPS-20 when it crashes).<br>

        <br>

        The front end also keeps time well enough to set TOPS-20's clock

        following a crash-reboot; this is superseded in KLH10 as the

        timebase instructions get the time from the host OS.  In

        addition, Panda monitors try to run a program called TIMCHK

        which will synchronize with NTP servers.<br>

        <br>

        So, what happened was that the DTE protocol stopped for some

        reason. TOPS-20 tried to reboot the front end in an attempt to

        get it going, but of course that was futile.<br>

        <br>

        Here's something that may help:<br>

        <br>

        On some Linux systems the esoteric real-time interrupt

        mechanisms in KLH10 don't work well.  So, it may be necessary to

        set KLH10_ITIME_SYNC instead of the default KLH10_ITIME_INTRP. 

        Note that doing so will make KLH10 burn much more CPU on the

        host system.<br>

        <br>

        Usually, though, if you need to do this, it becomes pretty

        obvious at once, with nasty DTE errors from KLH10 shortly after

        booting (and any time you type on the CTY).<br>

        <br>

        One reason why I haven't upgraded Lingling's host CPU is that

        most of the newer machines that I've run KLH10 on have required

        doing this.  It's quite annoying. <br>

      </p>

      <blockquote type="cite" cite="mid:4B11CFEA.4020906@acedsl.com">

        <hr width="100%" size="2"><b>From</b>: Thomas DeBellis

        <a class="moz-txt-link-rfc2396E" href="mailto:slogin@acedsl.com"><slogin@acedsl.com></a><br>

        <b>Subject</b>: KLH10 front-end reload??<br>

        <b>To</b>: Tops-20 Wizards <a class="moz-txt-link-rfc2396E" href="mailto:TOPS-20@lingling.panda.com"><TOPS-20@lingling.panda.com></a><br>

        <b>Date</b>: Sat, 28 Nov 2009 20:35:38 -0500<br>

        <b>Message-ID</b>: <a class="moz-txt-link-rfc2396E" href="mailto:4B11CFEA.4020906@acedsl.com"><4B11CFEA.4020906@acedsl.com></a><br>

        <br>

        Tommy Timesharing hung earlier today; it had been up over 175

        days.

        I got an error around 1:27PM-EST that the front end had hung and

        was

        rebooted.  By the time I noticed at 4:08, the system was

        completely

        wedged. <br>

        <br>

        I couldn't get in on the CTY, but KLH10 appeared to be working. 

        However, in the process of poking around, I completely destroyed

        some

        information, so I am unable to determine exactly what was going

        on.  Sigh... <br>

        <br>

        As I had been up since early June (and the middle of March

        before that

        because of a power failure), this does not appear to be of

        immediate concern.  The system had not had a single issue during

        all this time

        (not even a BUGINF) <br>

        <br>

        However ...  Ideas, anyone?  Should I think about getting

        nervous?  I

        mean, there IS no front-end on KLH10, right? <br>

________________________________________________________________________

        <br>

        <font size="+1"><tt><br>

          </tt><tt>

************************************************************************

          </tt><tt><br>

          </tt><tt>

            TOPS-20 BUGHLT-BUGCHK </tt><tt><br>

          </tt><tt>

             Logged on Sat 28 Nov 2009 13:27:08      Monitor uptime was

            175 days 19:54:26 </tt><tt><br>

          </tt><tt>

                Detected on system # 3699. </tt><tt><br>

          </tt><tt>

                Record sequence number:    17527. </tt><tt><br>

          </tt><tt>

************************************************************************

          </tt><tt><br>

          </tt><tt> </tt><tt><br>

          </tt><tt>

            Error information: </tt><tt><br>

          </tt><tt>

                Date/Time of error:    Sat 28 Nov 2009 13:27:05 </tt><tt><br>

          </tt><tt>

                Errors since reload:    1. </tt><tt><br>

          </tt><tt>

                Fork # & Job #:        777777,777777 </tt><tt><br>

          </tt><tt>

                User's logged in dir:    unknown </tt><tt><br>

          </tt><tt>

                Program name:        </tt><tt><br>

          </tt><tt>

                Error:            BUGINF </tt><tt><br>

          </tt><tt>

                Address of error:    1137031 </tt><tt><br>

          </tt><tt>

                Name:            DTEKPA </tt><tt><br>

          </tt><tt>

                Description:        DTE keep alive fail </tt><tt><br>

          </tt><tt>

                CONI APR:        007740,,000003 = No error bits detected

          </tt><tt><br>

          </tt><tt>

                CONI PAG:        000000,,660151 </tt><tt><br>

          </tt><tt>

                DATAI PAG:        700101,,002750 </tt><tt><br>

          </tt><tt>

                Contents of ACs: </tt><tt><br>

          </tt><tt>

                         0:    000000,,575700 </tt><tt><br>

          </tt><tt>

                         1:    777777,,000000 </tt><tt><br>

          </tt><tt>

                         2:    000000,,000000 </tt><tt><br>

          </tt><tt>

                         3:    000000,,277242 </tt><tt><br>

          </tt><tt>

                         4:    000100,,206260 </tt><tt><br>

          </tt><tt>

                         5:    000000,,247445 </tt><tt><br>

          </tt><tt>

                         6:    000000,,000000 </tt><tt><br>

          </tt><tt>

                         7:    000000,,000000 </tt><tt><br>

          </tt><tt>

                        10:    777775,,000002 </tt><tt><br>

          </tt><tt>

                        11:    000000,,000000 </tt><tt><br>

          </tt><tt>

                        12:    000000,,614101 </tt><tt><br>

          </tt><tt>

                        13:    777772,,000012 </tt><tt><br>

          </tt><tt>

                        14:    777777,,777650 </tt><tt><br>

          </tt><tt>

                        15:    777305,,353304 </tt><tt><br>

          </tt><tt>

                        16:    620012,,000000 </tt><tt><br>

          </tt><tt>

                        17:    777115,,246540 </tt><tt><br>

          </tt><tt>

                PI status:        000000,,000175 </tt><tt><br>

          </tt><tt>

                Additional data items:    1 </tt><tt><br>

          </tt><tt>

                            000000,,000000 </tt><tt><br>

          </tt><tt> </tt><tt><br>

          </tt><tt>

                ERA:            000000,,000000 = word #0 Memory read </tt><tt><br>

          </tt><tt>

                Base phyiscal memory </tt><tt><br>

          </tt><tt>

                 address at failure:    0 </tt><tt><br>

          </tt><tt> </tt><tt><br>

          </tt><tt>

************************************************************************

          </tt><tt><br>

          </tt><tt>

            FRONT END RELOADED </tt><tt><br>

          </tt><tt>

             Logged on Sat 28 Nov 2009 13:28:04      Monitor uptime was

            175 days 19:55:21 </tt><tt><br>

          </tt><tt>

                Detected on system # 3699. </tt><tt><br>

          </tt><tt>

                Record sequence number:    17528. </tt><tt><br>

          </tt><tt>

************************************************************************

          </tt><tt><br>

          </tt><tt>

                CPU # :,,Front end #:    0,0 </tt><tt><br>

          </tt><tt>

                Status at reload:     No error bits detected </tt><tt><br>

          </tt><tt>

                Retries:    3 </tt><tt><br>

          </tt><tt>

                Filename for DUMP:   

            <SYSTEM>0DMP11.BIN.1,28-Nov-2009 13:27:05 </tt></font><br>

      </blockquote>

    </blockquote>

  </body>

</html>