<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>
<body>
<p><font face="Arial">OK, will do. First I have to find and restore
the KLH10 instance from backup, thanks to an unexpectedly
violent storm that triggered tornado warnings and consecutive
brown-outs. Thanks.</font><br>
</p>
<div class="moz-cite-prefix">On 9/3/20 7:39 PM, Thomas DeBellis
wrote:<br>
</div>
<blockquote type="cite"
cite="mid:b4469f48-0532-8651-6893-9c48c8aa9798@gmail.com">
<p>Shortly after sending this, I wedged my development machine by
mistakenly beating on the file system; this time by running
SPEAR to pull out events around the DTEKPA BUGCHK. There was
too much activity (I have a very large ERROR.SYS, thanks to
DECnet) and I got a DTEKPA. Once this happens, the machine
hangs shortly afterwards. This finally caused me to have a look
at DTESRV.</p>
<p>KPALIV is a variable that is incremented by Tops-20 in a number
of circumstances by SCHED, APRSRV and (oddly) CFSSRV. It's a
keep alive counter that both the front end and Tops-20 pay
attention to. An examination of the live monitor shows that it
is monotonically increasing:</p>
<blockquote>
<p><font size="+1"><tt>1,,COMBAS+5[ 417,,424521</tt><tt><br>
</tt><tt>1,,COMBAS+5[ 417,,426524</tt><tt><br>
</tt><tt>1,,COMBAS+5[ 417,,510532</tt></font></p>
</blockquote>
<p>It is updated approximately every 500 milliseconds; let's call
that a keep-alive tick. If it isn't updated in two ticks, the
front end is declared down and reload action is initiated. A
number of things are done and it appears that KLH10 is not
properly handling them. Since the KLH10 DTE service is not
running in a separate process (there are vestigial hooks to do
this), it does not handle a ten-triggered reload.</p>
<p>Tops-20 waits for the reload to complete, KLH10 does nothing
and you're hung.</p>
<p>Fortunately, there is some code for the master DTE which checks
a variable called FEDBSW, Front End Debugging Switch. If this
is non-zero, then the keep-alive count is incremented, but it's
never checked. So I set it to -1 (it was zero) and then
proceeded to beat on the file system with wild abandon.</p>
<p>For periods of intense disk activity, the machine appeared to
hang. After about 10 to 20 seconds, it came right back as if
nothing had ever happened. Interesting...<br>
</p>
<div class="moz-cite-prefix">Right now, my working assumption is
that the PI system is getting saturated so that the clock
interrupt somehow isn't making it through. For now, I'm
thinking of rewriting the service routine so that instead of
checking for two ticks, it checks elapsed time against a limit,
which can then be set to some 'reasonable' value.</div>
<div class="moz-cite-prefix">
<p>If you think this may be what is hanging you, then you can
try it. For me, FEDBSW is at octal 1,,304544. Thus far, I'm
up 42:44:57 (1 Day, 18 Hours, 44 Minutes, 57 Seconds and 615
Milliseconds).<br>
</p>
</div>
<blockquote type="cite"
cite="mid:32cda764-ddaa-fa40-5f94-01bea0450862@gmail.com">
<p> </p>
<hr width="100%" size="2">
<p>On 8/31/20 9:03 PM, Thomas DeBellis wrote:</p>
<p>Do you know what program is displaying those three lines?</p>
<p>I'm unaware of a PANDA distribution that didn't announce
itself as a PANDA distribution in the system banner. The
date and time display is odd. Tops-20 native time output has
been Y2K compliant since forever. It's the Tops-10 programs
(MACRO, CREF, etc.), plus Tops-10'ish programs (GLXLIB,
Quasar, etc.) that needed Y2K patches.</p>
<p>Tops-20 DAP needed a small modification to handle Y2K and to
not break RSX.</p>
<p>The Tops-10 system that I use has a number of non-Y2K times,
which surprised me. While I have had the freedom to
remediate, I simply don't have the time. But it's jarring.<br>
</p>
<div class="moz-cite-prefix">I also found it interesting that
the banner says DEC10 Development; 20's were sometimes called
DEC20's, but never DEC10's (well, 1031 might have been an
exception).</div>
<div class="moz-cite-prefix">
<p>I could have sworn you were showing us something off of a
Tops-10 CTY...<br>
</p>
</div>
<blockquote type="cite"
cite="mid:818D36A6-70A4-419F-89DE-2CFF63BEC76A@riseup.net">
<hr width="100%" size="2">On 8/31/20 7:13 PM, Supratim Sanyal
wrote:
<div><br>
<div dir="ltr">I will keep digging - but it is possibly
interesting that this happens after somewhere between approx 52 and
an indeterminate number of hours of solid uptime<br>
</div>
<blockquote type="cite">
<div dir="ltr">
<hr width="100%" size="2">
<p>On Aug 31, 2020, at 5:00 PM, Thomas DeBellis &lt;<a
href="mailto:tommytimesharing@gmail.com"
moz-do-not-send="true">tommytimesharing@gmail.com</a>&gt;
wrote:<br>
<br>
If you are running a standard PANDA distribution, then
DDT is in the monitor and you may fail to it. Did it
come up? Did you do an examine from the KLH10
micro-engine to see what instruction it was failing
on? Did you see what module it is failing in?</p>
<p>My monitor is modified from the base PANDA
distribution to include several local enhancements, so
when I looked at that address, it showed up as in the
entry of CHKOPC, which is what is checking for
deferred closes on virtual circuits. This is in
PHYKLP, which is the KLIPA driver (a.k.a. the CI).
Since KLH10 (sadly) does not implement the CI, there
is no way you should be executing in that module, as
there is nothing for it to talk to.</p>
<p>Moreover, there is no JRST 4 there. So probably you
have something else at that address.<br>
</p>
<p>I have been running KLH10 for a <i>very</i> long
time (since late December 2002) and have made
modifications there, too, to fix an issue with locking
memory and to better support Linux (recent Ubuntu).
It is remarkably robust; despite intensive
development, I have stayed up well over a year at a
time (i.e., hit UP2LNG BUGHLTs).<br>
</p>
<div class="moz-cite-prefix">I have found one problem:
if you are running it on an <u>extremely</u> fast
machine with SSD storage (in other words, you're
basically never waiting for anything) and you
seriously beat on the file system, then the keep-alive
counter can get out of sync, with the 20 thinking the
front end has died and the KLH10 DTE simulator
apparently not understanding what to do.</div>
<div class="moz-cite-prefix">
<p>The 20 typed an initial BUGCHK and then, in the
middle of the second one, it hung waiting for the
front end.</p>
<p>It's on my list of things to investigate.<br>
</p>
</div>
<blockquote type="cite"
cite="mid:3f6d7313-d8cb-11d1-cd1c-ac04924d9893@riseup.net">
<hr width="100%" size="2">On 8/31/20 4:15 PM, Supratim
Sanyal wrote:<br>
<br>
hi all - my panda distribution instance is halting
after a couple of days with the following message. is
this a known problem for which there is some
workaround? <br>
<br>
Monitor RF434E DEC10 Development <br>
System uptime 52:10:47 <br>
Current date/time Wednesday 29-Jul-120 6:01:04 <br>
<br>
[HALTED: Program Halt, PC = 22013] <br>
<br>
thanks <br>
<br>
Supratim <br>
<br>
</blockquote>
</div>
</blockquote>
</div>
</blockquote>
</blockquote>
</blockquote>
<pre class="moz-signature" cols="72">--
Supratim Sanyal, W1XMT
39.19151 N, 77.23432 W
QCOCAL::SANYAL via HECnet</pre>
</body>
</html>