<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>
<body>
    <p>After working on the DTEKPA issue and coming up with a
      workaround, I went searching to see if the problem had ever been
      reported. In fact, it had been. By me. ...Well over a decade
      ago...</p>
<p>Basically, the default behavior for Tops-20 is to note that a
front-end counter isn't incrementing and--if on the next check it
still hasn't changed--to declare the associated PDP-11 front end
down and to initiate a reboot. This gives the 11 about a
millisecond to get its act together before the KL whacks it.</p>
    <p>Of course, that will never work in KLH10 (see below and
      previous); you're hung because the KLH10 DTE emulator doesn't
      implement code to simulate a reboot action, so the KL loops
      forever looking for the response. Then again, why would it? There
      should be no reason for Tops-20 to ever think the front end is
      down. Seeing as I like writing assembler more than C, I decided
      not to tweak the KLH10 code (I have tweaked it for other cases
      and might rethink this).</p>
<p>The workaround is to define some additional functions for the <font
size="+1"><tt>BOOT% JSYS</tt></font> to set some variables in
resident storage in <font size="+1"><tt>STG</tt></font> and
modify <font size="+1"><tt>DTESRV</tt></font> to use them. You
can now set an elapsed time to wait before declaring the PDP-11
      down. I default to five minutes of non-incrementing keep-alive.
      Depending on how hard I am beating on things, the front end
      'appears' to go away for anywhere from 5 to 15 seconds.</p>
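    <p>The real change is in MACRO inside the monitor and is not
      reproduced here. As a rough illustration only, the logic of
      "wait an elapsed time before declaring the front end down" can
      be sketched in C as below. Every name in it (<tt>keepalive_watch</tt>,
      <tt>watch_init</tt>, <tt>watch_check</tt>, <tt>grace_ms</tt>) is
      hypothetical and does not correspond to anything in <tt>STG</tt>
      or <tt>DTESRV</tt>.</p>

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical state for illustration; not taken from the monitor sources. */
struct keepalive_watch {
    uint32_t last_count;     /* counter value when it was last seen to change */
    uint64_t last_change_ms; /* time at which that change was observed */
    uint64_t grace_ms;       /* how long the counter may stall before we give up */
};

void watch_init(struct keepalive_watch *w, uint32_t count,
                uint64_t now_ms, uint64_t grace_ms)
{
    w->last_count = count;
    w->last_change_ms = now_ms;
    w->grace_ms = grace_ms;
}

/* Called on every periodic check. Returns true only when the counter has
 * not moved for at least grace_ms, i.e. when the front end should be
 * declared down. Any change in the counter restarts the clock. */
bool watch_check(struct keepalive_watch *w, uint32_t count, uint64_t now_ms)
{
    if (count != w->last_count) {
        w->last_count = count;
        w->last_change_ms = now_ms;
        return false;
    }
    return now_ms - w->last_change_ms >= w->grace_ms;
}
```

    <p>With <tt>grace_ms</tt> set to 300000 (five minutes), a 15 second
      stall of the counter never trips the check, whereas the stock
      two-consecutive-checks rule would.</p>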
<p>I think probably the real fix is to not depend on an OS interrupt
to increment the counter. A thread should be spawned which uses <font
size="+1"><tt>nanosleep</tt><tt>()</tt></font> to bump the
counter every 500 microseconds, no matter what the rest of KLH10
      might be doing. KLH10 is already using multiple forks for the
      disks, tape, NI, etc. (one reason I've preferred it over SimH), so
      maybe this won't be a big deal.<br>
</p>
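    <p>A minimal sketch of what such a thread might look like, assuming
      POSIX threads and C11 atomics. This is not KLH10 code: <tt>keepalive_count</tt>
      merely stands in for wherever the DTE keep-alive word actually
      lives, and <tt>keepalive_start</tt> and <tt>keepalive_stop</tt>
      are made-up names.</p>

```c
#define _POSIX_C_SOURCE 200809L

#include <errno.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical stand-ins; none of these identifiers exist in KLH10. */
static atomic_uint_fast32_t keepalive_count; /* the keep-alive counter */
static atomic_bool keepalive_run;            /* cleared to stop the thread */

static void *keepalive_thread(void *arg)
{
    (void)arg;
    const struct timespec period = { .tv_sec = 0, .tv_nsec = 500 * 1000 }; /* 500 us */

    while (atomic_load(&keepalive_run)) {
        struct timespec left = period;
        /* If a signal cuts the sleep short, sleep for the remainder. */
        while (nanosleep(&left, &left) == -1 && errno == EINTR)
            ;
        atomic_fetch_add(&keepalive_count, 1);
    }
    return NULL;
}

/* Start the ticker; returns 0 on success, like pthread_create(). */
int keepalive_start(pthread_t *tid)
{
    atomic_store(&keepalive_run, true);
    return pthread_create(tid, NULL, keepalive_thread, NULL);
}

/* Ask the ticker to stop and wait for it to exit. */
int keepalive_stop(pthread_t tid)
{
    atomic_store(&keepalive_run, false);
    return pthread_join(tid, NULL);
}
```

    <p>Note that <tt>nanosleep()</tt> only guarantees a minimum delay, so
      on a loaded host the counter would tick more slowly than every
      500 microseconds; the point is only that it keeps ticking
      independently of the interval-timer interrupt.</p>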
<blockquote type="cite">
<hr width="100%" size="2"><b>From</b>: Mark Crispin
        <a class="moz-txt-link-rfc2396E" href="mailto:MRC@Lingling.Panda.COM">&lt;MRC@Lingling.Panda.COM&gt;</a><br>
<div class="moz-cite-prefix"><b>Subject</b>: Re: KLH10 front-end
reload??<br>
          <b>To</b>: Thomas DeBellis <a class="moz-txt-link-rfc2396E" href="mailto:slogin@acedsl.com">&lt;slogin@acedsl.com&gt;</a><br>
<div class="moz-cite-prefix"><b>Date</b>: Sun, 29 Nov 2009
11:20:46 -0800 (PST)
<div class="moz-cite-prefix"><b>In-Reply-To</b>:
                <a class="moz-txt-link-rfc2396E" href="mailto:4B11CFEA.4020906@acedsl.com">&lt;4B11CFEA.4020906@acedsl.com&gt;</a><br>
                <b>Message-ID</b>:
                <a class="moz-txt-link-rfc2396E" href="mailto:alpine.OSX.2.00.0911291100070.245@hsinghsing.panda.com">&lt;alpine.OSX.2.00.0911291100070.245@hsinghsing.panda.com&gt;</a><br>
</div>
<div class="moz-cite-prefix"><br>
</div>
            KLH10 implements enough of the front end DTE protocol for
            TOPS-20 to think that it is talking to a front end, albeit one
            with just a CTY (no KLINIK, DL11 lines, or DECnet).<br>
</div>
</div>
<p>There is a keepalive timer in both TOPS-20 (to reboot the front
end when the front end crashes) and in RSX-11F (to reboot
TOPS-20 when it crashes).<br>
<br>
        The front end also keeps time well enough to set TOPS-20's clock
        following a crash-reboot; this is superseded in KLH10 as the
        timebase instructions get the time from the host OS. In
addition, Panda monitors try to run a program called TIMCHK
which will synchronize with NTP servers.<br>
<br>
So, what happened was that the DTE protocol stopped for some
reason. TOPS-20 tried to reboot the front end in an attempt to
get it going, but of course that was futile.<br>
<br>
Here's something that may help:<br>
<br>
On some Linux systems the esoteric real-time interrupt
mechanisms in KLH10 don't work well. So, it may be necessary to
set KLH10_ITIME_SYNC instead of the default KLH10_ITIME_INTRP.
Note that doing so will make KLH10 burn much more CPU on the
host system.<br>
<br>
Usually, though, if you need to do this, it becomes pretty
obvious at once, with nasty DTE errors from KLH10 shortly after
booting (and any time you type on the CTY).<br>
<br>
One reason why I haven't upgraded Lingling's host CPU is that
most of the newer machines that I've run KLH10 on have required
doing this. It's quite annoying. <br>
</p>
<blockquote type="cite" cite="mid:4B11CFEA.4020906@acedsl.com">
<hr width="100%" size="2"><b>From</b>: Thomas DeBellis
        <a class="moz-txt-link-rfc2396E" href="mailto:slogin@acedsl.com">&lt;slogin@acedsl.com&gt;</a><br>
<b>Subject</b>: KLH10 front-end reload??<br>
        <b>To</b>: Tops-20 Wizards <a class="moz-txt-link-rfc2396E" href="mailto:TOPS-20@lingling.panda.com">&lt;TOPS-20@lingling.panda.com&gt;</a><br>
<b>Date</b>: Sat, 28 Nov 2009 20:35:38 -0500<br>
        <b>Message-ID</b>: <a class="moz-txt-link-rfc2396E" href="mailto:4B11CFEA.4020906@acedsl.com">&lt;4B11CFEA.4020906@acedsl.com&gt;</a><br>
<br>
        Tommy Timesharing hung earlier today; it had been up over 175
        days.
I got an error around 1:27PM-EST that the front end had hung and
was
rebooted. By the time I noticed at 4:08, the system was
completely
wedged. <br>
<br>
I couldn't get in on the CTY, but KLH10 appeared to be working.
However, in the process of poking around, I completely destroyed
some
information, so I am unable to determine exactly what was going
on. Sigh... <br>
<br>
As I had been up since early June (and the middle of March
before that
because of a power failure), this does not appear to be of
immediate concern. The system had not had a single issue during
all this time
(not even a BUGINF) <br>
<br>
However ... Ideas, anyone? Should I think about getting
nervous? I
mean, there IS no front-end on KLH10, right? <br>
________________________________________________________________________
<br>
<font size="+1"><tt><br>
</tt><tt>
************************************************************************
</tt><tt><br>
</tt><tt>
TOPS-20 BUGHLT-BUGCHK </tt><tt><br>
</tt><tt>
Logged on Sat 28 Nov 2009 13:27:08 Monitor uptime was
175 days 19:54:26 </tt><tt><br>
</tt><tt>
Detected on system # 3699. </tt><tt><br>
</tt><tt>
Record sequence number: 17527. </tt><tt><br>
</tt><tt>
************************************************************************
</tt><tt><br>
</tt><tt> </tt><tt><br>
</tt><tt>
Error information: </tt><tt><br>
</tt><tt>
Date/Time of error: Sat 28 Nov 2009 13:27:05 </tt><tt><br>
</tt><tt>
Errors since reload: 1. </tt><tt><br>
</tt><tt>
Fork # & Job #: 777777,777777 </tt><tt><br>
</tt><tt>
User's logged in dir: unknown </tt><tt><br>
</tt><tt>
Program name: </tt><tt><br>
</tt><tt>
Error: BUGINF </tt><tt><br>
</tt><tt>
Address of error: 1137031 </tt><tt><br>
</tt><tt>
Name: DTEKPA </tt><tt><br>
</tt><tt>
Description: DTE keep alive fail </tt><tt><br>
</tt><tt>
CONI APR: 007740,,000003 = No error bits detected
</tt><tt><br>
</tt><tt>
CONI PAG: 000000,,660151 </tt><tt><br>
</tt><tt>
DATAI PAG: 700101,,002750 </tt><tt><br>
</tt><tt>
Contents of ACs: </tt><tt><br>
</tt><tt>
0: 000000,,575700 </tt><tt><br>
</tt><tt>
1: 777777,,000000 </tt><tt><br>
</tt><tt>
2: 000000,,000000 </tt><tt><br>
</tt><tt>
3: 000000,,277242 </tt><tt><br>
</tt><tt>
4: 000100,,206260 </tt><tt><br>
</tt><tt>
5: 000000,,247445 </tt><tt><br>
</tt><tt>
6: 000000,,000000 </tt><tt><br>
</tt><tt>
7: 000000,,000000 </tt><tt><br>
</tt><tt>
10: 777775,,000002 </tt><tt><br>
</tt><tt>
11: 000000,,000000 </tt><tt><br>
</tt><tt>
12: 000000,,614101 </tt><tt><br>
</tt><tt>
13: 777772,,000012 </tt><tt><br>
</tt><tt>
14: 777777,,777650 </tt><tt><br>
</tt><tt>
15: 777305,,353304 </tt><tt><br>
</tt><tt>
16: 620012,,000000 </tt><tt><br>
</tt><tt>
17: 777115,,246540 </tt><tt><br>
</tt><tt>
PI status: 000000,,000175 </tt><tt><br>
</tt><tt>
Additional data items: 1 </tt><tt><br>
</tt><tt>
000000,,000000 </tt><tt><br>
</tt><tt> </tt><tt><br>
</tt><tt>
ERA: 000000,,000000 = word #0 Memory read </tt><tt><br>
</tt><tt>
Base phyiscal memory </tt><tt><br>
</tt><tt>
address at failure: 0 </tt><tt><br>
</tt><tt> </tt><tt><br>
</tt><tt>
************************************************************************
</tt><tt><br>
</tt><tt>
FRONT END RELOADED </tt><tt><br>
</tt><tt>
Logged on Sat 28 Nov 2009 13:28:04 Monitor uptime was
175 days 19:55:21 </tt><tt><br>
</tt><tt>
Detected on system # 3699. </tt><tt><br>
</tt><tt>
Record sequence number: 17528. </tt><tt><br>
</tt><tt>
************************************************************************
</tt><tt><br>
</tt><tt>
CPU # :,,Front end #: 0,0 </tt><tt><br>
</tt><tt>
Status at reload: No error bits detected </tt><tt><br>
</tt><tt>
Retries: 3 </tt><tt><br>
</tt><tt>
Filename for DUMP:
            &lt;SYSTEM&gt;0DMP11.BIN.1,28-Nov-2009 13:27:05 </tt></font><br>
</blockquote>
</blockquote>
</body>
</html>