[HECnet] LAT

Supratim Sanyal supratim at riseup.net
Fri Dec 17 09:24:57 PST 2021


> On Dec 17, 2021, at 11:39 AM, Johnny Billquist <bqt at softjar.se> wrote:
> 
> Jeez. After a lot of pain, and with things still not entirely right, I can at least report some good news about LAT with regard to Linux and RSX.

There are abandoned NetBSD and OpenBSD ports as well. I once started trying to get them working on my NetBSD/VAX and OpenBSD/VAX instances. If you share something that has a makefile, we could give it a shot.

> As I mentioned before, there is some kind of problem between the Linux latd and the RSX LAT server. When using llogin to log in on RSX systems, the terminal hangs after a while, and there is also a memory leak that eventually makes RSX non-functional.
> 
> 
> The Linux latd code is horribly weird, and even after digging through it for days, it still does things I cannot explain. But there is definitely one bug in there: it does not count credits when receiving data_b slots. That means the sender can run out of credits while Linux latd thinks the remote still has credits, so it will not extend more. The "funny" thing is that Linux latd does count credits when sending data_b slots. So I'd say that is a very obvious error in Linux latd (also, the LAT documentation clearly states that data_b slots count against credits). I've fixed this, and that solved the hanging problem towards RSX. I'm honestly surprised this has not been a problem anywhere else, as it's the same for any kind of system. Either other systems are not sending data_b slots, or else there are bugs on more sides.
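A minimal sketch of the receive-side credit bookkeeping described above, in Python purely for illustration. It is not latd's actual code: the class and method names, the window size of 2, and the rule that only data_a and data_b slots consume a credit are assumptions made for this example.

```python
from enum import Enum


class SlotType(Enum):
    # Hypothetical names for the slot kinds discussed in the post;
    # these are not the numeric codes used on the wire.
    DATA_A = "data_a"
    DATA_B = "data_b"
    ATTENTION = "attention"


class SessionCredits:
    """Receiver-side credit bookkeeping for one slot session (sketch)."""

    def __init__(self, window: int = 2) -> None:
        self.window = window    # credits we aim to keep extended to the remote
        self.outstanding = 0    # credits the remote currently holds

    def credits_to_extend(self) -> int:
        """Credits to piggyback on our next outgoing slot to refill the window."""
        grant = self.window - self.outstanding
        self.outstanding += grant
        return grant

    def on_receive(self, slot: SlotType) -> None:
        # Assumption for this sketch: only data slots consume credits.
        # The fix described in the post is that DATA_B must be counted just
        # like DATA_A; skipping it leaves `outstanding` too high, so we never
        # extend more credits and the sender stalls.
        if slot in (SlotType.DATA_A, SlotType.DATA_B):
            if self.outstanding == 0:
                raise ValueError("remote sent a data slot without holding a credit")
            self.outstanding -= 1
```

With the data_b case removed from `on_receive`, a sender that uses data_b slots drains its credits while `outstanding` stays at the window size, and `credits_to_extend()` keeps returning 0: the hang described above.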
> 
> I can provide the patch for this problem, but I wonder whether anyone still "owns" that software, so that I know whom to send it to...
> 
> Second, Linux latd sends attention slots with a stop code of 0x40. According to both the LAT documentation and the RSX code, this is an undefined value. I'm not sure where Linux latd got that value from.
> 
> Third, Linux latd is broken when it comes to tracking and dealing with ACKs. This one I have not been able to figure out or understand. I can see on the wire that it sends packets with a lower ACK number than the previous packet it sent. Looking at the code, and thinking about it in general, this seems crazy. It should not be possible for this to happen, but it does.
> 
> Fortunately, it only happens on a stop message, which RSX isn't happy about for other reasons anyway, so it doesn't matter. But I still think it's totally crazy.
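To make the ACK complaint concrete, here is a hypothetical check one could run over ACK numbers captured from the wire. It assumes 8-bit sequence and ACK fields that wrap modulo 256, so that 255 followed by 0 counts as a step forward rather than backward; the function names are made up for illustration.

```python
SEQ_MODULUS = 256  # assumption: 8-bit sequence/ACK numbers that wrap around


def ack_delta(prev_ack: int, new_ack: int) -> int:
    """Signed distance from prev_ack to new_ack, accounting for wraparound.

    Returns a value in [-128, 127]; a negative result means new_ack is
    "behind" prev_ack.
    """
    d = (new_ack - prev_ack) % SEQ_MODULUS
    return d - SEQ_MODULUS if d >= SEQ_MODULUS // 2 else d


def check_outgoing_acks(acks: list[int]) -> list[int]:
    """Return the indices of packets whose ACK number went backwards."""
    bad = []
    for i in range(1, len(acks)):
        if ack_delta(acks[i - 1], acks[i]) < 0:
            bad.append(i)
    return bad
```

For example, `check_outgoing_acks([1, 2, 3, 2, 4])` flags index 3, while a sequence that merely wraps, such as `[254, 255, 0, 1]`, is not flagged.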
> 
> 
> Now, with all that said, I have also had to find and fix a couple of bugs in the RSX LAT code, which is also a little difficult to penetrate. It seems DEC can't really have tested this code much, and whoever wrote it wasn't careful.
> 
> Fixed versions of the LAT bits have been included in the latest BQTCP distribution. If you install the RSX patches, LAT will be fixed.
> 
> There are actually two problems I found in there.
> 1. If a circuit is closed down while a transmit is in progress, the buffer for that transmit is lost when the transmit completes (this is the original RSX error I saw and mentioned before). This is clearly a timing issue, which I guess whoever wrote the code just didn't think about or test carefully.
> 
> 2. Slot attention messages with a valid stop code cause the system to crash. This is really weird. Because Linux latd was using an undefined value in the attention message, things worked just fine, but when I corrected the value, RSX crashed. This suggests that all terminal servers and other LAT software in fact also use this wrong value in the attention message. The fix just required saving and restoring a couple of registers at the right place. Again, this can't have been tested at all. Possibly the person writing the code thought he had tested it using DECservers or whatever, but if they were actually sending the wrong code, everything looked good while the code path for a valid stop code never got executed the way it should.
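One general way to avoid the lost-buffer problem in item 1 is to let the transmit-completion path always return the buffer to the pool, independent of whether the circuit still exists. The sketch below models that idea in Python with hypothetical `BufferPool` and `Circuit` classes; it is not how the RSX LAT code is structured, and not necessarily how the BQTCP fix works.

```python
class BufferPool:
    """Hypothetical fixed pool of buffer ids."""

    def __init__(self, size: int) -> None:
        self.free = list(range(size))

    def get(self) -> int:
        return self.free.pop()

    def put(self, buf: int) -> None:
        self.free.append(buf)


class Circuit:
    """Hypothetical circuit that tracks buffers handed to the driver."""

    def __init__(self, pool: BufferPool) -> None:
        self.pool = pool
        self.in_flight: set[int] = set()
        self.open = True

    def start_transmit(self) -> int:
        if not self.open:
            raise RuntimeError("circuit is closed")
        buf = self.pool.get()
        self.in_flight.add(buf)
        return buf

    def close(self) -> None:
        # Do not touch in-flight buffers here: the driver still owns them
        # until their transmits complete.
        self.open = False

    def transmit_done(self, buf: int) -> None:
        # Return the buffer whether or not the circuit is still open.
        # Dropping it because the circuit has gone away is the kind of
        # leak described in item 1.
        self.in_flight.discard(buf)
        self.pool.put(buf)
```

Closing the circuit between `start_transmit()` and `transmit_done()` still leaves every buffer back in the pool afterwards.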
> 
> 
> Finally, Linux latd sends a circuit stop message that RSX does not like at all. The reason is that RSX has already deleted the circuit at that point, so it becomes a stop message for a circuit that does not exist. This causes the illegal message counter to increment, but nothing worse than that.
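The behavior for a stop on an already-deleted circuit can be modeled as a lookup that falls through to a counter. `LatNode`, `on_circuit_stop` and the dictionary of circuit states are hypothetical names for illustration only, not RSX data structures.

```python
from dataclasses import dataclass, field


@dataclass
class LatNode:
    # Hypothetical circuit table: circuit id -> state string.
    circuits: dict[int, str] = field(default_factory=dict)
    illegal_messages: int = 0

    def on_circuit_stop(self, circuit_id: int) -> None:
        if circuit_id not in self.circuits:
            # Circuit already deleted locally: count the message and drop it.
            self.illegal_messages += 1
            return
        del self.circuits[circuit_id]
```

A second stop for the same circuit id therefore only bumps `illegal_messages`, which matches the harmless-but-noisy symptom described above.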
> 
> I should break out a DECserver and compare against that. But I figured I should let people know what I've been up to lately, which might also be interesting for others in here...
> 
>  Johnny
> 
> -- 
> Johnny Billquist                  || "I'm on a bus
>                                  ||  on a psychedelic trip
> email: bqt at softjar.se             ||  Reading murder books
> pdp is alive!                     ||  tryin' to stay hip" - B. Idol


