[HECnet] Old protocols in new ones

Paul Koning paulkoning at comcast.net
Sun Mar 28 13:21:07 PDT 2021



> On Mar 27, 2021, at 11:06 AM, Mark Berryman <mark at theberrymans.com> wrote:
> 
> DDCMP was originally designed to run over intelligent synchronous controllers, such as the DMC-11 or the DMR-11, although it could also be run over async serial lines.  Either of these could be local or remote.  If remote, they were connected to a modem to talk over a circuit provided by a common carrier, and async modems had built-in error correction.  From the DMR-11 user manual describing its features:
> DDCMP implementation which handles message sequencing and error correction by automatic retransmission
> 
> In other words, DDCMP expected the underlying hardware to provide guaranteed transmission or be running on a line where the incidence of data loss was very low.  UDP provides neither of these.
> 
> DDCMP via UDP over the internet is a very poor choice and will result in exactly what you are seeing.  This particular connection choice should be limited to your local LAN where UDP packets have a much higher chance of surviving.
> 
> GRE survives much better on the internet than UDP does, and TCP guarantees delivery.  If possible, I would recommend using one of these encapsulations for DECnet packets going to any neighbors over the internet, rather than UDP.

GRE is a broadcast subtype, so it follows the Ethernet rules.  That means an idle link tolerates two consecutive packet losses but gets a hello timeout on three consecutive losses.  Also, and this is more serious, if a routing update packet is lost, that route change is not seen by the other end until the background timer (BCT1) fires.
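To make the timing concrete, here is a small sketch (my own illustration, not code from any actual implementation) of the broadcast-circuit rule: the listen timeout is three hello intervals, so an idle link survives two consecutive lost hellos but not three.

```python
# Sketch of broadcast-circuit adjacency timeout behavior.
# On a broadcast (Ethernet-rules) circuit the listen timer is
# 3 x the hello interval, so two consecutive lost hellos are
# tolerated but a third in a row takes the adjacency down.

def adjacency_survives(losses, multiplier=3):
    """losses: per-hello-interval booleans, True = hello lost.
    Returns False if 'multiplier' consecutive hellos are lost."""
    run = 0
    for lost in losses:
        run = run + 1 if lost else 0
        if run >= multiplier:
            return False
    return True

print(adjacency_survives([True, True, False, True]))  # True: only 2 in a row
print(adjacency_survives([True, True, True]))         # False: hello timeout
```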

DDCMP is a point to point subtype.  That means an outage that lasts longer than twice (not three times) the hello timer will cause a listen timeout.  On the other hand, if packets are dropped they are retransmitted promptly (typically within a second) at the datalink level, and the drop is invisible to routing.  In particular, routing packets will get through unless you have a sustained outage.  This is why T1 is by default far larger for point to point links -- it exists only as a "self-stabilization" safety measure to deal with software bugs, not as a protection against packet drop.
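The "drop is invisible to routing" point can be illustrated with a toy model (a sketch under my own assumptions, not the actual DDCMP code): each lost frame just costs one retransmit-timer delay, and the expected extra delay at loss rate p is p/(1-p) timeouts.

```python
import random

# Toy ARQ model: a dropped frame is simply retransmitted after the
# datalink retransmit timer (about a second in practice), so the
# routing layer above never sees the loss, only added delay.

rng = random.Random(42)

def retransmissions(loss_rate):
    """Number of datalink retransmissions needed to deliver one frame."""
    count = 0
    while rng.random() < loss_rate:
        count += 1
    return count

# Mean extra delay at 20% loss, in units of the retransmit timeout:
mean = sum(retransmissions(0.2) for _ in range(10000)) / 10000
print(round(mean, 2))  # close to p/(1-p) = 0.25
```

Even at a 20% drop rate every frame eventually gets through; the cost is delay and throughput, not correctness, which is the point about ARQ made below.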

DDCMP does NOT expect the underlying hardware to run on a line with "very low" data loss, let alone on a lossless link.  Instead, like any ARQ protocol, it runs correctly (delivers its promised guarantees) even at quite high error rates.  However, also in common with any other ARQ protocol, if the error rate is high the throughput drops a lot.  

There is a classic result from the early days of ARPAnet, when a "high speed backbone" link ran at 56 kbps: a 1 percent packet drop rate would produce a 50 percent drop in throughput.  With modern links that ratio is likely to be worse.  So if you're running on a path with 1 percent packet drops, DDCMP will run, but rather slowly.  And, for the same reason, TCP will also run slowly; perhaps not quite as slowly, because TCP implementations may use faster retransmission timeouts.
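A back-of-the-envelope calculation shows how a 1 percent drop rate can halve throughput.  This is my own illustration using the standard go-back-N approximation (the window size of 100 is an assumed value, not taken from any particular implementation): every lost packet forces retransmission of a full window.

```python
# Approximate go-back-N link efficiency at packet loss rate p with a
# window of 'window' packets: a loss wastes roughly a whole window,
# so efficiency ~ (1 - p) / (1 - p + window * p).

def gbn_efficiency(p, window):
    """Fraction of the link's raw throughput actually delivered."""
    return (1 - p) / (1 - p + window * p)

print(round(gbn_efficiency(0.01, 100), 2))  # ~0.50: 1% loss halves throughput
print(round(gbn_efficiency(0.05, 100), 2))  # heavier loss is far worse
```

With a 100-packet window in flight, 1 percent loss lands almost exactly on the classic 50 percent figure; at higher loss rates the collapse is much steeper.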

There are protocols specifically designed for lossy, high-latency links; deep space satellite links are an example.  ARQ is not used in such cases; instead one uses FEC (forward error correction), such that packets are delivered even in the presence of a specified level of bit error or packet error.  But those schemes are way outside what we deal with.

One of these days I will hack up a quick & dirty "network simulator" that provides a pipe with specified error rates, then run DDCMP over it to see how it performs under abusive conditions.  Hopefully that will confirm my implementation is correct in these areas; if not I'll fix it.

	paul



