No subject


Tue Mar 31 11:32:25 PDT 2015


.BEGIN-HECNET-INFO
ADDR |NAME  |OWNER       |EMAIL              |HARDWARE             |OS          |LOCATION       |NOTES
8.401|CHIMPY|Sampsa Laine|sampsa at mac.com   |AlphaServer DS10     |OpenVMS 8.3 |London, England|Main SAMPSACOM system, SMTP gateway (CHIMPYMAIL.COM)
8.400|GORVAX|Sampsa Laine|sampsa at mac.com   |SIMH VAX on OSX/Intel|OpenVMS 7.3 |London, England|MULTINET bridge to Area 2, Area router
8.403|RHESUS|Sampsa Laine|sampsa at mac.com   |HP rx2600 Dual 900MHz|OpenVMS 8.4E|London, England|File libraries available
8.500|PYFFLE|Sampsa Laine|system at pyffle.com|VMWare               |Pyffle BBS  |London, England|Waffle reimplementation BBS, log in as pyffle for access
.END-HECNET-INFO

I can obviously see several potential issues here.
First of all, I'll have to assume that the first line after the .BEGIN... is a header line to be ignored.

Good point. It should be prefixed with something to indicate it's to be ignored. Unless we're going to use it to help the parsing.
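
As a rough illustration of the "prefix it with something" idea, here is a minimal Python sketch. The "#" prefix and the data_lines helper are purely hypothetical; nothing in the current format defines a comment marker.

```python
# Hypothetical convention: a line starting with "#" inside the block is a
# comment or header line and is skipped. The "#" prefix is an assumption,
# not part of the format as it exists today.
COMMENT_PREFIX = "#"

def data_lines(block_lines):
    """Return the non-empty lines of a block that are not marked as comments."""
    return [
        line for line in block_lines
        if line.strip() and not line.lstrip().startswith(COMMENT_PREFIX)
    ]

block = [
    "#ADDR |NAME  |OWNER",
    "8.401|CHIMPY|Sampsa Laine",
    "8.400|GORVAX|Sampsa Laine",
]
print(data_lines(block))
```

With a marker like this the scraper would not have to guess whether the first line is data.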


Second, I'll have to assume that the same fields exist, in the same order, always. The alternative is to actually parse the first line, hope that its column titles have been fully standardized, and then match columns to find what I'm looking for.

I'd say the first one is easiest to implement, the second more robust.
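
To make the trade-off concrete, here is a small Python sketch of both approaches. The FIELDS list is taken from the example block above and is an assumption, not an agreed standard; the function names are made up for illustration. The sketch also assumes no field contains a "|" character.

```python
# Field order assumed from the example block; not a published standard.
FIELDS = ["ADDR", "NAME", "OWNER", "EMAIL", "HARDWARE", "OS", "LOCATION", "NOTES"]

def split_row(line):
    """Split a pipe-delimited line and trim the padding around each cell."""
    return [cell.strip() for cell in line.split("|")]

def parse_fixed(lines):
    """Approach 1: skip the header line and trust the FIELDS order."""
    return [dict(zip(FIELDS, split_row(line))) for line in lines[1:]]

def parse_by_header(lines):
    """Approach 2: take the column names from the header line itself."""
    header = [name.upper() for name in split_row(lines[0])]
    return [dict(zip(header, split_row(line))) for line in lines[1:]]

# A block whose owner has swapped the first two columns and dropped the rest.
sample = [
    "NAME  |ADDR |OS",
    "CHIMPY|8.401|OpenVMS 8.3",
]
print(parse_fixed(sample)[0]["ADDR"])      # the node name lands in ADDR
print(parse_by_header(sample)[0]["ADDR"])  # the address is found by title
```

The header-driven version survives reordered or missing columns, but only if everyone spells the column titles the same way.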

Third, the style of the hardware, OS and location fields is totally free at this point, which goes against my wish for something a bit more uniform. (What kind of OS is "Pyffle BBS", for example?)


Good point, maybe re-label the field "software stack" so it says Ubuntu Linux v<whatever> + Pyffle BBS or whatnot. Or I could just stick Ubuntu in there.


I'm sure that if I were to write something to scrape this, other issues would turn up over time as well.

Yes, I'm a grumpy fart. :-)

I'm sure. I think the "standard" was devised in something like 3 emails.

But the point is that scraping is IMHO the best way to go - no need to give people write access to your DB, yet people can still update their info without bugging you.
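
For what it's worth, pulling the block out of a scraped page could look roughly like this in Python. The function name and the sample page are made up; how the page text is fetched is left out, and the only thing taken from the format is the pair of .BEGIN-HECNET-INFO / .END-HECNET-INFO marker lines.

```python
import re

# Matches everything between the two marker lines, each on a line of its own.
BLOCK_RE = re.compile(
    r"^\.BEGIN-HECNET-INFO[ \t]*\n(.*?)^\.END-HECNET-INFO[ \t]*$",
    re.DOTALL | re.MULTILINE,
)

def extract_blocks(page_text):
    """Return, for each block found in the text, the list of lines inside it."""
    text = page_text.replace("\r\n", "\n")  # tolerate CRLF line endings
    return [match.group(1).splitlines() for match in BLOCK_RE.finditer(text)]

# Made-up page content for illustration.
page = (
    "Welcome to my node page\n"
    ".BEGIN-HECNET-INFO\n"
    "ADDR |NAME\n"
    "8.401|CHIMPY\n"
    ".END-HECNET-INFO\n"
    "Other text\n"
)
print(extract_blocks(page))
```

Everything outside the markers is ignored, so each owner can keep the block on whatever page they already maintain.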

sampsa





-- 
Johnny Billquist                                   || "I'm on a bus
                                                                  ||   on a psychedelic trip
email: bqt at softjar.se                         ||   Reading murder books
pdp is alive!                                         ||   tryin' to stay hip" - B. Idol


