Category Archives: tech - Page 3

Post-jump, things everywhere die

Always a bit interesting to check out stats for things on the internet after a major event like Felix Baumgartner’s jump from space today. Here’s a graph for JINX, one of the two .za INXs.

Interestingly, it seems to have all come from only one of the two GGCs that are visible via that path.

Elsewhere we see similar drops, and also some of the GGCs dying (502s were served up for a while). Here are the CAR-IX stats:

During the stream, some people were handed off to Akamai edge nodes. I don’t have that much information on what was happening in all the various networks, though. Will be interesting to see that coming to light soon from the likes of Renesys and such.

Update: I see the first image from this page made it onto one of the .za news syndicators without credit. Nice of them.

Old tricks, new coins; same problems

In setting up some mirrors recently, I’ve come to learn that rsync’s algorithm by default doesn’t deal well with long-haul TCP going hand in hand with small files. I still need to set aside some time to find a nice set of optimization flags to tweak that up a bit more. And when I say “doesn’t deal well”, I mean in the region of “can tx about a third of what a single-session TCP speedtest can do over the same path”. Later’s worry, though.

So, the point of this post:

receiver$ nc -l 1234 | pv | tar xf -
sender$ tar cf - moz | pv | nc receiver.domain.tld 1234

And there you have a minimal-CPU-usage streaming file delivery setup. Of course, this doesn’t deal with retransmits or anything else. This just gets the bulk over. After that you can run an rsync with big block lengths over all of it, and get any files fixed using that.
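For anyone who would rather script this than type it, here is a minimal sketch of the same idea using only Python's standard library. The function names `receive_tree` and `send_tree` are made up for illustration, and like the nc version it does no resuming or verification: it just pushes one tar stream over one TCP connection.

```python
import socket
import tarfile


def receive_tree(listener: socket.socket, dest_dir: str) -> None:
    """Accept one connection and unpack the incoming tar stream into dest_dir."""
    conn, _addr = listener.accept()
    with conn, conn.makefile("rb") as stream:
        # Mode "r|" reads a non-seekable stream, much like `tar xf -` on a pipe.
        with tarfile.open(fileobj=stream, mode="r|") as archive:
            # Only unpack archives from a sender you trust.
            archive.extractall(dest_dir)


def send_tree(host: str, port: int, src_dir: str, arcname: str) -> None:
    """Tar up src_dir and write it straight to the socket, like `tar cf - | nc`."""
    with socket.create_connection((host, port)) as sock:
        with sock.makefile("wb") as stream:
            # Mode "w|" writes an uncompressed tar stream without seeking.
            with tarfile.open(fileobj=stream, mode="w|") as archive:
                archive.add(src_dir, arcname=arcname)
```

The receiver would bind and listen on a socket of its choosing and pass it to `receive_tree`; the sender then calls something like `send_tree("receiver.domain.tld", 1234, "moz", "moz")`. There is no `pv` equivalent here, so you get no progress meter.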

And the “old trick” part of this? As far as I know, this is pretty much how the whole tape drive thing used to work[0]. I’ve just added some internet in the middle.

[0] – I wouldn’t know for certain, before my time

Elegua

Public Service Announcement

Anyone who makes use of elegua, the transition of services on it is now complete and I’ve updated the main A and AAAA records to point at the new host.

If you have any issues, you know where to find me.

(That said, the original TTLs were like six gazillion years or something, so caches might flush later as they go. Query the upstream NS for the new record if you need it.)

.co.za domains considered harmful

If anyone ever wants to register a .co.za domain, it looks like you’ll have three options going forward (from the near future):

  1. run away screaming
  2. commit suicide
  3. pay someone else to do it
That’s if we skip over the other practices they have, like refusing to allow you to register a domain if the NS records don’t exist on some servers yet (think about the workflow some DNS hosters take, this might at times be a perfectly normal scenario), or the weird whois setup that still seems to be the default server for most whois clients in the world.

Alongside my froztbyte.net domain, I also have a froztbyte.co.za from before I had a credit card. It’s useful for some stuff. But wow, dealing with coza is a trip. First, they only recently made an EPP interface available, and a quick scan-over of it looks like you need to be a registered/accredited registrar to use it, weighing in at R5000 (presently that’s just below 500eur). No matter, it’s not like I’m going to go find an EPP implementation now to do this. So the antiquated *email* interface it is.

Wander over to their website, grab the update form for my domain, edit it with the new NS info, submit. Wait.

mail:/var/log# tail -n 500 exim4/mainlog | grep 1TEc0u-0003kD-QC
2012-09-20 10:21:36 1TEc0u-0003kD-QC <= jp@domainiwanttoupdate.co.za H=(vandali.neology.co.za) [2001:43e8:8:1::x:x:x:x] P=esmtp S=8582 T="test Thu, 20 Sep 2012 10:21:16 +0200" from <jp@domainiwanttoupdate.co.za> for coza-admin@co.za
2012-09-20 10:21:38 1TEc0u-0003kD-QC == coza-admin@co.za R=dnslookup T=remote_smtp defer (-44): SMTP error from remote mail server after RCPT TO:<coza-admin@co.za>: host mx2.coza.net.za [82.103.142.199]: 450 4.2.0 <mail.neology.co.za[41.73.33.140]>: Client host rejected: Greylisted, see http://postgrey.schweikert.ch/help/co.za.html
2012-09-20 10:22:25 1TEc0u-0003kD-QC == coza-admin@co.za routing defer (-51): retry time not reached
2012-09-20 10:29:29 1TEc0u-0003kD-QC == coza-admin@co.za R=dnslookup T=remote_smtp defer (-44): SMTP error from remote mail server after RCPT TO:<coza-admin@co.za>: host mx2.coza.net.za [82.103.142.199]: 450 4.2.0 <mail.neology.co.za[41.73.33.140]>: Client host rejected: Greylisted, see http://postgrey.schweikert.ch/help/co.za.html
2012-09-20 10:32:05 1TEc0u-0003kD-QC == coza-admin@co.za R=dnslookup T=remote_smtp defer (-44): SMTP error from remote mail server after RCPT TO:<coza-admin@co.za>: host mx2.coza.net.za [82.103.142.199]: 450 4.2.0 <mail.neology.co.za[41.73.33.140]>: Client host rejected: Greylisted, see http://postgrey.schweikert.ch/help/co.za.html
2012-09-20 10:32:25 1TEc0u-0003kD-QC == coza-admin@co.za routing defer (-51): retry time not reached
2012-09-20 10:34:01 1TEc0u-0003kD-QC == coza-admin@co.za R=dnslookup T=remote_smtp defer (-44): SMTP error from remote mail server after RCPT TO:<coza-admin@co.za>: host mx2.coza.net.za [82.103.142.199]: 450 4.2.0 <mail.neology.co.za[41.73.33.140]>: Client host rejected: Greylisted, see http://postgrey.schweikert.ch/help/co.za.html
So I end up actually phoning my domain registrar, in 2012, to find out how long I need to wait. “Up to 45 minutes”. A few exim queue flushes later, the mail went through. Now I should receive the mail that allows me to respond with the auth cookie. Oh, wait, no:
COZA: ERROR: Invalid phone number format supplied for the registrant phone or fax numbers “froztbyte.co.za”.

I first have to have a validation failure, because the data THEY SUPPLIED doesn’t conform to their validation schema. This is also not a new thing. They’ve had various schema updates over various points of the ccTLD lifetime, and it’s often just a case of “struggle with it until you get it working”.

Now, given, they seem to have acknowledged that they fail at life as a registrar, thus the new EPP setup and accredited registrars. But for crying out loud, make some reasonable interface for people who aren’t on that system yet. Maybe I’ll make the effort of finding a good registrar… or maybe I’ll just stop caring about .co.za domains forever and move my stuff elsewhere.

Fun things to come home to

*sigh*… so much for the idea of doing work on the Coursera thing (which I just signed up for today) tonight:

yariman# tail -n 100 syslog | grep ppp
Sep 10 16:44:41 yariman pppd[24971]: Plugin rp-pppoe.so loaded.
Sep 10 16:44:41 yariman pppd[24972]: pppd 2.4.5 started by root, uid 0
Sep 10 16:45:16 yariman pppd[24972]: Timeout waiting for PADO packets
Sep 10 16:45:16 yariman pppd[24972]: Unable to complete PPPoE Discovery
Sep 10 16:46:21 yariman pppd[24972]: Timeout waiting for PADO packets
Sep 10 16:46:21 yariman pppd[24972]: Unable to complete PPPoE Discovery
Sep 10 16:47:26 yariman pppd[24972]: Timeout waiting for PADO packets
Sep 10 16:47:26 yariman pppd[24972]: Unable to complete PPPoE Discovery
Sep 10 16:48:31 yariman pppd[24972]: Timeout waiting for PADO packets
Sep 10 16:48:31 yariman pppd[24972]: Unable to complete PPPoE Discovery
Sep 10 16:49:36 yariman pppd[24972]: Timeout waiting for PADO packets
Sep 10 16:49:36 yariman pppd[24972]: Unable to complete PPPoE Discovery
Sep 10 16:50:41 yariman pppd[24972]: Timeout waiting for PADO packets
Sep 10 16:50:41 yariman pppd[24972]: Unable to complete PPPoE Discovery
Sep 10 16:51:46 yariman pppd[24972]: Timeout waiting for PADO packets
Sep 10 16:51:46 yariman pppd[24972]: Unable to complete PPPoE Discovery
Sep 10 16:52:51 yariman pppd[24972]: Timeout waiting for PADO packets
Sep 10 16:52:51 yariman pppd[24972]: Unable to complete PPPoE Discovery
Sep 10 16:53:00 yariman pppd[24972]: Terminating on signal 15
Sep 10 16:53:00 yariman pppd[24972]: Exit.

Line sync’d where it always has, good signal vs noise, etc. DSLAM or something in the middle just missing. Now to wait and hope my ticket gets to a useful support person. It *sucks* not having access to the local loop.

And Justin Case™ you couldn’t guess it, that post title is a lie.

Everything is not “just a string”

During a quick conversation on unicode and punycode, I managed to find http://☁→❄→☃→☀→☺→☂→☹→✝.ws

Cute, and a sad reminder of how many people still fight this.
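For the curious, a rough sketch of what happens to a label like that on its way into DNS, using Python's built-in `punycode` codec. The helper names are made up for this example, and it deliberately skips the nameprep/IDNA mapping and validation steps a real resolver library applies, so treat it as an illustration of the encoding only.

```python
ACE_PREFIX = "xn--"  # marks a label as punycode-encoded


def to_ascii_label(label: str) -> str:
    """Encode one DNS label to its ASCII form; plain ASCII labels pass through."""
    if label.isascii():
        return label
    return ACE_PREFIX + label.encode("punycode").decode("ascii")


def to_unicode_label(label: str) -> str:
    """Reverse of to_ascii_label for a single label."""
    if label.lower().startswith(ACE_PREFIX):
        return label[len(ACE_PREFIX):].encode("ascii").decode("punycode")
    return label


def to_ascii_domain(domain: str) -> str:
    """Encode each dot-separated label of a domain name separately."""
    return ".".join(to_ascii_label(part) for part in domain.split("."))
```

Calling `to_ascii_domain("☁→❄→☃→☀→☺→☂→☹→✝.ws")` gives the plain-ASCII name that actually goes over the wire, which is the whole point: the pretty version is not “just a string” to anything below the browser.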

Should I buy some stock?

thoughts: I still have some stock account balance left..wonder if I should buy some..

In [1]: from random import choice
In [2]: choices = {'buy': 0, 'wait': 0}
In [3]: for i in range(0,5000):
   ...:     choices[choice(['buy','wait'])] += 1
   ...:
In [4]: choices
Out[4]: {'buy': 2518, 'wait': 2482}

Guess I’ll buy some stocks.

[ed's note: this method works equally well to decide which stocks you want to buy when you're lazy]

Queueing

A lot of people use queueing for handling data streams and managing how it gets worked on. Whether that’s in routing (here, here, and here for some examples), messaging, traffic etc, it’s a fairly ubiquitous concept. What I haven’t seen elsewhere before, though, is our local ticketing company’s approach to the problem:

Linkin Park – JHB ONLY
You are now in the pre-queue area for Linkin Park – JHB ONLY tickets. When the official queue opens – all customers in the pre-queue area will be given a random place in the queue. Thereafter all queuing becomes sequential.

Citation: here.
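As I read that notice, the scheme boils down to a shuffle followed by a plain FIFO. A minimal sketch of that reading follows; the function name and the customer names are invented, and this is my interpretation of the quoted text, not anything Computicket has published.

```python
import random
from collections import deque


def open_official_queue(pre_queue, rng=None):
    """Give everyone in the pre-queue a random place, then return a FIFO queue."""
    if rng is None:
        rng = random.Random()
    shuffled = list(pre_queue)
    rng.shuffle(shuffled)  # the "random place in the queue" step
    return deque(shuffled)


# Everyone who arrived early gets shuffled when the official queue opens.
queue = open_official_queue(["alice", "bob", "carol"])
queue.append("dave")             # late arrivals simply join the back
next_customer = queue.popleft()  # service is strictly first in, first out
```

Note what the shuffle buys the ticketing system: nothing capacity-wise. It only removes the incentive to hammer the site the second the queue opens.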

Mapping a real-world queue onto making people wait online for the chance to buy their ticket (because the system can’t cope with the load) is, well, hilarious. You’re taking the problem from a physical space to an online one: after the move, you still have the same problem. The reality is that people just can’t wait around in queues all day. That said, the move is not really surprising, especially if we look at this company’s history/skillset/view on fixing this. A quote from one of the concert organizers, citing what Computicket (our local ticket crowd) said, from the time when the U2 concert ragekilled the ticketing platform:

We were very comfortable with what Computicket advised us but there were about 30 000 people on the website at the same time buying the same class of ticket. No system in the world can cope with that. We anticipated huge demand, but it’s about 10% higher than we estimated.

Citation: here.

And yet other people in the world seem perfectly capable of doing this (some are even good at fixing it when they were victim to the issues of not having it right). It’s been happening so often that, many years ago, it was even given a name: The Slashdot Effect. Hell, there’s a bunch of advice collected by people who have fallen victim to this, offered for free. All you have to do is search for it. Not that I’m surprised or anything (at people getting it right). Merely surprised that some people in South Africa still (seem to) stubbornly refuse to believe that anything better than Their Glorious Thing might be possible.

The thought of whether I should launch a ticketing startup has crossed my mind a few times. Perhaps it’s time someone actually did that.

Update: the funny part I only just realized is that they seem to have half learned about the fact that their own stuff sucks, and they outsourced to these people. Who appear to fail just as hard.
Update on the update: it appears these people might not fail hard, but just handle the “making you wait” portion of the problem. It’s still up to Computicket to give you a valid basket interface, tickets, checkout, etc.

Smokeping slave noise

Being in Africa, not all the packet paths are that great. Some people steal copper, others sabotage fibre, Somali pirates hijack repair ships, things like that. Slowly but surely the state of things is improving, but for now, loss is inevitable.

Combine this with using smokeping slave instances in far countries, and things can get extremely noisy. And I mean “I had 1200 mails from smokeping since 6am and it’s now 11h39″ noisy. Thankfully, it’s pretty easy to fix, unlike what is said in this post.

Edit Smokeping.pm, jump to the check_alerts subroutine. Change this:

                         sendmail $cfg->{Alerts}{from},$to, <<ALERT;
To: $to
From: $cfg->{Alerts}{from}
Date: $rfc2822stamp
$mail
ALERT
                       }

To this:

                       if ($slave !~ /slaveNameToMatch/) {
                         sendmail $cfg->{Alerts}{from},$to, <<ALERT;
To: $to
From: $cfg->{Alerts}{from}
Date: $rfc2822stamp
$mail
ALERT
                       }
                }

And happiness is. If I feel like looking at more perl later, I’ll try to make it a bit more formal (build it into the slave configs, allow it to be a generic check), but for now this’ll do.

S(hitty)NMP

This post will highlight Mikrotik/RouterOS issues, but it’s certainly not only them that suffer from S(hitty)NMP implementations.

Far be it from me to think SNMP is perfect, or that it’s necessarily always a good idea. I just have to wonder how it’s possible to screw up such a simple thing.

For example, Mikrotik has this nifty feature where you can look up the OIDs in a specific context by calling the print command with the parameter oid:

[user@R1] /system resource> pri oid
           uptime: .1.3.6.1.2.1.1.3.0
  total-hdd-space: .1.3.6.1.2.1.25.2.3.1.5.131073
   used-hdd-space: .1.3.6.1.2.1.25.2.3.1.6.131073

Except then this happens:

mon# snmpwalk -c ${com} -v 2c ${host} 1.3.6.1.2.1.25.2.3.1.5.131073
HOST-RESOURCES-MIB::hrStorageSize.131073 = No more variables left in this MIB View (It is past the end of the MIB tree)
mon# snmpwalk -c ${com} -v 2c ${host} 1.3.6.1.2.1.25.2.3.1.6.131073
HOST-RESOURCES-MIB::hrStorageUsed.131073 = No more variables left in this MIB View (It is past the end of the MIB tree)

The situation has at least improved vastly, though. Instead of finding the MIB file in some godforsaken dead corner of their documentation site (which is basically just kept on life support), the wiki has a formal section for it now. Still…things could be better:

  • Still no trap support in the various routing protocols/daemons (as far as I know)
  • Various bits of inconsistency like the above items
  • Indexes on dynamic interfaces and the like change, and with no way (that I’m aware of, once again) to lock them to a specific index irrespective of interface state.

There’s another issue that might’ve been fixed in the meantime; I haven’t checked in a while. SNMPv1 only has 32-bit counter types, and anything that needs more than that (a 64-bit counter) can only be represented in SNMPv2 and later. RouterOS just decided to not care about this at all, and responded with the number under the same counter type, but only ever when using SNMPv1.
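To put numbers on the counter-size point, here is a small sketch of the 32-bit ceiling and of the usual way a poller copes with a counter that wraps between two samples. The function names are made up, and the wrap handling assumes at most one wrap per polling interval.

```python
COUNTER32_MAX = 2**32 - 1  # largest value a 32-bit SNMP counter can carry


def fits_counter32(value: int) -> bool:
    """True if the value can be reported in a 32-bit counter without truncation."""
    return 0 <= value <= COUNTER32_MAX


def counter32_delta(previous: int, current: int) -> int:
    """Increase between two 32-bit counter samples, allowing for a single wrap."""
    return (current - previous) % (COUNTER32_MAX + 1)
```

A busy interface's byte counter blows through that ceiling quickly, which is why the 64-bit counters matter and why an agent that stuffs oversized values into a 32-bit type makes life hard for whatever is polling it.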

Seriously, why is this stuff so broken?