Queueing

A lot of people use queueing to handle data streams and manage how they get worked on. Whether that’s in routing (here, here, and here for some examples), messaging, traffic, etc., it’s a fairly ubiquitous concept. What I haven’t seen elsewhere before, though, is our local ticketing company’s approach to the problem:

Linkin Park – JHB ONLY
You are now in the pre-queue area for Linkin Park – JHB ONLY tickets. When the official queue opens – all customers in the pre-queue area will be given a random place in the queue. Thereafter all queuing becomes sequential.

Citation: here.
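
For the curious, the scheme described in that notice can be sketched in a few lines of Python. This is only my reading of the notice, not Computicket’s actual implementation; the customer names and the open_queue helper are made up for illustration.

```python
import random
from collections import deque


def open_queue(pre_queue, rng=None):
    """Give everyone in the pre-queue a random place in the real queue."""
    rng = rng or random.Random()
    order = list(pre_queue)
    rng.shuffle(order)  # arrival order in the pre-queue is thrown away
    return deque(order)


# Hypothetical customers who showed up before the official queue opened.
queue = open_queue(["alice", "bob", "carol"], rng=random.Random(42))

# Once the queue is open, everything is sequential: new arrivals join the back.
queue.append("dave")
queue.append("erin")

print(list(queue))
```

The first three places come out in a shuffled order, and anyone arriving after the queue opens lands behind them in plain FIFO order.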

Mapping real-world queues onto making people wait online for the chance to buy their ticket (because the system can’t cope with the load) is, well, hilarious. You’re taking the problem from a physical space to an online one: after the move, you still have the same problem. The reality is that people just can’t wait around in queues all day. That said, the move is not really surprising, especially if we look at this company’s history/skillset/view on fixing this. Here’s a quote from one of the concert organizers, citing what Computicket (our local ticket crowd) said, from the time when the U2 concert ragekilled the ticketing platform:

We were very comfortable with what Computicket advised us but there were about 30 000 people on the website at the same time buying the same class of ticket. No system in the world can cope with that. We anticipated huge demand, but it’s about 10% higher than we estimated.

Citation: here.

And yet other people in the world seem perfectly capable of doing this (some are even good at fixing it when they were victim to the issues of not having it right). It’s been happening so often that, many years ago, it was even given a name: The Slashdot Effect. Hell, there’s a bunch of advice collected by people who have fallen victim to this, offered for free. All you have to do is search for it. Not that I’m surprised or anything (at people getting it right). Merely surprised that some people in South Africa still (seem to) stubbornly refuse to believe that anything better than Their Glorious Thing might be possible.

The thought of whether I should launch a ticketing startup has crossed my mind a few times. Perhaps it’s time someone actually did that.

Update: the funny part I only just realized is that they seem to have half learned about the fact that their own stuff sucks, and they outsourced to these people. Who appear to fail just as hard.
Update on the update: it appears these people might not fail hard, but just handle the “making you wait” portion of the problem. It’s still up to Computicket to give you a valid basket interface, tickets, checkout, etc.

Smokeping slave noise

Being in Africa, not all the packet paths are that great. Some people steal copper, others sabotage fibre, Somali pirates hijack repair ships, things like that. Slowly but surely the state of things is improving, but for now, loss is inevitable.

Combine this with using smokeping slave instances in far countries, and things can get extremely noisy. And I mean “I had 1200 mails from smokeping since 6am and it’s now 11h39” noisy. Thankfully, it’s pretty easy to fix, unlike what is said in this post.

Edit Smokeping.pm, jump to the check_alerts subroutine. Change this:

                         sendmail $cfg->{Alerts}{from},$to, <<ALERT;
To: $to
From: $cfg->{Alerts}{from}
Date: $rfc2822stamp
$mail
ALERT
                       }

To this:

                       if ($slave !~ /slaveNameToMatch/) {
                         sendmail $cfg->{Alerts}{from},$to, <<ALERT;
To: $to
From: $cfg->{Alerts}{from}
Date: $rfc2822stamp
$mail
ALERT
                       }
                }

And happiness is. If I feel like looking at more Perl later, I’ll try to make it a bit more formal (build it into the slave configs, allow it to be a generic check), but for now this’ll do.
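
As a rough idea of what the more generic check could look like, here is the same logic sketched in Python rather than Perl. It is not Smokeping code: the pattern list and the should_send_alert name are hypothetical, and the only thing carried over from the patch above is the unanchored regex match on the slave name.

```python
import re

# Hypothetical list of slave-name patterns whose alerts should be muted.
MUTED_SLAVE_PATTERNS = [r"slaveNameToMatch", r"^far-away-"]


def should_send_alert(slave, patterns=MUTED_SLAVE_PATTERNS):
    """Return True unless the slave name matches one of the muted patterns."""
    # re.search is unanchored, like Perl's $slave !~ /pattern/ in the patch.
    return not any(re.search(pattern, slave) for pattern in patterns)


for name in ["", "local1", "slaveNameToMatch", "far-away-3"]:
    print(repr(name), should_send_alert(name))
```

An empty slave name (which I’m assuming is what a check from the master itself looks like) still alerts; only names matching a listed pattern are silenced.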

S(hitty)NMP

This post will highlight Mikrotik/RouterOS issues, but it’s certainly not only them that suffer from S(hitty)NMP implementations.

Far be it from me to think SNMP is perfect, or that it’s necessarily always a good idea. I just have to wonder how it’s possible to screw up such a simple thing.

For example, Mikrotik has this nifty feature where you can look up the OIDs in a specific context by calling the print command with the parameter oid:

[user@R1] /system resource> pri oid
           uptime: .1.3.6.1.2.1.1.3.0
  total-hdd-space: .1.3.6.1.2.1.25.2.3.1.5.131073
   used-hdd-space: .1.3.6.1.2.1.25.2.3.1.6.131073

Except then this happens:

mon# snmpwalk -c ${com} -v 2c ${host} 1.3.6.1.2.1.25.2.3.1.5.131073
HOST-RESOURCES-MIB::hrStorageSize.131073 = No more variables left in this MIB View (It is past the end of the MIB tree)
mon# snmpwalk -c ${com} -v 2c ${host} 1.3.6.1.2.1.25.2.3.1.6.131073
HOST-RESOURCES-MIB::hrStorageUsed.131073 = No more variables left in this MIB View (It is past the end of the MIB tree)

The situation has at least improved vastly, though. Instead of finding the MIB file in some godforsaken dead corner of their documentation site (which is basically just kept on life support), the wiki has a formal section for it now. Still…things could be better:

  • Still no trap support in the various routing protocols/daemons (as far as I know)
  • Various bits of inconsistency like the above items
  • Indexes on dynamic interfaces and the like change, and with no way (that I’m aware of, once again) to lock them to a specific index irrespective of interface state.

There’s another issue that might’ve been fixed in the meantime; I haven’t checked in a while. SNMPv1 has a specific set of counter types, and anything bigger than n (where n was some integer limit, most likely the 32-bit ceiling of the Counter type) can only be represented in SNMPv2, which added Counter64. RouterOS just decided not to care about this at all, and would respond with the oversized number under the same counter type, but only ever when using SNMPv1.
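
To make the limit concrete, here’s a small Python sketch. It assumes the limit in question is the 32-bit ceiling of the SNMPv1 Counter type (Counter64 only arrived with SNMPv2); the function names are mine and have nothing to do with RouterOS.

```python
COUNTER32_MAX = 2**32 - 1  # largest value a 32-bit SNMP counter can carry


def fits_snmpv1_counter(value):
    """True if the value can be represented in a 32-bit counter."""
    return 0 <= value <= COUNTER32_MAX


def counter32_delta(previous, current):
    """Difference between two 32-bit counter samples, allowing for one wrap."""
    return (current - previous) % (COUNTER32_MAX + 1)


print(fits_snmpv1_counter(4_294_967_295))
print(fits_snmpv1_counter(4_294_967_296))
print(counter32_delta(4_294_967_290, 5))
```

A well-behaved agent wraps the counter back to zero at the ceiling and a poller accounts for the wrap as in counter32_delta; handing back a bigger number under the same type is what trips up anything that believes the type.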

Seriously, why is this stuff so broken?

HP iLO(2) tapdance

Most people who run bigger sorts of servers are probably familiar with OOB management systems, but for those who aren’t, here’s a short summary: you pay a little bit more when you buy your server, and you get a fantastical tool (vendors, please, get this stuff to fit the modern age. It’s not like we don’t want to use them) to use with your server. Power control, hardware status info, (usually) full IP KVM, etc. HP, Dell, Supermicro, Cisco UCS all have this in their own respective flavours.

That’s just to set the tone for what follows. So let’s pretend you live in .za, and you have crappy upstream bandwidth from your home. This would make things like firing up the HP SmartStart ISO on your hardware pretty painful, because uploading all that data takes forever. So what do we do?

We download it to another box on the same network and load up the image via a “hidden” section of iLO that allows us to mount images from an HTTP source, of course:

</>hpiLO-> show
status=0
status_tag=COMMAND COMPLETED

/
  Targets
    system1
    map1
  Properties
  Verbs
    cd version exit show

</>hpiLO-> cd /map1/oemhp_vm/cddr
status=0
status_tag=COMMAND COMPLETED

/map1/oemhp_vm/cddr

</map1/oemhp_vm/cddr>hpiLO-> show
status=0
status_tag=COMMAND COMPLETED

/map1/oemhp_vm/cddr
  Targets
  Properties
    oemhp_image=None
    oemhp_connect=No
    oemhp_boot=No_Boot
    oemhp_wp=No
    oemhp_applet_connected=No
  Verbs
    cd version exit show

</map1/oemhp_vm/cddr>hpiLO-> set oemhp_image=http://192.0.2.1/helpstuff/<ISO_Name_Here.iso>
status=0
status_tag=COMMAND COMPLETED

</map1/oemhp_vm/cddr>hpiLO-> set oemhp_boot=Connect
status=0
status_tag=COMMAND COMPLETED

</map1/oemhp_vm/cddr>hpiLO-> show
status=0
status_tag=COMMAND COMPLETED

/map1/oemhp_vm/cddr
  Targets
  Properties
    oemhp_image=http://192.0.2.1/helpstuff/<ISO_Name_Here.iso>
    oemhp_connect=Yes
    oemhp_boot=Always
    oemhp_wp=Yes
    oemhp_applet_connected=No
  Verbs
    cd version exit show

So, in summary:

We cd to the path that contains cddr (which is the virtual disc path). A note on this: the vm path might sometimes be oemhp_vm1. Do a show under /map1 if you can’t find the thing.
Then we set oemhp_image and oemhp_boot to values useful for booting.
Now we reboot.

After you’re done with stuff, just set oemhp_boot to Never, and it’ll disconnect stuff.
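
If you do this often, the command sequence is easy to generate. The Python below is just a sketch that builds the lines to paste into the iLO CLI session shown above; it doesn’t talk to the iLO itself, and the function name and example ISO name are made up.

```python
def ilo_mount_commands(image_url, vm_path="/map1/oemhp_vm/cddr"):
    """Build the iLO CLI lines used above to attach an ISO over HTTP."""
    return [
        f"cd {vm_path}",                 # may be .../oemhp_vm1/... on some boxes
        f"set oemhp_image={image_url}",  # where the iLO fetches the image from
        "set oemhp_boot=Connect",        # same value as in the session above
        "show",                          # confirm the properties took
    ]


# Hypothetical ISO name on the HTTP host from the example session.
for line in ilo_mount_commands("http://192.0.2.1/helpstuff/example.iso"):
    print(line)
```

Pass a different vm_path if your iLO uses the oemhp_vm1 variant.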

I didn’t check whether this worked for iLO3 as well, but I’d guess it’s relatively similar. Been a few months since I even looked at an iLO3 system. Here’s the command ref doc for iLO2 if you want to dig around for some more cool stuff.

This Year In Injuries

So, given the relatively quiet nature of the past 3 years or so, I think the world is trying to balance things out again this year.

  • We start off with food poisoning or something around middle February. Leaves me nearly incapable of even just sitting up by myself for nearly half a day.
  • Soon thereafter, flu. In March. March is still summer in South Africa. This sucked.

Now earlier this year I’d started with a good gym routine, and was actually making progress. Although I didn’t quite have the right shoes for running and that ended up causing some blisters on my heels. Which was fine, those just take a while to heal up and weren’t too bad, so I just switched over to a heavier focus on cycling for a while. Then:

  • Walking down the stairs in the office one day (while wearing sandals), I slip on the steps (tile), manage to shift my weight quickly and prevent landing on my ass. But at the cost of having a short period of high-speed collision between my heel and the edge of the steps. Cue the entire section of post-blisters calloused skin being shifted loose, bleeding, emergency self-applied patchwork from the office medkit, and a trip to the clinic across the road.
  • Some weeks pass. Reasonably uneventful, short of spilling some hot water on my hand at one point. Ride in to work one morning, throttle cable gets stuck while I’m approaching a slipway with moving traffic. I’m going 40km/h, I basically have two options, and 5m within which to take action. So I brake hard and go down. Didn’t hit traffic, but my left knee got most of the force, and against some broken tar to boot. Ride in to the same clinic (oh and by the way, fresh air on a new wound stings like a bitch), have the nurses laugh at me.
  • On my (more or less) last week of wearing the bandages for the knee, I’m busy packing stuff and moving stuff outside. Carry my server rack outside, start removing panels and doors so that it’s ready to be carried down the stairs. Wiggle the stuck door, tip the rack over. Flail fast out of the way of the rack that’s now following me down the steps, manage to avoid getting crushed, but have my whole toenail ripped out on my left toe. Go to the same clinic (again, since by now I also know they do a good job ;)), have the nurses just burst out laughing. Get that patched up, and over the next few weeks learn just how annoying it is to ever lose a nail. I also now understand how it was used as a method of torture. I could scarcely feel the pain in the first 10~15min after it happened, but the adrenaline burn was so hot that I needed 2 cans of coke and a full mix sundae from an icecream shop near the clinic before I could stand without hugging walls.
  • Tonight, on the way out to see the new Spiderman movie, go around a circle (this one, west to east on Senior) and sideswipe out over a torrent of water. Literally. Half the road was covered with a stream coming downhill. Thankfully just a bit of swelling on my knee, and that should go away in about 3 days.

It pretty much feels like the year is trying to kill me. ’cept it’s July and I’m still here, so let’s see what can happen further.

If a few people are feeling up to it, I’ll even start a betting pool ;)

Timejumps

So today/tonight/sometime is leap second day. I’m not too sure when it is, exactly. Why? Because I don’t need to:

Jun 30 14:43:09 stratum1 lantime[1850]: Normal Operation  
Jun 30 14:43:17 stratum1 lantime[1850]: Leap second announced  
Jun 30 14:44:12 stratum1 ntpd[2172]: synchronized to PPS(0), stratum 0
Jun 30 14:44:13 stratum1 lantime[1467]: NTP sync to PPS

My timeserver knows. Firmware updates applied, leap seconds announced, music festivals to go to.
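
For anyone wondering how the clients downstream find out: NTP carries a two-bit leap indicator in the top of the first byte of every packet. A small Python sketch of decoding it follows; the sample byte values are made up for illustration and are not captured from my timeserver.

```python
LEAP_INDICATOR = {
    0: "no warning",
    1: "last minute of the day has 61 seconds",
    2: "last minute of the day has 59 seconds",
    3: "clock not synchronized",
}


def leap_indicator(first_byte):
    """Extract the 2-bit leap indicator from the first byte of an NTP packet."""
    return (first_byte >> 6) & 0b11


# 0x24 is LI 0, version 4, mode 4 (server); 0x64 is the same with LI set to 1.
for sample in (0x24, 0x64):
    print(hex(sample), LEAP_INDICATOR[leap_indicator(sample)])
```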

Mineshafts

Or: when you seriously need to tunnel

I’ve got some servers sitting 300~500ms away, behind a bad NAT, and GRE/pptp can’t make it through. Quick way to solve it? Build a small crappy VM, install ssh, and make the following modifications to files:

/etc/ssh/sshd_config: append the PermitTunnel directive. Pick one you like from `man 5 sshd_config`
/etc/ssh/ssh_config: append the Tunnel directive. Again, check which you want from `man 5 ssh_config`.

Quickly generate a key for use for the tunnel dial and push it to your dial host:
ssh-keygen -C "tunneling key" -t rsa -f ~/.ssh/tunnel_rsa
ssh-copy-id -i ~/.ssh/tunnel_rsa user@tunnelhost

Now start up the tunnel:
ssh -i ~/.ssh/tunnel_rsa -NTCf -w any user@tunnelhost

Slap IPs on each side:
client:~# ip addr add 192.0.2.1/32 peer 192.0.2.2 dev <tundev>
tunnelhost:~# ip addr add 192.0.2.2/32 peer 192.0.2.1 dev <tundev>

Also, I noticed that between two Debian hosts the tunnels defaulted to state DOWN, so a quick ip link set up dev <tundev> was needed on each side.
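
The per-side addressing is symmetrical, so it’s easy to generate. Here’s a Python sketch that only builds the two ip commands from above for one side; it runs nothing, and the tun0 device name is an assumption (use whatever tun device ssh actually created).

```python
import ipaddress


def tunnel_addr_commands(local_ip, peer_ip, tundev):
    """Build the ip commands for one end of the point-to-point tunnel."""
    # IPv4Address raises ValueError on typos before anything gets printed.
    local = ipaddress.IPv4Address(local_ip)
    peer = ipaddress.IPv4Address(peer_ip)
    return [
        f"ip addr add {local}/32 peer {peer} dev {tundev}",
        f"ip link set up dev {tundev}",
    ]


# Client side, then tunnelhost side; "tun0" is assumed on both.
for side in (("192.0.2.1", "192.0.2.2"), ("192.0.2.2", "192.0.2.1")):
    for command in tunnel_addr_commands(*side, "tun0"):
        print(command)
```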

Ping across, check if it works, and if all’s good you should be able to route via the tunnel and do whatever you need to. Since ssh is generally pretty capable and usable everywhere (even over some crazy portforwards), this should get you going fairly easily.

Crescendo!

So I decided to, instead of spamming people up through my blog and IRC and jabber and …. each time I find something cool, rather make a concentrated little project for it.

And I had a useful domain for it around from 2010 as well. So, presenting Earnoms!

Check out the about page for a summary of the project, but definitely keep those music links coming :)

Musical Interlude

After a few days of flu-sourced incapacitation, I’m back onto sorta being alive. Here’s a nice chilled track on some cool instruments:

Read more about this funky little instrument here.

Aftermath

So we survived the day pretty well. Yay for things going as they should ;)

A quick summary: we had one query about being unable to reach our test site, and that turned out to be a browser issue on the client’s side. Here are the counters from the test site (stats from around 15h00 SAST):

   2012-06-06  --  228 IPv4 only
   2012-06-06  --  5 Confused
   2012-06-06  --  1 Web Filter
   2012-06-06  --  46 Dual Stack - IPv6 Preferred
   2012-06-06  --  16 Dual Stack - IPv4 Preferred
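
To put those counters in proportion, here’s a quick Python sketch that totals them and works out each category’s share. The numbers are copied straight from the list above; nothing else is assumed.

```python
counts = {
    "IPv4 only": 228,
    "Confused": 5,
    "Web Filter": 1,
    "Dual Stack - IPv6 Preferred": 46,
    "Dual Stack - IPv4 Preferred": 16,
}

total = sum(counts.values())
dual_stack = sum(n for label, n in counts.items() if label.startswith("Dual Stack"))

for label, n in counts.items():
    print(f"{label:<30} {n:>4} {n / total:>6.1%}")
print(f"{'Dual stack combined':<30} {dual_stack:>4} {dual_stack / total:>6.1%}")
```

That works out to 296 results in total, 62 of them dual stack, or roughly 21%.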

Not bad, considering we only took it live sometime last night. Some other people didn’t get by quite so well on v6 day though. Yahoo was one of them. When trying to go to ‘www.yahoo.com’, we get redirected to ‘za.yahoo.com’ with the following DNS records:

vandali % host za.yahoo.com
za.yahoo.com is an alias for fd-fp2.wg1.b.yahoo.com.
fd-fp2.wg1.b.yahoo.com is an alias for ds-fp2.wg1.b.yahoo.com.
ds-fp2.wg1.b.yahoo.com is an alias for ds-any-fp2.wa1.b.yahoo.com.
ds-any-fp2.wa1.b.yahoo.com has address 87.248.112.181
ds-any-fp2.wa1.b.yahoo.com has IPv6 address 2a00:1288:f00e:1fe::3001
ds-any-fp2.wa1.b.yahoo.com has IPv6 address 2a00:1288:f006:1fe::3000
ds-any-fp2.wa1.b.yahoo.com has IPv6 address 2a00:1288:f006:1fe::3001
ds-any-fp2.wa1.b.yahoo.com has IPv6 address 2a00:1288:f00e:1fe::3000

This then blows up at one of their Accelerators:
[Image: whoohoo]

Worth a slight thought, since Yahoo actually appears to see use over much of Africa.

All said and done, a fairly good day. Didn’t notice any major blowouts elsewhere on the internet (although I should note I wasn’t tracking all news), and I look forward to some write-ups by the usual people (Renesys, HE, Evilrouters, etc) in the next few days. We appear to remain one of the most well-connected IPv6 ISPs in South Africa, and in a pretty good position overall.