May 1, 2016

Solar cluster: Software stack beginning to take shape.

So, after putting aside the charge controller for now, I’ve taken some time to see if I can get the software side of things into shape.

In the midst of my development, I found a small wiring fault that was responsible for blowing a couple of fuses. A small nick in the sheath of the positive wire in a power cable was letting the crimp part of a DC barrel connector contact +12V. A tweak of that crimp and things are back to normal. I’ve swapped all the 10A fuses for 5A ones, since the regulators are only rated at 7.5A.

The VLANs are assigned now, and I have bonding going between the two pairs of Ethernet devices. In spite of the switch only supporting 4 LAGs, it seems fine with me doing LACP on effectively 10 LAGs. I’ll see how it goes.

The switch has 5 ports spare after plugging in all 5 nodes and a 16-port switch for the IPMI subnet. One will be used for a management interface so I can plug a laptop in, and the others will be paired with LACP for linking to my two existing Cisco SG200-8s.

One of the goals of this project is to try and push the performance of Ceph. In the office, we tried bare Ceph and found that, while it’s fine for sequential I/O, it suffers a bit with random reads/writes, and Windows-based Hyper-V images like to do a lot of random reads/writes.

Putting FlashCache in the mix really helped, but I note now that it’s no longer maintained. EnhanceIO had only just forked when I tried FlashCache; now it seems that’s the official successor.

There are two alternatives to FlashCache/EnhanceIO: bcache and dm-cache.

I’ll rule out bcache now as it requires the backing image be “formatted” for use. In other words, the backing image is not a raw image, but some proprietary (to bcache) format. This isn’t unworkable, but it raises concerns with me about portability: if I migrate a VM, do I need to migrate its cache too, or is it sufficient to cleanly shut down and detach the bcache device before re-assembling it on the new host?

By contrast, dm-cache and EnhanceIO/FlashCache work with raw backing images, making them much more attractive. Flush the cache before migration or use writethrough mode, and all should be fine. dm-cache does, however, require a separate metadata device: messy, but not unworkable. We can provision the cache-related devices we need using LVM2, and use the kernel-mode RADOS block device as our backing image.
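As a rough sketch of what assembling a dm-cache device involves, the snippet below builds the table line that would be handed to dmsetup for the kernel's "cache" target. The field order follows the kernel's device-mapper cache documentation; the device paths (/dev/vg0/vm1-meta, /dev/vg0/vm1-cache, /dev/rbd0), the 20 GiB origin size and the helper name dm_cache_table are purely hypothetical placeholders, not anything I've actually deployed.

```python
# Sketch only: builds a device-mapper table line for the "cache" target.
# All device paths and sizes used below are hypothetical examples.

VALID_MODES = ("writethrough", "writeback", "passthrough")


def dm_cache_table(origin_sectors, metadata_dev, cache_dev, origin_dev,
                   block_sectors=512, mode="writethrough", policy="default"):
    """Return a table line of the form:

    0 <origin sectors> cache <metadata dev> <cache dev> <origin dev>
      <block size> <#feature args> <feature arg> <policy> <#policy args>
    """
    if mode not in VALID_MODES:
        raise ValueError(f"unknown cache mode: {mode}")
    # The kernel documentation requires the block size to be a multiple of
    # 64 sectors (32 KiB), between 64 and 2097152 sectors (1 GiB).
    if block_sectors % 64 != 0 or not 64 <= block_sectors <= 2097152:
        raise ValueError(f"invalid block size: {block_sectors} sectors")
    return (f"0 {origin_sectors} cache {metadata_dev} {cache_dev} "
            f"{origin_dev} {block_sectors} 1 {mode} {policy} 0")


# A 20 GiB RBD image is 20 * 1024**3 / 512 = 41943040 sectors.
table = dm_cache_table(41943040,
                       "/dev/vg0/vm1-meta",   # hypothetical LVM2 metadata LV
                       "/dev/vg0/vm1-cache",  # hypothetical LVM2 cache LV
                       "/dev/rbd0")           # hypothetical kernel RBD device
print(table)
```

The resulting string is what would be passed to something like dmsetup create with --table. Defaulting to writethrough here reflects the migration concern above: with writethrough, the backing image is always up to date, so the cache can be discarded when the VM moves.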

So I think my caching subsystem is a two-horse race: dm-cache or EnhanceIO. I guess we’ll give them a try and see how they go.

For those following along at home, if you’re running kernel >4.3, you might want to use this fork of EnhanceIO due to changes in the kernel block I/O layer.

To manage the OpenNebula master node, I’ve installed corosync/pacemaker. Normally these are used with DRBD; however, I figure Ceph can fulfil that role. The concepts are similar: it’s a shared block device. I’m not sure if it’ll be LXC, Docker or a VM at this point that “contains” the server, but whatever it is, it should be possible for it to have its root FS and data on Ceph.

I’m leaning towards LXC for this. Time for some more experimentation.

Improved Helmets: Project background

Well, looks like this project is very much thrust into the spotlight, having been covered in Hacklet 105. Mine’s probably the least technical of the lot, so it’s definitely worth having a look at what the others are doing, as there are some really innovative ideas there. Many thanks to @Mike Szczys and @Adam Fabio for the shout-out. 🙂

One thing I haven’t done with this project yet is to actually post the background of why I started it. A big part of this was that I wanted to get permission from the family of a work colleague of mine so that I could mention him by name, but at this stage, permission has not been given, so I have to keep things anonymous.

On the 12th of February, a colleague of mine was cycling to work over the Go Between Bridge here in Brisbane when he lost control on a bend as the bridge joins the Bicentennial Bikeway. This is an off-road, dedicated cycleway, so no cars, and supposedly no pedestrians; however, many seem not to understand what a sign with a bicycle symbol and the letters O, N, L, Y means. (I usually ride past and comment: “Funny bike you’re riding!” Since this accident though, I intend to be a lot more assertive.)

(Above: the crash scene. That blood smear is still visible on the path today.)

I’m no crash investigator, but I did study physics, and I cycle as my sole means of transport myself, having no driver’s license. (And no interest in getting one either.) I’m familiar with what that bridge is like to cycle over, having done it many times shortly after it opened when I worked at West End.

Looking at the scene though, it was apparent to me that my colleague was going much faster than was sensible for that stretch of road, and something caused him to lose control just prior to the bend.

The resulting impact with the railing was devastating: in addition to a few broken bones elsewhere in the body, he suffered skull fractures, and what I understand now to be a coup-contrecoup injury to the brain.

I remember that morning arriving at work early (we both were early birds, and had he not crashed, he would have beaten me that morning), sitting down at my desk and preparing to do battle with U-Boot and an industrial PC, when at 6:34AM, the office phone rang. It was then I learned that my colleague was in a serious condition in hospital, and I then found myself frantically looking for contact details for his wife. (Which were nowhere to be found.)

We later learned he’d never regain consciousness, having lost all executive function in the brain. The only bits that worked were the bits responsible for low-level muscle control. From bright mind, to persistent vegetative state. He passed away about a fortnight after his accident.

During his brief time in ICU, we were told by one of the people there that these sorts of injuries were common in bicycle and motorcycle accidents. That worried me.

That tells me that perhaps something is wrong with these blocks of foam we insist on strapping to our heads, and that we’ve missed something. This is one of the first things I’d like to pin down, but so far it has been the most difficult: trying to get hold of data that would statistically prove or disprove how “common” these injuries are.

There’s no point in protecting the skull itself if the brain is to get shaken around to the point that the person winds up with total mental incapacitation.

Research seems to suggest that helmets have had a big hand in reducing the incidence of these injuries, but the fact that they’re still “common” suggests there’s lots more work to be done.

The standards are focussed on linear acceleration, and single impacts at no more than about 20km/hr. Is that sufficient? I regularly find myself doing 40, and I’m no speed demon. (Hell, I’ve accidentally found myself doing 71km/hr once!) I think it’s time the standards were revised. The question is: how?
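To put those speeds in perspective, here is a back-of-the-envelope sketch of how kinetic energy (E = ½mv²) scales relative to the roughly 20km/hr figure. It assumes, simplistically, that the rider's full travelling speed is the impact speed and that the mass stays the same, which a real crash and the standards' test rigs won't match exactly; energy_ratio is just an illustrative helper name.

```python
# Sketch only: kinetic energy scales with the square of speed (E = 1/2 m v^2),
# so for the same mass the ratio between two speeds is (v / v_ref) squared.
# Assumption: the travelling speed is treated as the impact speed.

def energy_ratio(speed_kmh, reference_kmh=20.0):
    """Kinetic energy at speed_kmh relative to that at reference_kmh."""
    return (speed_kmh / reference_kmh) ** 2


for speed in (20, 40, 71):
    print(f"{speed} km/hr: {energy_ratio(speed):.1f}x the energy at 20 km/hr")
```

Doubling the speed from 20 to 40 quadruples the energy that has to go somewhere in a crash, which is the crux of the "is that sufficient?" question.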

My colleague was a key member of our team, and one of the brighter minds I know. While he shouldn’t have taken that bend at such speed and expected to get away with it, he did not deserve to die. I can’t save him, but perhaps I can help save someone else. That’s what this project is about.