Public Syndication

Solar Cluster: OpenNebula on Gentoo/musl… no go?

So, I had a go at getting OpenNebula actually running on my little VM.  Earlier I had installed it to the /opt directory by hand, and today, I tried launching it from that directory.

To get the initial user set up, you have to create the ${ONE_LOCATION}/.one/one_auth (in my case; /opt/opennebula/.one/one_auth) with the username and password for your initial user separated by a colon.  The idea here is that is used to initially create the user, you then change the password once you’re successfully logged in.

That got me a little further, but then it still fails… turns out it doesn’t like some options specified by default in the configuration file.   I commented out some options, and that got me a little further again.  oned started, but then went into lala land, accepting connections but then refusing to answer queries from other tools, leaving them to time out.

I’ve since managed to package OpenNebula into a Gentoo Ebuild, which I have now published in a dedicated overlay.  I was able to automate a lot of the install process this way, but I was still no closer.

On a hunch, I tried installing the same ebuild on my laptop.  Bingo… in mere moments, I was staring at the OpenNebula Sunstone UI in my browser, it was working.  The difference?  My laptop is running Gentoo with the standard glibc C library, not musl.  OpenNebula compiled just fine on musl, but perhaps differences in how musl does threads or some such (musl takes a hard-line POSIX approach) is causing a deadlock.

So, I’m rebuilding the VM using glibc now.  We shall see where that gets us.  At least now I have the install process automated. 🙂

Solar Cluster: OpenNebula Front-end setup

So, the front-end for OpenNebula will be a VM, that migrates between the two compute nodes in a HA arrangement.  Likewise with the core router, and border router, although I am also tossing up trying again with the little Advantech UNO-1150G I have laying around.

For now, I’ve not yet set up the HA part, I’ll come to that.  There are guides for using libvirt with corosync/heartbeat, most also call up DR:BD as the block device for the VM, but we will not be using this as our block device (Rados Block Device) is already redundant.

To host OpenNebula, I’ll use Gentoo with musl-libc since that’ll shrink the footprint down just a little bit.  We’ll run it on a MariaDB back-end.

Since we’re using musl, you’ll want to install layman and the musl overlay as not all packages build against musl out-of-the-box.  Also install gentoolkit, as you’ll need to set USE flags, and euse makes this easy:

# emerge layman
# layman -L
# layman -a musl
# emerge gentoolkit

Now that some basic packages are installed, we need to install OpenNebula’s prerequisites. They tell you in amongst these is xmlrpc-c. BUT, they don’t tell you that it needs support for abyss: and the scons build system they use will just give you a cryptic error saying it couldn’t find xmlrpc. The answer is not, as suggested, to specify the path to xmlrpc-c-config, which happens to be in ${PATH} anyway, as that will net the same result, and break things later when you fix the real issue.

# euse -p dev-util/xmlrpc-c -E abyss

Now we can build the dependencies… this isn’t a full list, but includes everything that Gentoo ships in repositories, the remaining Ruby gems will have to be installed separately.

# emerge --ask dev-lang/ruby dev-db/sqlite dev-db/mariadb \
dev-ruby/sqlite3 dev-libs/xmlrpc-c dev-util/scons \
dev-ruby/json dev-ruby/sinatra dev-ruby/uuidtools \
dev-ruby/curb dev-ruby/nokogiri

With that done, create a user account for OpenNebula:

# useradd -d /opt/opennebula -m -r opennebula

Now you’re set to build OpenNebula itself:

# tar -xzvf opennebula-5.4.0.tar.gz
# cd opennebula-5.4.0
# scons mysql=yes

That’ll run for a bit, but should succeed. At the end:

# ./install -d /opt/opennebula -u opennebula -g opennebula

There’s about where I’m at now… the link in the README for further documentation is a broken link, here is where they keep their current documentation.

Solar Cluster: Networking

So, having got some instances going… I thought I better sort out the networking issues proper.  While it was working, I wanted to do a few things:

  1. Bring a dedicated link down from my room into the rack directly for redundancy
  2. Define some more VLANs
  3. Sort out the intermittent faults being reported by Ceph

I decided to tackle (1) first.  I have two 8-port Cisco SG-200 switches linked via a length of Cat5E that snakes its way from our study, through the ceiling cavity then comes up through a small hole in the floor of my room, near where two brush-tail possums call home.

I drilled a new hole next to where the existing cable entered, then came the fun of trying to feed the new cable along side the old one.  First attempt had the cable nearly coil itself just inside the cavity.  I tried to make a tool to grab the end of it, but it was well and truly out of reach.  I ended up getting the job done by taping the cable to a section of fibreglass tubing, feeding that in, taping another section of tubing to that, feed that in, etc… but then I ran out of tubing.

Luckily, a rummage around, and I found some rigid plastic that I was able to tape to the tubing, and that got me within a half-metre of my target.  Brilliant, except I forgot to put a leader cable through for next time didn’t I?

So more rummaging around for a length of suitable nylon rope, tape the rope to the Cat5E, haul the Cat5E out, then grab another length of rope and tape that to the end and use the nylon rope to haul everything back in.

The rope should be handy for when I come to install the solar panels.

I had one 16-way patch panel, so wound up terminating the rack-end with that, and just putting a RJ-45 on the end in my room and plugging that directly into the switch.  So on the shopping list will be some RJ-45 wall jacks.

The cable tester tells me I possibly have brown and white-brown switched, but never mind, I’ll be re-terminating it properly when I get the parts, and that pair isn’t used anyway.

The upshot: I now have a nice 1Gbps ring loop between the two SG-200s and the LGS326 in the rack.  No animals were harmed in the formation of this ring, although two possums were mildly inconvenienced.  (I call that payback for the times they’ve held the Marsupial Olympics at 2AM when I’m trying to sleep!)

Having gotten the physical layer sorted out, I was able to introduce the upstairs SG-200 to the new switch, then remove the single-port LAG I had defined on the downstairs SG-200.  A bit more tinkering going, and I had a nice redundant set-up: setting my laptop to ping one of the instances in the cluster over WiFi, I could unplug my upstairs trunk, wait a few seconds, plug it back in, wait some more, unplug the downstairs trunk, wait some more again, then plug in back in again, and not lose a single ICMP packet.

I moved my two switches and my AP over to the new management VLAN I had set up, along side the IPMI interfaces on the nodes.  The SG-200s were easy, aside from them insisting on one port being configured with a PVID equal to the management VLAN (I guess they want to ensure you don’t get locked out), it all went smoothly.

The AP though, a Cisco WAP4410N… not so easy.  In their wisdom, and unlike the SG-200s, the management VLAN settings page is separate from the IP interface page, so you can’t change both at the same time.  I wound up changing the VLAN, only to find I had locked myself out of it.  Much swearing at the cantankerous AP and wondering how could someone overlook such a fundamental requirement!  That, and the switch where the AP plugs in, helpfully didn’t add the management VLAN to the right port like I asked of it.

Once that was sorted out, I was able to configure an IP on the old subnet and move the AP across.

That just left dealing with the intermittent issues with Ceph.  My original intention with the cluster was to use 802.3AD so each node had two 2Gbps links.  Except: the LGS326-AU only supports 4 LAGs.  For me to do this, I need 10!

Thankfully, the bonding support in the Linux kernel has several other options available.  Switching from 802.3ad to balance-tlb, resolved the issue.

slaves_bond0="enp0s20f0 enp0s20f1"
slaves_bond1="enp0s20f2 enp0s20f3"
config_bond0="null"
config_bond1="null"
config_enp0s20f0="null"
config_enp0s20f1="null"
config_enp0s20f2="null"
config_enp0s20f3="null"
rc_net_bond0_need="net.enp0s20f0 net.enp0s20f1"
rc_net_bond1_need="net.enp0s20f2 net.enp0s20f3"
mode_bond0="balance-tlb"
mode_bond1="balance-tlb"

I am now currently setting up a core router instance (with OpenBSD 6.1) and a OpenNebula instance (with Gentoo AMD64/musl libc).

Toy Synthesizer: Building synth #2

So… last weekend I got to trying out the I/O modules. I rigged up wiring harnesses for all five push-buttons and their associated LED strings.

I used CAT5e since I’ve got loads of it around… and ran four strands up to each button, carrying: +12V, MOSFET drain, GPIO and 0V. I ran a resistor between +12V and GPIO to pull it high, the switch NO and common connected between GPIO and 0V. The button’s illumination LEDs and the LED string both connected in parallel between +12V and MOSFET drain.

So far so good. I have just a 3-pin connection on the I/O module, carrying all but the +12V. That remaining wire I hooked to +12V directly on the input feed. Not having a 0V feed going direct to the power supply though was my mistake.

If this happens, the 0V reference on the I/O module is open-circuit, and the zeners, meant to protect the MCU from +12V, don’t do anything. So I suspect a MC14066 copped a belt of +12V by mistake!

Not sure, but I thought I’d play it safe and build a new one anyway. I started on it earlier this week and finished it this afternoon. This time around I opted to put LEDs on the outputs of the 74HC574. No current-limiting resistors, since they’re meant to be ⅛ duty cycle anyway, and if they smoke, well, who cares?

This highlighted a glitch on the GPIO_EN signal during programming, LEDs would illuminate during MCU programming. A pull-up helps here.

I tested using a 9V supply from my electronics kit, which has AA cells in it (somewhat stale ones) this time around. Handily, the plugs for the LED strings have LEDs in them themselves, and they work at 9V, so we can use those to see what the LED string would do.

On the software side, I finally fixed polyphonics and a key-sticking issue. One of my I/O modules has stopped working for whatever reason, but the others are working fine, as can be seen here:

The other issue I have to chase is some leakage on the blue and white buttons: the plugs for them are not meant to be glowing unless pressed, but you note they are glowing significantly, and get brighter when those buttons are pressed. I might have to re-visit those connections. Otherwise though, very good progress today.

Solar Cluster: First virtual instances running

So, since my last log, I’ve managed to tidy up the wiring on the cluster, making use of the plywood panel at the back to mount all my DC power electronics, and generally tidying everything up.

I had planned to use a SB50 connector to connect the cluster up to the power supply, so made provisions for this in the wiring harness. Turns out, this was not necessary, it was easier in the end to just pull apart the existing wiring and hard-wire the cluster up to the charger input.

So, I’ve now got a spare load socket hanging out the front, which will be handy if we wind up with unreliable mains power in the near future since it’s a convenient point to hook up 12V appliances.

There’s a solar power input there ready, and space to the left of that to build a little control circuit that monitors the solar voltage and switches in the mains if needed. For now though, the switching is done with a relay that’s hard-wired on.

Today though, I managed to get the Ceph clients set up on the two compute nodes, and while virt-manager is buggy where it comes to RBD pools. In particular, adding a RBD storage pool doesn’t work as there’s no way to define authentication keys, and even if you have the pool defined, you find that trying to use images from that pool causes virt-manager to complain it can’t find the image on your local machine. (Well duh! This is a known issue.)

I was able to find a XML cheat-sheet for defining a domain in libvirt, which I was then able to use with Ceph’s documentation.

A typical instance looks like this:

<domain type='kvm'>
  <!-- name of your instance -->
  <name>instancename</name>
  <!-- a UUID for your instance, use `uuidgen` to generate one -->
  <uuid>00ec9b97-c49a-45f8-befe-f74ad6bde2fe</uuid>
  <memory>524288</memory>
  <vcpu>1</vcpu>
  <os>
    <type arch="x86_64">hvm</type>
  </os>
  <clock sync="utc"/>
  <devices>
    <emulator>/usr/bin/qemu-system-x86_64</emulator>
    <disk type='network' device='disk'>
      <source protocol='rbd' name="poolname/image.vda">
        <!-- the hostnames or IPs of your Ceph monitor nodes -->
        <host name="s0.internal.network" />
        <host name="s1.internal.network" />
        <host name="s2.internal.network" />
      </source>
      <target dev='vda'/>
      <auth username='libvirt'>
        <!-- the UUID here is what libvirt allocated when you did
	    `virsh secret-define foo.xml`, use `virsh secret-list`
	    if you've forgotten what that is. -->
        <secret type='ceph' uuid='23daf9f8-1e80-4e6d-97b6-7916aeb7cc62'/>
      </auth>
    </disk>
    <disk type='network' device='cdrom'>
      <source protocol='rbd' name="poolname/image.iso">
        <!-- the hostnames or IPs of your Ceph monitor nodes -->
        <host name="s0.internal.network" />
        <host name="s1.internal.network" />
        <host name="s2.internal.network" />
      </source>
      <target dev='hdd'/>
      <auth username='libvirt'>
        <secret type='ceph' uuid='23daf9f8-1e80-4e6d-97b6-7916aeb7cc62'/>
      </auth>
    </disk>
    <interface type='network'>
      <source network='default'/>
      <mac address='11:22:33:44:55:66'/>
    </interface>
    <graphics type='vnc' port='-1' keymap='en-us'/>
  </devices>
</domain>

Having defined the domain, you can then edit it at will in virt-manager. I was able to switch the network interface over to using virtio, plop it on a bridge so it was wired up to the correct VLAN and start the instance up.

I’ve since managed to migrate 3 instances over, namely an estate database, Brisbane Area WICEN’s OwnCloud site, and my own blog.

These are sufficient to try the system out. I’m already finding these instances much more responsive, using raw Ceph even, than the original server.

My next move I think will be to see if I can get corosync/heartbeat to manage a HA VM instance. That is, if one of the compute nodes goes offline, the instance restarts on the other compute node.

Two services come to mind where HA is concerned: terminating the PPPoE link for our Internet, and a virtual management node for a higher-level system such as OpenNebula. OpenNebula really needs something semi-HA, since it really gets its knickers in a twist if the master node goes down. I also want my border router to be HA, since I won’t necessarily be around to migrate it to a different node.

Everything else, well I suspect OpenNebula can itself manage those, and long term the instances I just liberated today from my old box, will become instances within OpenNebula.

The other option is I dip my toe into OpenStack (again), since it is inherently HA by design, but it is also a royal pain to get working.

Toy Synthesizer: 74HC573 replaced with ‘574… I/O modules built

So… earlier in the week I received some 74HC574s (the right chip) to replace the 74HC573s I tried to use.

My removal technique was not pretty, wound up just cutting the legs off the hapless 74HC573 (my earlier hack had busted a pin on it anyway) and removing it that way. Since the holes were full of solder, it was easier to just bend the pins on the ‘574 and surface-mount it.

This afternoon, I gave it a try, and ‘lo and behold, it worked. I even tried hooking up one of the LED strings and driving that with the MOSFET… no problems at all.

I had only built up one of the modules at this stage, so I built another 5 on the same piece of strip board.

The requirement is for 5 channels… this meets that and adds an extra one (the board was wide enough). For the full accompaniment and to have ICSP/networking via external connections, a second board like this could be made, omitting the MOSFETs on four of the channels to handle the ICSP control lines and reducing the capacitances/resistances to suit.

Somewhere I have some TVS diodes for this board, but of course, they have legs, upon which they got up and ran away. Haven’t resurfaced yet. I’m sure they will if I buy more though. The spare footprint on the top-left of the main board is where one TVS diode goes, the others go on the I/O board, two for each channel.

Solar Cluster: Rack installed in-situ

So, there’s some work still to be done, for example making some extension leads for the run between the battery link harness, load power distribution and the charger… and to generally tidy things up, but it is now up and running.

On the floor, is the 240V-12V power supply and the charger, which right now is hard-wired in boost mode. In the bottom of the rack are the two 105Ah 12V AGM batteries, in boxes with fuses and isolation switches.

The nodes and switching is inside the rack, and resting on top is the load power distribution board, which I’ll have to rewire to make things a little neater. A prospect is to mount some of this on the back.

I had a few introductions to make, introducing the existing pair of SG-200 switches to the newcomer and its VLANs, but now at least, I’m able to SSH into the nodes, access the IPMI BMC and generally configure the whole box and dice.

With the exception of the later upgrade to solar, and the aforementioned wiring harness clean-ups, the hardware-side of this dual hardware/software project, is largely complete, and this project now transitions to being a software project.

The plan from here:

  • Update the OSes… as all will be a little dated. (I might even blow away and re-load.)
  • Get Ceph storage up and running. It actually should be configured already, just a matter of getting DNS hostnames sorted out so they can find eachother.
  • Investigating the block caching landscape: when I first started the project at work, it was a 3-horse race between Facebook’s FlashCache, bcache and dmcache. Well, FlashCache is no more, replaced by EnhancedIO, and I’m not sure about the rest of the market. So this needs researching.
  • Management interfaces: at my workplace I tried Ganeti, OpenNebula and OpenStack. This again, needs re-visiting. OpenNebula has moved a long way from where it was and I haven’t looked at the others in a while. OpenStack had me running away screaming, but maybe things have improved.

Solar Cluster: Power distribution harnesses

So, having got the rack mostly together, it is time to figure out how to connect everything.

I was originally going to have just one battery and upgrade later… but when it was discovered that the battery chosen was rather sick, the decision was made that I’d purchase two new batteries. So rather than deferring the management of multiple batteries, I’d have to deal with it up-front.

Rule #1 with paralleling batteries: don’t do it unless you have to. In a perfect world, you can do it just fine, but reality doesn’t work that way. There’s always going to be an imbalance that upsets things. My saving grace is that my installation is fixed, not mobile.

I did look at alternatives, including diodes (too much forward voltage drop), MOSFET switching (complexity), relay switching (complexity again, plus contact wear), and DIY uniselectors. Since I’m on a tight deadline, I decided, stuff it, I’ll parallel them.

That brings me to rule #2 about paralleling batteries: keep everything as close to matched as possible. Both batteries were bought in the same order, and hopefully are from the same batch. Thus, characteristics should be very close. The key thing here, I want to keep cable lengths between the batteries, load and charger, all equal so that the resistances all balance out. That, and using short runs of thick cables to minimise resistance.

I came up with the following connection scheme:

You’ll have to forgive the poor image quality here. On reflection, photographing a whiteboard has always been challenging.

Both batteries are set up in an identical fashion: 40A fuse on the positive side, cable from the negative side, going to an Andersen SB50/10. (Or I might put the fuse on the negative side … haven’t decided fully yet, it’ll depend on how much of each colour wire I have.) The batteries themselves are Giant Power 105Ah 12V AGM batteries. These are about as heavy as I can safely manage, weighing about 30kg each.

The central harness is what I built this afternoon, as I don’t yet have the fuse holders for the two battery harnesses.

The idea being that the resistance between the charger and each battery should be about the same. Likewise, the resistance between the load and each battery should be about the same

The load uses a distribution box and a bus bar. You’ve seen it before, but here’s how it’s wired up… pretty standard:

You might be able to make out the host names there too (periodic table naming scheme, why, because they’re Intel Atoms) … the 5 nodes are on the left and the two switches to the right of the distribution box. I have 3 spare positions.

In heavy black is the 0V bus bar.

This is what I’ve been spending much of my pondering, doing. Part of this harness is already done as it was installed that way in the car, the bit that’s missing is the circuit to the left of the relay that actually drives it. Redarc intended that the ignition key switch would drive the relay, I’ll be exploiting this feature.

Some time this week, I hope to make up the wiring harnesses for the two batteries, and get some charge into them as they’ve sat around for the past two months in their boxes steadily discharging, so I’d be better to get a charger onto them sooner rather than later.

The switch-over circuit can wait for now: just hard-wire it to the mains DC feed for now since there’s no solar yet. The principle of operation is that the comparator (an LM311) compares the solar voltage to a reference (derived from a 5V regulator) and kicks in when the voltage is high enough. (How high? No idea, maybe ~18V?). When that happens, it outputs a logic high signal that turns off the MOSFET. When too low, it pulls the MOSFET gate low, turning it on.

The MOSFET (a P-channel) provides the “ignition key switch” signal to the BCDC1225, fooling it into thinking it is connected to vehicle power, and the charger will boost as needed. The key being that the BCDC1225 makes the decision as to whether the battery needs charging, and how much charge.

By bolting together off-the-shelf parts, we should have something that I can source replacements for should the smoke escape, and there’s no high voltages to deal with.

Solar Cluster: Rack taking shape

Well, it’s been a while since I last updated this project. Lots have been due to general lethargy, real life and other pressures.

This equipment is being built amongst other things to host my websites, mail server, and as a learning tool for managing clustered computing resources. As such, yes, I’ll be putting it down as a work expense… and it was pointed out to me that it needed to be in operation before I could start claiming it on tax. So, with 30th June looming up soon, it was time I pulled my finger out and got it going.

At least running on mains. As for the solar bit, well we will be doing that too, my father recently sent me this email (line breaks for readability):

Subject: Why you're about to pay through the nose for power - ABC News
 (Australian Broadcasting Corporation)
To: Stuart Longland
From: David Longland
http://www.abc.net.au/news/2017-06-19/…
   …why-youre-about-to-pay-through-the-nose-for-power/8629090

Hi Stuart,

This is why I am keen to see your cluster up and running.  Our power 
bill is about $300 every 3 months, a lift in price by 20% represents 
$240pa hike.

Dad

Umm, yeah… good point. Our current little server represents a small portion of our base-load power… refrigeration being the other major component.

I ordered the rack and batteries a few months back, and both have been sitting here, still in the boxes they were shipped in, waiting for me to get to and put them together. My father got fed up of waiting and attacked the rack, putting it together one evening… and last night, we worked together on putting a back on the rack using 12mm plywood.

We also fitted the two switches, mounting the smaller one to the lid of the main switch using multiple layers of double-sided tape.

I wasn’t sure at first where the DIN rail would mount. I had intended to screw it to a piece of 2×4″ or similar, and screw that to the back plane. We couldn’t screw the DIN rail directly to the back plane because the nodes need to be introduced to the DIN rail at an angle, then brought level to attach them.

Considering the above, we initially thought we’d bolt it to the inner run of holes, but two problems presented themselves:

  1. The side panels actually covered over those holes: this was solved with a metal nibbling tool, cutting a slot where the hole is positioned.
  2. The DIN rail, when just mounted at each end, lacked the stability.

I measured the gap between the back panel and the DIN rail location: 45mm. We didn’t have anything that was that width which we could use as a mounting. We considered fashioning a bracket out of some metal strip, but bending it right could be a challenge without the right tools. (and my metalwork skills were never great.)

45mm + 3mm is 48mm… or 4× plywood pieces. We had plenty of off-cut from the back panel.

Using 4 pieces of the plywood glued together and clamped overnight, I made a mounting to which I could mount the DIN rail for the nodes to sit on. This afternoon, I drilled the pilot holes and fitted the screws for mounting that block, and screwed the DIN rail to it.

At the far ends, I made spacers from 3mm aluminium metal strap. The result is not perfect, but is much better than what we had before.

I’ve wired up the network cables… checking the lengths of those in case I needed to get longer cables. (They just fit… phew! $20 saved.) and there is room down the bottom for the batteries to sit. I’ll make a small 10cm cable to link the management network up to the appropriate port on the main switch, then I just need to run cables to the upstairs and downstairs switches. (In fact, there’s one into the area already.)

On the power front… my earlier experiments had ascertained the suitability of the Xantrex charger that we had spare. The charger is a smart charger, and so does various equalisation and balancing cycles, thus gets mightily confused if you suddenly disconnect the battery from it by way of a MOSFET. A different solution presented itself though.

My father has a solar set-up in the back of his car… there’s a 12V 120W panel on the roof, and that provides power to a battery system which powers an amateur radio station and serves as an auxiliary battery. There’s a diode arrangement that allows charging from the vehicle battery system.

In an effort to try and upgrade it, he bought a Redarc BCDC1225 in-vehicle MPPT charger. This charger can accept power from either the 12V mains supply in a vehicle, or from a “12V” solar panel. The key here, is it relies on a changeover relay to switch between the two, and this is where it wasn’t quite suitable for my father’s needs: it assumed that if the vehicle ignition was on, you wanted to charge from the vehicle, not from solar.

He wanted it to switch to whichever source was more plentiful, and had thought the unit would drive the relay itself. Having read the manual, we now know the signal they tell you to connect to the relay coil is there to tell the charger which source it is plugged into, not for it to drive the relay.

The plan is therefore:

  • use a 240V→12V AC-DC switch-mode power supply to provide the “vehicle mains” DC input to the charger.
  • measure the voltage seen at the solar input with a comparator and switch over when it is above some pre-defined voltage (use hysteresis to ensure it doesn’t oscillate)
  • use the output to drive a P-channel MOSFET attached to the “vehicle mains”, which drives the relay.