computing

Solar Cluster: Rack taking shape

Well, it’s been a while since I last updated this project. Lots have been due to general lethargy, real life and other pressures.

This equipment is being built amongst other things to host my websites, mail server, and as a learning tool for managing clustered computing resources. As such, yes, I’ll be putting it down as a work expense… and it was pointed out to me that it needed to be in operation before I could start claiming it on tax. So, with 30th June looming up soon, it was time I pulled my finger out and got it going.

At least running on mains. As for the solar bit, well we will be doing that too, my father recently sent me this email (line breaks for readability):

Subject: Why you're about to pay through the nose for power - ABC News
 (Australian Broadcasting Corporation)
To: Stuart Longland
From: David Longland
http://www.abc.net.au/news/2017-06-19/…
   …why-youre-about-to-pay-through-the-nose-for-power/8629090

Hi Stuart,

This is why I am keen to see your cluster up and running.  Our power 
bill is about $300 every 3 months, a lift in price by 20% represents 
$240pa hike.

Dad

Umm, yeah… good point. Our current little server represents a small portion of our base-load power… refrigeration being the other major component.

I ordered the rack and batteries a few months back, and both have been sitting here, still in the boxes they were shipped in, waiting for me to get to and put them together. My father got fed up of waiting and attacked the rack, putting it together one evening… and last night, we worked together on putting a back on the rack using 12mm plywood.

We also fitted the two switches, mounting the smaller one to the lid of the main switch using multiple layers of double-sided tape.

I wasn’t sure at first where the DIN rail would mount. I had intended to screw it to a piece of 2×4″ or similar, and screw that to the back plane. We couldn’t screw the DIN rail directly to the back plane because the nodes need to be introduced to the DIN rail at an angle, then brought level to attach them.

Considering the above, we initially thought we’d bolt it to the inner run of holes, but two problems presented themselves:

  1. The side panels actually covered over those holes: this was solved with a metal nibbling tool, cutting a slot where the hole is positioned.
  2. The DIN rail, when just mounted at each end, lacked the stability.

I measured the gap between the back panel and the DIN rail location: 45mm. We didn’t have anything that was that width which we could use as a mounting. We considered fashioning a bracket out of some metal strip, but bending it right could be a challenge without the right tools. (and my metalwork skills were never great.)

45mm + 3mm is 48mm… or 4× plywood pieces. We had plenty of off-cut from the back panel.

Using 4 pieces of the plywood glued together and clamped overnight, I made a mounting to which I could mount the DIN rail for the nodes to sit on. This afternoon, I drilled the pilot holes and fitted the screws for mounting that block, and screwed the DIN rail to it.

At the far ends, I made spacers from 3mm aluminium metal strap. The result is not perfect, but is much better than what we had before.

I’ve wired up the network cables… checking the lengths of those in case I needed to get longer cables. (They just fit… phew! $20 saved.) and there is room down the bottom for the batteries to sit. I’ll make a small 10cm cable to link the management network up to the appropriate port on the main switch, then I just need to run cables to the upstairs and downstairs switches. (In fact, there’s one into the area already.)

On the power front… my earlier experiments had ascertained the suitability of the Xantrex charger that we had spare. The charger is a smart charger, and so does various equalisation and balancing cycles, thus gets mightily confused if you suddenly disconnect the battery from it by way of a MOSFET. A different solution presented itself though.

My father has a solar set-up in the back of his car… there’s a 12V 120W panel on the roof, and that provides power to a battery system which powers an amateur radio station and serves as an auxiliary battery. There’s a diode arrangement that allows charging from the vehicle battery system.

In an effort to try and upgrade it, he bought a Redarc BCDC1225 in-vehicle MPPT charger. This charger can accept power from either the 12V mains supply in a vehicle, or from a “12V” solar panel. The key here, is it relies on a changeover relay to switch between the two, and this is where it wasn’t quite suitable for my father’s needs: it assumed that if the vehicle ignition was on, you wanted to charge from the vehicle, not from solar.

He wanted it to switch to whichever source was more plentiful, and had thought the unit would drive the relay itself. Having read the manual, we now know the signal they tell you to connect to the relay coil is there to tell the charger which source it is plugged into, not for it to drive the relay.

The plan is therefore:

  • use a 240V→12V AC-DC switch-mode power supply to provide the “vehicle mains” DC input to the charger.
  • measure the voltage seen at the solar input with a comparator and switch over when it is above some pre-defined voltage (use hysteresis to ensure it doesn’t oscillate)
  • use the output to drive a P-channel MOSFET attached to the “vehicle mains”, which drives the relay.

Bootstrapping Gentoo Linux

So, in amongst my pile of crusty old hardware is the old netbook I used to use in the latter part of my univerity days. It is a Lemote Yeeloong, and sports a ~700MHz Loongson 2F CPU (MIPS III little endian ISA) and 1GB RAM.

Back in the day it was a brilliant little machine. It came out of the box running a localised (for China) version of Debian, and had pretty much everything you’d need. I natually repartitioned the machine, setting up Gentoo and I had a separate partition for Debian, so I could actually dual-boot between them.

Fast forward 10 years, the machine runs, but the battery is dead, and Debian no longer supports MIPS-III machines. Debian Jessie does, but Stretch, likely due for release some time this year, will not, if you haven’t got a CPU that supports mips32r2 or mips64r2, you’re stuffed.

I don’t want to throw this machine away.  Being as esoteric as it is, it is an unlikely target for theft, as to the casual observer, it’ll just be “some crappy netbook”.  If someone were to try and steal it, there’s a very high probability I’ll recover it with my data because the day its PMON2000 boot firmware successfully boots a x86-64 OS like Ubuntu or Windows without the assistance of a VM of some kind would be the day Satan puts a requisition order in for anti-freeze and winter mittens.

My use case is for a machine I can take with me on the bicycle.  My needs aren’t huge: I won’t be playing video on this thing, it’ll be largely for web browsing and email.  The web browser needs to support JavaScript, so that rules out options like ELinks or Dillo, my preferred browser is Firefox but I’ll settle for something Webkit-based if that’s all that’s out there.

So what operating systems do I have for a machine that sports a MIPS-III CPU and 1GB RAM?  Fedora has a MIPS port, but that, like Debian, is for the newer MIPS systems.  Arch Linux too is for newer architectures.

I could bootstrap Alpine Linux… and maybe that’s worth looking into, they seem to be doing some nice work in producing a small and capable Linux distribution.  They don’t yet support MIPS though.

Linux From Scratch is an option, if a little labour intensive.  (Been there, done that.)

OpenBSD directly supports this machine, and so I gave OpenBSD 6.0 a try.  It’s a very capable OS, and while it isn’t Linux, there isn’t much that an experienced Linux user like myself needs to adapt to in order to effectively use the OS.  pkgsrc is a great asset to OpenBSD, with a large selection of pre-built packages already available.  Using that, it is possible to get a workable environment up and running very quickly.  OpenBSD/loongson uses the n64 ABI.

Due to licensing worries, they use a particularly old version of binutils as their linker and assembler.  The plan seems to be they wish to wean themselves off the GNU toolchain in favour of LLVM.  At this time though, much of the system is built using the GNU toolchain with some custom patches.  I found that, on the Yeeloong, 1GB RAM was not sufficient for compiling LLVM, even after adding additional swap files, and some packages I needed weren’t available in pkgsrc, nor would they build with the version of GNU tools available.

Maybe as they iron out the kinks in their build environment with LLVM, this will be worth re-visiting.  They’ve done a nice job so far, but it’s not quite up to where I need it to be.

Gentoo actually gives me the choice of two possible ABIs: o32 and n32o32 is the old 32-bit ABI, and suffers a number of performance problems, but generally works.  It’s what Debian Jessie and earlier supplies, and what their mips32 port will produce from Stretch onwards.

n32 is the MIPS equivalent of what some of you may know as x32 on AMD64 platforms, it is a 32-bit environment with 64-bit long pointers… the idea being that very few applications actually benefit from the use of 64-bit data types, and so the usual quantities like int and long remain the same as what they’d be on o32, saving memory.  The long long data type gets a boost because, although “32-bit”, the 64-bit operations are still available for use.

The trouble is, some applications have problems with this mode.  Either the code sees “mips64” in the CHOST and assumes a full 64-bit system (aka n64), or it assumes the pointers are the same width as a long, or the build system makes silly assumptions as to where things get put.  (virtualenv comes to mind, which is what started me on this journey.  The same problem affects x32 on AMD64.)

So I thought, I’d give n64 a try.  I’d see if I can build a cross-compiler on my AMD64 host, and bootstrap Gentoo from that.

Step 1: Cross-compiler

For the cross-compiler, Gentoo has a killer feature that I have not seen in too many other distributions: crossdev.  This is a toolchain build tool that can generate cross-compiler toolchains for most processor architectures and environments.

This is installed by running emerge sys-devel/crossdev.

A gotcha with hardened

I run “hardened” AMD64 stages on my machines, and there’s a little gotcha to be aware of: the hardened USE flag gets set by crossdev, and that can cause fun and games if, like on MIPS, the hardening features haven’t been ported.  My first attempt at this produced a n64 userland where pretty much everything generated a segmentation fault, the one exception being Python 2.7.  If I booted with init=/bin/bash (or init=/bin/bb), my virtual environment died, if I booted with init=/usr/bin/python2.7, I’d be dropped straight into a Python shell, where I could import the subprocess module and try to run things.

Cleaning up, and forcing crossdev to leave off hardened support, got things working.

Building the toolchain

With the above gotcha in mind:

# crossdev --abis n64 \
           --env 'USE="-hardened"' \
           -s4 -t mips64el-unknown-linux-gnu

The --abis n64 tells crossdev you want a n64 ABI toolchain, and the --env will hopefully keep the hardened flag unset. Failing that, try this:

# cat > /etc/portage/package.use/mips64 <<EOF
cross-mips64el-unknown-linux-gnu/binutils -hardened
cross-mips64el-unknown-linux-gnu/gcc -hardened
cross-mips64el-unknown-linux-gnu/glibc -hardened
EOF

If you want a combination of specific toolchain components to try, I’m using:

  • Binutils: 2.28
  • GCC: 5.4.0-r3
  • glibc: 2.25
  • headers: 4.10

Step 2: Checking our toolchain

This is where I went wrong the first time, I tried building the entire OS, only to discover I had wasted hours of CPU time building non-functional binaries. Save yourself some frustration. Start with a small binary to test.

A good target for this is busybox. Run mips64el-unknown-linux-gnu-emerge busybox, and wait for a bit.

When it completes, you should hopefully have a busybox binary:

RC=0 stuartl@beast ~ $ file /usr/mips64el-unknown-linux-gnu/bin/busybox 
/usr/mips64el-unknown-linux-gnu/bin/busybox: ELF 64-bit LSB executable, MIPS, MIPS-III version 1 (SYSV), statically linked, for GNU/Linux 3.2.0, stripped

Testing busybox

There is qemu-user-mips64el, but last time I tried it, I found it broken. So an easier option is to use real hardware or QEMU emulating a full system. In either case, you’ll want to ensure you have your system-of-choice running with a working 64-bit kernel already, if your real hardware isn’t already running a 64-bit Linux kernel, use QEMU.

For QEMU, the path-of-least-resistance I found was to use Debian. Aurélien Jarno has graciously provided QEMU images and corresponding kernels for a good number of ports, including little-endian MIPS.

Grab the Wheezy disk image and the corresponding kernel, then run the following command:

# qemu-system-mips64el -M malta \
    -kernel vmlinux-3.2.0-4-5kc-malta \
    -hda debian_wheezy_mipsel_standard.qcow2 \
    -append "root=/dev/sda1 console=ttyS0,115200" \
    -serial stdio -nographic -net nic -net user

Let it boot up, then log in with username root, password root.

Install openssh-client and rsync (this does not ship with the image):

# apt-get update
# apt-get install openssh-client rsync

Now, you can create a directory, and pull the relevant files from your host, then try the binary out:

# mkdir gentoo
# rsync -aP 10.0.2.2:/usr/mips64el-unknown-linux-gnu/ gentoo/
# chroot gentoo bin/busybox ash

With luck, you should be in the chroot now, using Busybox.

Step 3: Building the system

Having done a “hello world” test, we’re now ready to build everything else. Start by tweaking your /usr/mips64el-unknown-linux-gnu/etc/portage/make.conf to your liking then adjust /usr/mips64el-unknown-linux-gnu/etc/portage/make.profile to point to one of the MIPS profiles. For reference, on my system:

RC=0 stuartl@beast ~ $ ls -l /usr/mips64el-unknown-linux-gnu/etc/portage/make.profile
lrwxrwxrwx 1 root root 49 May  1 09:26 /usr/mips64el-unknown-linux-gnu/etc/portage/make.profile -> /usr/portage/profiles/default/linux/mips/13.0/n64
RC=0 stuartl@beast ~ $ cat /usr/mips64el-unknown-linux-gnu/etc/portage/make.conf 
CHOST=mips64el-unknown-linux-gnu
CBUILD=x86_64-pc-linux-gnu
ARCH=mips

HOSTCC=x86_64-pc-linux-gnu-gcc

ROOT=/usr/${CHOST}/

ACCEPT_KEYWORDS="mips ~mips"

USE="${ARCH} -pam"

CFLAGS="-O2 -pipe -fomit-frame-pointer"
CXXFLAGS="${CFLAGS}"

FEATURES="-collision-protect sandbox buildpkg noman noinfo nodoc"
# Be sure we dont overwrite pkgs from another repo..
PKGDIR=${ROOT}packages/
PORTAGE_TMPDIR=${ROOT}tmp/

ELIBC="glibc"

PKG_CONFIG_PATH="${ROOT}usr/lib/pkgconfig/"
#PORTDIR_OVERLAY="/usr/portage/local/"

Now, you should be ready to start building:

# mips64el-unknown-linux-gnu-emerge -e \
    --keep-going -j6 --load-average 12.0 @system

Now, go away, and do something else for several hours.  It’ll take that long, depending on the speed of your machine.  In my case, the machine is an AMD Phenom II x6 with 8GB RAM, which was brand new in 2010.  It took a good day or so.

Step 4: Testing our system

We should have enough that we can boot our QEMU VM with this image instead.  One way of trying it would be to copy across the userland tree the same way we did for pulling in busybox and chrooting back in again.

In my case, I took the opportunity to build a kernel specifically for the VM that I’m using, and made up a disk image using the new files.

Building a kernel

Your toolchain should be able to cross-build a kernel for the virtual machine.  To get you started, here’s a kernel config file.  Download it, decompress it, then drop it into your kernel source tree as .config.

Having done that, run make olddefconfig ARCH=mips to set the defaults, then make menuconfig ARCH=mips and customise to your hearts content. When finished, run make -j6 vmlinux modules CROSS_COMPILE=mips64el-unknown-linux-gnu- to build the kernel and modules.

Finally, run make modules_install firmware_install INSTALL_MOD_PATH=$PWD/modules CROSS_COMPILE=mips64el-unknown-linux-gnu- to install the kernel modules and firmware into a convenient place.

Making a root disk

Create a blank, raw disk image using qemu-img, then partition it as you like and mount it as a loopback device:

# qemu-img create -f raw gentoo.raw 8G
# fdisk gentoo.raw
(do your partitioning here)
# losetup -P /dev/loop0 $PWD/gentoo.raw

Now you can format the partitions /dev/loop0pX as you see fit, then mount them in some convenient place. I’ll assume that’s /mnt/vm for now. You’re ready to start copying everything in:

# rsync -aP /usr/mips64el-unknown-linux-gnu/ /mnt/vm/
# rsync -aP /path/to/kernel/tree/modules/ /mnt/vm/

You can use this opportunity to make some tweaks to configuration files, like updating etc/fstab, tweaking etc/portage/make.conf (changing ROOT, removing CBUILD), and setting up a getty on ttyS0. I also like to symlink lib to lib64 in non-multilib environments such as this: Don’t symlink lib and lib64! See below.

# cd /mnt/vm
# mv lib/* lib64
# rmdir lib
# ln -s lib64 lib
# cd usr
# mv lib/* lib64
# rmdir lib
# ln -s lib64 lib

When you’re done, unmount.

First boot

Run QEMU with the following arguments:

# qemu-system-mips64el -M malta \
    -kernel /path/to/your/kernel/vmlinux \
    -hda /path/to/your/gentoo.raw \
    -append "root=/dev/sda1 console=ttyS0,115200 init=/bin/bash" \
    -serial stdio -nographic -net nic -net user

It should boot straight to a bash prompt. Mount the root read/write, and then you can make any edits you need to do before boot, such as changing the root password. When done, re-mount the root as read-only, then exec /sbin/init.

# mount / -o rw,remount
# passwd
… etc
# mount / -o ro,remount
# exec /sbin/init

With luck, it should boot to completion.

Step 5: Making the VM a system service

Now, it’d be real nice if libvirt actually supported MIPS VMs, but it doesn’t appear that it does, or at least I couldn’t get it to work.  virt-manager certainly doesn’t support it.

No matter, we can make do with a telnet console (on loopback), and supervisord to daemonise QEMU.  I use the following supervisord configuration file to start my VMs:

[unix_http_server]
file=/tmp/supervisor.sock   ; (the path to the socket file)

[supervisord]
logfile=/tmp/supervisord.log ; (main log file;default $CWD/supervisord.log)
logfile_maxbytes=50MB        ; (max main logfile bytes b4 rotation;default 50MB)
logfile_backups=10           ; (num of main logfile rotation backups;default 10)
loglevel=info                ; (log level;default info; others: debug,warn,trace)
pidfile=/tmp/supervisord.pid ; (supervisord pidfile;default supervisord.pid)
nodaemon=false               ; (start in foreground if true;default false)
minfds=1024                  ; (min. avail startup file descriptors;default 1024)
minprocs=200                 ; (min. avail process descriptors;default 200)

; the below section must remain in the config file for RPC
; (supervisorctl/web interface) to work, additional interfaces may be
; added by defining them in separate rpcinterface: sections
[rpcinterface:supervisor]
supervisor.rpcinterface_factory = supervisor.rpcinterface:make_main_rpcinterface

[supervisorctl]
serverurl=unix:///tmp/supervisor.sock ; use a unix:// URL  for a unix socket

[program:qemu-mips64el]
command=/usr/bin/qemu-system-mips64el -cpu MIPS64R2-generic -m 2G -spice disable-ticketing,port=5900 -M malta -kernel /home/stuartl/kernels/qemu-mips/vmlinux -hda /var/lib/libvirt/images/gentoo-mips64el.raw -append "mem=256m@0x0 mem=1792m@0x90000000 root=/dev/sda1 console=ttyS0,115200" -chardev socket,id=char0,port=65223,host=::1,server,telnet,nowait -chardev socket,id=char1,port=65224,host=::1,server,telnet,nowait -serial chardev:char0 -mon chardev=char1,mode=readline -net nic -net bridge,helper=/usr/libexec/qemu-bridge-helper,br=br0

The following creates two telnet sockets, port 65223 is the VM’s console, 65224 is the QEMU control console. The VM has the maximum 2GB RAM possible and uses bridged networking to the network bridge br0. There is a graphical console available via SPICE.

All telnet and SPICE interfaces are bound to loopback, so one must use SSH tunnelling to reach those ports from another host. You can change the above command line to use VNC if that’s what you prefer.

At this point, the VM should be able to boot on its own. I’d start with installing some basic packages, and move on from there. You’ll find the environment is very sparse (my build had no Perl binary for example) but the basics for building everything should be there.

You may also find that what is there, isn’t quite installed right… I found that sshd wasn’t functional due to missing users… a problem soon fixed by doing an emerge -K openssh (the earlier step will have produced binary packages).

In my case, that’s installing a decent text editor (vim) and GNU screen so I can start a build, then detach.  Lastly, I’ll need catalyst, which is Gentoo’s release engineering tool.

At the moment, this is where I’m at.  GNU screen has indirectly pulled in Perl as a dependency, and that is building as I type this.  It is building faster than the little netbook does, and I have the bonus that I can throw more RAM at the problem than I can on the real hardware. The plan from here:

  1. emerge -ek @system, to build everything that got missed before.
  2. ROOT=/tmp/seed emerge -eK @system, to bundle everything up into a staging area
  3. populating /tmp/seed/dev with device files
  4. tar-ing up /tmp/seed to make my initial “seed” stage for catalyst.
  5. building the first n64 stages for Gentoo using catalyst
  6. building the packages I want for the netbook in a chroot
  7. transferring the chroot to the netbook

Symlinking lib and lib64… don’t do it!

So, I was doing this years ago when n32 was experimental.  I recall it being necessary then as this was before Portage having proper multilib support.  The earlier mipsel n32 stages I built, which started out from kanaka‘s even more experimental multilib stages, required this kludge to work-around the lack of support in Portage.

Portage has changed, it now properly handles multilib, and so the symlink kludge is not only not necessary, it breaks things rather badly, as I discovered.  When packages merge files to /lib, rather than following the symlink, they’ll replace it with a directory.  At that point, all hell breaks loose, because stuff that “appeared” in /lib before is no longer there.

I was able to recover by rsync-ing /lib64 to /lib, which isn’t a pretty solution, but it’ll be enough to get an initial “seed” stage.  Running that seed stage through Catalyst will clean up the remnants of that bungle.

Solar Cluster: Selecting batteries and sources

I’ve been doing quite a bit of thinking on this. Solid-state works but suffers from voltage drop. Relays work but either require the coil to be energised constantly (~1W load) unless you look for latching relays, for which 30A units are hard to come by.

These look promising though. A latching relay is nice since I only need to pulse the coil, not hold it on indefinitely.

That got me thinking what else can I use to switch power? The ideal for me is something that has practically no voltage drop and remembers its state without power. A latching relay fits this requirement. So does a uniselector or stepping switch. Those were commonplace in telephone exchanges years ago, but have since gone the way of the dodo as semiconductor technology replaced it.

The nice thing about a uniselector though for my application is you can switch between N points, instead of just two like a regular relay. So if I buy a third battery, I can wire it up to the uniselector, and have it switch the compute load between the batteries. Likewise, I can connect a charger to the battery most in need of a charge. MCU measures battery voltages, picks the battery with highest voltage to run the load, and the lowest voltage to get a charge. Easy.

That got me thinking… can I make a uniselector? Well of course I can! I basically need to make a rotary switch that can revolve around indefinitely. The shaft of the switch would then be turned by a DC motor.

The stator of a N-way switch would have N+1 pads, one which is the “common”, and the other N would be to each selection. The common pad would be a 180° arc, the others would be 180°/N.

The rotor would feature two brushes 180° apart with a wire connecting them. It is free to move vertically, but must rotate with the shaft, a spring between a nut on the end of the shaft and the rotor applies tension to keep the rotor pressed firmly against the stator.

The interface between rotor and stator features some triangular grooves, so that when the rotor is turned, it pushes it away from the stator, breaking contact. When the rotor passes a critical point, the spring pressing the rotor against these grooves makes the rotor “want” to continue turning until it hits the bottom of the groove, at which point it “sinks” down towards the stator and eventually makes contact again.

Visually, it looks like this:

A small microswitch mounted on the stator could tell us when touch-down takes place, if we use the normally-closed contact to power the motor it will automatically stop the motor when the next position is reached. We then just need to override that open switch by applying a pulse to get things moving.

Power is only needed when we want to change the selector switch. This should be simple enough to fabricate here out of plywood. I don’t have a 3D printer, but you could do it with one of those very easily.

The nature of this switch makes it a break-before-make switch, which has a downside when using it to select which battery to use: there’s a momentary break in power.

I can use diodes to carry the current temporarily. If I run a high-current diode from each battery to the output via a current sensor. If the current sensor measures current flowing through the diodes whilst a battery is selected, then we know that battery is lower than the others by at least the diode voltage drop, and we should consider switching.

Solar Cluster: A no-microcontroller automatic battery selector

Another approach to the selecting a source is to avoid microcontrollers altogether and just rely on non-programmable logic. This is inspired a bit by the Saturday Clock, or rather, my thinking of how it could be done without an MCU.

In selecting a source, we really only care about one thing: is the battery voltage high enough? If no, we need to hunt for one that is.

This question can be answered by a simple analogue comparator such as the LM311, a shift register, and a few other logic gates.

Here, we have such a circuit. Up the top, is our shift register, set up as a ring counter. The buffers there are stand-ins for diodes, if any of them is a 1, the output is a 1 and the NOT gate on the input outputs a 0.

The outputs of the shift register are used to select a battery, which has its own comparator and select logic. The comparator is represented here by the D flip-flop at the extreme left: in essence I’m using this as a switch, Logisim doesn’t provide one, only a momentary button. We need a signal that is high when the battery is above acceptable voltage. We also need its inverse.

The select line from the shift register controls the gate on two tri-state buffers, allowing us to inhibit the comparator’s output. The buffered “good” signal is used to SET the “enable” D-flip-flop that drives the switch turning the battery on. This same (buffered “good”) signal also passes thorough a diode-OR arrangement that indicates whether a source is “available”.

To emulate make-before-break, inverted “select” signal and the “source available” signal pass through an AND gate and into the RESET of the “enable” flip flop, so it gets turned off when another source is turned on.

Finally, the buffered “bad” signal from all modules is fed back on one shared line, inhibiting the clock until a battery drops below the minimum level.

A glitch here is if multiple batteries are initially turned on with none above the minimum voltage, this will cause multiple sources to be selected. This is not too hard to manage in software, and the solution might in fact be to implement this on an ATTiny24A as mentioned in the previous post; this logic circuit can be implemented quite easily in C, with comparators in hardware or using the ADC as a software comparator as I’m doing in the charge controller.

Solar Cluster: Charge controller testing, battery bank dimensioning and other thoughts

So, further progress on the charge controller.

Thinking about the problem … I realised that I really do not want to be testing for VBNVH when entering the CHARGE_CHECK state, as it’ll prematurely terminate the charge when the battery is being bulk-charged.

Better to wait until the charger decides to stop … which we’ll see due to the battery voltage ceasing to increase. We do want to check we’re not critically high however, so we can swap out VH for VCH.

Next, when we find that VBNVBL, meaning the battery is not charging, there we can check for VBNVH and stop charging at that point.

That change, worked pretty well, but it was still flapping between sources. A little state management helped this. If we declare another state variable, charger_warning, we can flip this to 1 upon first detecting that VBNVBL, then wait a little longer. If after a time-out this is still the case, then we can take action. Thus we define a new timer tCWARN, which delays acting on the not-charging case.

A bit of threshold tweaking, and things are behaving themselves. I’m using an el-cheapo 3-way camping fridge as the stand-in for the cluster. This is an Aldi special bought some years ago that draws about 5-6A… and when running on 12V power, features no thermostat.

We’re finding that its cooling capacity is no match for Brisbane’s early autumn weather anyway, so it’s pretty much would be a constant load even if the thermostat worked on 12V.

The battery we’re using is an old 105Ah AGM battery… which is one of two batteries from the caravan we have here. They were the original batteries, and this battery’s mate had failed when both were replaced. We’ve noted this battery getting warm whilst charging, so we think it might now be on the way out too.

What to replace it with? LiFePO₄ is AU$1000 for 100Ah, so too expensive. AGM is still the better bet. I can get 300Ah AGM batteries, but they weigh nearly as much as I do. I can just manage the 105Ah, so we’ll stick with those. I will need more than one long-term.

That brings the thorny issue of connecting them. I am not keen to hook batteries in parallel for various reasons. At least not permanently.

Now, the charger I’m using for mains is a 3-channel charger. I can make additional charge controllers (with the caveat that I need heatsinks for the MOSFETs…sigh!) and I can look for 3-channel solar chargers, or just get multiple chargers for the extra batteries. This can be done.

The load is the elephant in the room. I’d ideally like to manage it as a single load, although conceivably, I could put the switch and one storage node on one battery, a second storage node and a compute node on a second, and the final storage and compute nodes on the third. If the switch goes however, my cluster is toast.

I can put batteries in parallel, but this really does need to be done with care, using carefully matched batteries. So the better solution is to have a controller that chooses the battery with the highest voltage.

There might be an analogue means of implementing this, but a microcontroller is a single-chip solution. The ATTiny24As have up to 8 ADC/GPIO pins and three non-ADC GPIO pins. It’s what I’m already using for the charge controller, so is an easy choice.

The cluster will not tolerate a break-before-make switch-over. I thought about using a capacitor bank to keep the cluster alive during a brief (~1sec max) switch-over. Back-of-the-envelope calculations suggested I would need a 10F capacitor bank. I can get a 16V 470mF capacitor for AU$70 each… and would need 20 of these. Ouch!

A small battery is another option, maybe a 7Ah, but that has its own maintenance issues, and represents a single point of failure.

I can get Schottky diodes capable of 40A, they still present a 0.6V voltage drop. At 30A, that represents 16W! For comparison, a relay with a 225ohm coil resistance will draw ~60mA when the battery is at the maximum of 15V, representing a load of 1W.

Or I can use more MOSFETs like the ones I’m already using, which draw even less power; poor man’s solid-state relays. Latching relays also exist, but they can be rather expensive, more so than a solid-state relay.

I can probably get away with temporary parallel connections, so a make-before-break would let me switch sources. Or, I could place my switch across a Schottky, meaning I put up with that 16W load for a brief moment while I switch sources.

So more to think about, but we are getting close. I can defer this decision until I get a second battery, but I am getting close to the point where the cluster will be running full-time.

Solar Cluster: Charge control flow control diagram

So, as promised, the re-design of the charge controller. … now under the the influence of a few glasses of wine, so this should be interesting…

As I mentioned in my last post, it was clear that the old logic just wasn’t playing nice with this controller, and that using this controller to maintain the voltage to the nodes below 13.6V was unrealistic.

The absolute limits I have to work with are 16V and 11.8V.

The 11.8V comes from the combination of regulator voltage drop and ATX PSU power range limits: they don’t operate below about 10.8V, if you add 700mV, you get 11.5V … you want to allow yourself some head room. Plus, cycling the battery that deep does it no good.

As for the 16V limit… this is again a limitation of the LDOs, they don’t operate above 16V. In any case, at 16V, the poor LDOs are dropping over 3V, and if the node is running flat chat, that equates to 15W of power dissipation in the LDO. Again, we want some headroom here.

The Xantrex charger likes pumping ~15.4V in at flat chat, so let’s go 15.7V as our peak.

Those are our “extreme” ranges.

At the lower end, we can’t disconnect the nodes, but something should be visible from the system firmware on the cluster nodes themselves, and we can thus do some proactive load shedding, hibernating virtual instances and preparing nodes for a blackout.

Maybe I can add a small 10Mbps Ethernet module to an AVR that can wake the nodes using WOL packets or IPMI requests. Perhaps we shut down two nodes, since the Ceph cluster will need 2/3 up, and we need at least one compute node.

At the high end, the controller has the ability to disconnect the charger.

So that’s worked out. Now, we really don’t want the battery getting that critically low. Thus the time to bring the charger in will be some voltage above the 11.8V minimum. Maybe about 12V… perhaps a little higher.

We want it at a point that when there’s a high load, there’s time to react before we hit the critical limit.

The charger needs to choose a charging source, switch that on, then wait … after a period check the voltage and see if the situation has improved. If there’s no improvement, then we switch sources and wait a bit longer. Wash, rinse, repeat. When the battery ceases to increase in voltage, we need to see if it’s still in need of a charge, or whether we just call it a day and run off the battery for a bit.

If the battery is around 14.5~15.5V, then that’s probably good enough and we should stop. The charger might decide this for us, and so we should just watch for that: if the battery stops charging, and it is at this higher level, just switch to discharge mode and watch for the battery hitting the low threshold.

Thus we can define four thresholds, subject to experimental adjustment:

Symbol Description Threshold
V_{CH} Critical high voltage 15.7V
V_H High voltage 15.5V
V_L Low voltage 12.0V
V_{CL} Critical low voltage 11.8V

Now, our next problem is the waiting… how long do we wait for the battery to change state? If things are in the critical bands, then we probably want to monitor things very closely, outside of this, we can be more relaxed.

For now, I’ll define two time-out settings… which we’ll use depending on circumstances:

Symbol Description Period
t_{LF} Low-frequency polling period 15 sec
t_{HF} High-frequency polling period 5 sec

In order to track the state, I need to define some variables… we shall describe the charger’s state in terms of the following variables:

Symbol Description Initial value
V_{BL} Last-known battery voltage, set at particular points. 0V
V_{BN} The current battery voltage, as read by the ADC using an interrupt service routine. 0V
t_d Timer delay… a timer used to count down until the next event. t_{HF}
S Charging source, an enumeration:

  • 0: No source selected
  • 1: Main charging source (e.g. solar)
  • 2: Back-up charging source (e.g. mains power)
0

The variable names in the actual code will be a little more verbose and I’ll probably use #defines for the enumeration.

Below is the part-state-machine part-flow-chart diagram that I came up with. It took a few iterations to try and describe this accurately, I was going to use a state machine syntax similar to what my workplace uses, but in the end, found the ye olde flow chart shows it best.

In this diagram, a filled in dot represents the entry point, a dot with an X represents an exit point, and a dot in a circle represents a point where the state machine re-enters the state and waits for the main loop to iterate once more.

You’ll note that for this controller, we only care about one voltage, the battery voltage. That said, the controller will still have temperature monitoring duties, so we still need some logic to switch the ADC channel, throw away dummy samples (as per the datasheet) and manage sample storage. The hardware design does not need to change.

We can use quiescent voltages to detect the presence of a charging source, but we do not need to, as we can just watch the battery voltage rise, or not, to decide whether we need to take further action.

Solar Cluster: Re-working the charge controller, setting up a home

So, having knocked the regulation on the LDOs down a few pegs… I am closer to the point where I can leave the whole rig running unattended.

One thing I observed prior to the adjustment of the LDOs was that the controller would switch in the mains charger, see the voltage shoot up quickly to about 15.5V, before going HOLYCRAP and turning the charger off.

I had set a set point at about 13.6V based on two facts:

  • The IPMI BMCs complained when the voltage raised above this point
  • The battery is nominally 13.8V

As mentioned, I’m used to my very simple, slow charger, that trickle charges at constant voltage with maximum current output of 3A. The Xantrex charger I’m using is quite a bit more grunty than that. So re-visiting the LDOs was necessary, and there, I have some good results, albeit with a trade-off in efficiency.

Ahh well, can’t have it all.

I can run without the little controller, as right now, I have no panels. Well, I’ve got one, a 40W one, which puts out 3A on a good day. A good match for my homebrew charger to charge batteries in the field, but not a good match for a cluster that idles at 5A. I could just plug the charger directly into the battery and be done with it for now, defer this until I get the panels.

But I won’t.

I’ve been doing some thought about two things, the controller and the rack. On the rack, I found I can get a cheap one for $200. That is cheap enough to be considered disposable, and while sure it’s meant for DJ equipment, two thoughts come to mind:

  • AV equipment with all its audio transformers and linear power supplies, is generally not light
  • It’s on wheels, meant to be moved around… think roadies and such… not a use case that is gentle on equipment

Thus I figure it’ll be rugged enough to handle what I want, and is open enough to allow good airflow. I should be able to put up to 3 AGM batteries in the bottom, the 3-channel charger bolted to the side, with one charge controller per battery. There are some cheap 30A schottky diodes, which would let me parallel the batteries together to form one redundant power supply.

Downside being that would drop about 20-25W through the diode. Alternatively, I make another controller that just chooses the highest voltage source, with a beefy capacitor bank to handle the switch-over. Or I parallel the batteries together, something I am not keen to do.

I spent some time going back to the drawing board on the controller. The good news, the existing hardware will do the job, so no new board needed. I might even be able to simplify logic, since it’s the battery voltage that matters, not the source voltages. But, right now I need to run. So perhaps tomorrow I’ll go through the changes. 😉

Solar Cluster: … and it WORKS

Okay, so now the searing heat of the day has dissipated a bit, I’ve dragged the cluster out and got everything running.

No homebrew charge controller, we have:

240v mains → 20A battery charger → battery → volt/current meter → cluster.

Here’s the volt meter, showing the battery voltage mid-charge:

… and this is what the IPMI sensor readings display…

Okay, the 3.3V and 5V rails are lower than I’d expect, but that’s the duty of the PSU/motherboard, not my direct problem.

The nodes also vary a bit… here’s another one. Same set-up, and this time I’ll show the thresholds (which are the same on all nodes):

Going forward… I need to get the cluster solar ready. Running it all from 12V is half the story, but I need to be able to manage switching between mains and solar.

The battery I am using at the moment is a second-hand 100Ah (so more realistically ~70Ah) AGM cell battery. I have made a simple charger for my LiFePO₄ packs that I use on the bicycle, there I just use a LM2576 switchmode regulator to put out a constant voltage at 3A and leave the battery connected to “trickle charge”. Crude, but it works. When at home, I use a former IBM laptop power supply to provide 16V 3A… when camping I use a 40W “12V” solar panel. I’m able to use either to charge my batteries.

The low output means I can just leave it running. 3A is well below the maximum inrush current capacity of the batteries I use (typically 10 or 20Ah) which can often handle more than 1C charging current.

Here, I’m using an off-the-shelf charger made by Xantrex, and it is significantly more sophisticated, using PWM control, multi-stage charging, temperature compensation, etc. It also puts out a good bit more power than my simple charger.

Consequently I see a faster rise in the voltage, and that is something my little controller will have to expect.

In short, I am going to need a more sophisticated state machine… one that leaves the cut-off voltage decision to the charger. One that notices the sudden drop from ~15V to ~14V and shuts off or switches only after it remains at that level for some time (or gets below some critical limit).

Solar Cluster: Re-working the node power regulators

So… in the last test, I tried setting up the nodes with the ATTiny24A power controller attempting to keep the battery between 11.8 and 13.8V.

This worked… moreover it worked without any smoke signals being emitted.

The trouble was that the voltage on the battery shot up far faster than I was anticipating. During a charge, as much as 15.5V is seen across the battery terminals, and the controller was doing exactly as programmed in this instance, it was shutting down power the moment it saw the high voltage set-point exceeded.

This took all of about 2 seconds. Adding a timeout helped, but it still cycled on-off-on-off over a period of 10 seconds or so. Waay too fast.

So I’m back to making the nodes more tolerant of high voltages.

The MIC29712s are able to tolerate up to 16V being applied with peaks at 20V, no problem there, and they can push 7.5A continuous, 15A peak. I also have them heatsinked, and the nodes so far chew a maximum of 3A.

I had set them up to regulate down to approximately 13.5V… using a series pair of 2.7kΩ and 560Ω resistors for R1, and a 330Ω for R2. Those values were chosen as I had them on hand… 5% tolerance ¼W carbon film resistors. Probably not the best choice… I wasn’t happy about having two in series, and in hindsight, I should have considered the possibility of value swing due to temperature.

Thinking over the problem over the last week or so… the problem seemed to lay in this set point: I was too close to the upper bound, and so the regulator was likely to overshoot it. I needed to knock it back a peg. Turns out, there were better options for my resistor selections without resorting to a trim pot.

Normally I stick to the E12 range, which I’m more likely to have laying around. The E12 series goes …2.7, 3.3, 3.9, 4.7, 5.6… so the closest I could get was by combining resistors. The E24 range includes values like 3.0 and 3.6.

Choosing R1=3.6kΩ and R2=390Ω gives Vout ~= 12.7V. Jaycar sell 1% tolerance packs of 8 resistors at 55c each. While I was there today, I also picked up some 10ohm 10W wire wound resistors… before unleashing this on an unsuspecting AU$1200 computer, I’d try it out with a dummy load made with four of these resistors in parallel… making a load that would consume about 5A for testing.

Using a variable voltage power supply, I found that the voltage could hit 12.7V but no higher… and was at worst .7V below the input. Good enough.

At 16V, the regulator would be dropping 3.3V, passing a worst case 3A current for a power dissipation of 9W out of the total 48W consumption. About 80% efficiency.

Not quite what I had hoped for… but this is a worst case scenario, with the nodes going flat chat and the battery charger pumping all the electrons it can. The lead acid battery has a nominal voltage of 13.8V… meaning we’re dropping 1.1V.

On a related note, I also overlooked this little paragraph in the motherboard handbook:

(*Do not use the 4-pin DC power @J1 when the 24-pin ATX Power @JPW1 is connected to the power supply. Do not plug in both J1 and JPW1 at the same time.)

Yep, guess what I’ve done. Being used to motherboards that provide both and needed both, I plugged them both in.

No damage done as all nodes work fine… (or they did last time I tried them… yet to fire them up since this last bit of surgery). It is possible there is no isolation between the on-motherboard PSU and the external ATX one and that if you did plug in power from two differing sources, you could get problems.

In a way if I had spotted this feature before, I could have done without those little PSUs after all, just needing a Molex-style power adaptor cable to plug into the motherboard.

Still… this works, so I’m not changing it. I have removed that extra connection though, and they’ve been disconnected from the PSUs so they won’t cause confusion in future.

I might give this a try when things cool down a bit … BoM still reports it being about 32°C outside (I have a feeling where I live is a few degrees hotter than that) and so I don’t feel energetic enough to drag my cluster out to the workbench just now. (Edit: okay, I know…those in NSW are facing far worse. Maybe one of the mob in New Holland should follow the advice of Crowded House and take the weather with them over here to the east coast! Not all of it of course, enough to cool us off and reduce their flood.)

Solar Cluster: Load testing… with the new power modules

Well, I’ve finally dragged this project out and plugged everything in to test the new power modules out.

I’ll be hooking up the laptop and getting the nodes to do something strenuous in a moment, but for now, I have them just idling on the battery, with the battery charger being switched by the charge controller, built around an ATTiny24A and this time, a separate power module rather than it being integrated.

I’ve had it going for a few hours now… and so far, so good. The PSU is getting turned on and off more often than I’d like, but at least the smoke isn’t escaping now. The heatsink for the power modules is warm, but still not at the “burn your fingers off” stage.

That to me suggests the largish heatsink was the right one to use.

Two things I need to probably address though:

  • In spite of the LDOs, the acceptable voltage range of the computers is still rather narrow… I’m not sure if it’s just the IPMI BMC being fussy or if the LDOs need to be knocked down a peg to keep the voltage within limits. Perhaps I should use the same resistor values as I did for the Ethernet switch.
  • The thresholds seem to get reached very quickly which means the timeouts still need lengthening. Addressing the LDO settings should help with this, as it’ll mean I can bump my thresholds higher.

If I can nail those last two issues, then I might be at risk of having the hardware aspect of this project done and having a workable cluster to do the software side of the project. Shock horror!