solar-cluster

Solar Cluster: Charge Controller description

So, I’ve built the controller. The design was pretty simple. Using an ATTiny24A, I’d monitor the voltages of the battery and two power inputs, and code would decide which input to use, if any. It could also use the built-in temperature sensor to control the cooling fans. This is the schematic I knocked up this morning.

The values of most resistors are not critical. I found I needed 1kΩ resistors into the bases of the transistors, as the MCU was not happy driving them directly. The transistors I’m using are BC547Bs controlling AUIRF4905 MOSFETs.

The only components that are critical are the voltage dividers on the ADC inputs. I’ll be using the built-in 1.1V reference in the MCU as that’s what’s needed for the temperature sensor anyway.
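To see why the dividers matter: with the 1.1V reference, the 10-bit ADC reads 0 to 1023 for 0 to 1.1V at the pin. That means the divider ratio sets both the highest battery voltage that can be measured and the size of each step. Below is a minimal sketch in C. It assumes a hypothetical 150kΩ/10kΩ divider (not necessarily what’s on this board), which scales 17.6V at the battery down to 1.1V at the pin. The internal reference also varies from chip to chip, so in practice you’d calibrate against a multimeter.

```c
#include <stdint.h>

/* Hypothetical divider: battery -> R_TOP -> ADC pin -> R_BOTTOM -> 0V.
 * Values in kOhm; 150k/10k divides by 16, so 1.1V at the pin = 17.6V. */
#define R_TOP_K      150UL
#define R_BOTTOM_K    10UL
#define VREF_MV     1100UL  /* internal 1.1V reference (nominal) */
#define ADC_STEPS   1024UL  /* 10-bit ADC */

/* Raw ADC reading -> battery voltage in millivolts.
 * Multiply before dividing to keep precision; the worst case
 * (1023 * 1100 * 160) still fits comfortably in 32 bits. */
static uint16_t adc_to_battery_mv(uint16_t adc)
{
	return (uint16_t)(((uint32_t)adc * VREF_MV * (R_TOP_K + R_BOTTOM_K))
			/ (ADC_STEPS * R_BOTTOM_K));
}

/* Battery threshold in millivolts -> ADC counts, so thresholds can be
 * converted once, up front, and raw readings compared directly. */
static uint16_t battery_mv_to_adc(uint16_t mv)
{
	return (uint16_t)(((uint32_t)mv * ADC_STEPS * R_BOTTOM_K)
			/ (VREF_MV * (R_TOP_K + R_BOTTOM_K)));
}
```

With these values, a full-scale reading of 1023 comes out at 17582mV. The 11.5V and 14V thresholds discussed elsewhere in these logs land at roughly 669 and 814 counts.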

This was a bit of an exercise in reviving old brain cells, as it’s been some time since I’ve done a proper PCB myself. This is a one-off prototype with mostly larger components, so there’s no point in getting boards fabricated. I did it the old-fashioned way, drawing the tracks with a dalo pen, then etching in a bath of ferric chloride.

That gives you an idea of what the board looked like prior to population. The underside was covered with tape to prevent it from being etched. It took a while, and I think I could have upped the concentration of the solution a bit, since it left some copper between tracks un-etched.

Perhaps my solution is getting a little old too… the logo on the bottle really dates it. I found I had to attack the gaps between some tracks with a knife since the etchant didn’t quite get it all.

There are no tracks on the bottom; it’s just one piece of un-etched copper acting as a ground plane. I guess the construction style is a cross between Manhattan and ground-plane (dead-bug) construction. The constructed board looks like this.

I’m not sure what all the LEDs will be doing at this point. Three share pins with the ICSP header, which means they flash while the board is being programmed… useful for troubleshooting ICSP issues. The IC socket is a cheap 14-pin one; I just bent the pins to mount it flush to the board. The 10µF tantalum on the output of the 5V PSU is possibly only a 10V one. The electrolytic sits where I originally had the 330µF tantalum mounted, which went bang when I gave it 12V.

I tried the following program on the board, which just steps through all the LEDs and MOSFETs:

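/* board.h */
#ifndef BOARD_H
#define BOARD_H
/* LEDs (port A); MOSI, MISO and SCK share pins with the ICSP header */
#define LED_U1_BIT		(1 << 7)
#define LED_MOSI_BIT		(1 << 6)
#define LED_MISO_BIT		(1 << 5)
#define LED_SCK_BIT		(1 << 4)
#define LED_U0_BIT		(1 << 3)
#define LED_PORT		PORTA
/* MOSFETs (port B) */
#define FET_MAINS		(1 << 0)
#define FET_SOLAR		(1 << 1)
#define FET_FAN			(1 << 2)
#define FET_PORT		PORTB
#endif /* BOARD_H */
/* test.c */
/* util/delay.h needs to know the CPU clock. Out of the box the ATTiny24A
 * runs its 8MHz internal oscillator divided by 8 (CKDIV8 fuse), i.e. 1MHz;
 * change this if the fuses have been altered. */
#ifndef F_CPU
#define F_CPU 1000000UL
#endif
#include <avr/io.h>
#include <util/delay.h>
#include "board.h"
int main(void) {
	/* All LED and MOSFET pins are outputs; everything starts off */
	DDRA = LED_U1_BIT | LED_MOSI_BIT | LED_MISO_BIT
		| LED_SCK_BIT | LED_U0_BIT;
	DDRB = FET_MAINS | FET_SOLAR | FET_FAN;
	LED_PORT = 0;
	FET_PORT = 0;
	/* Test sequence: light each LED in turn, then switch on each
	 * MOSFET in turn, one second each */
	while (1) {
		LED_PORT = LED_U0_BIT;		_delay_ms(1000);
		LED_PORT = LED_U1_BIT;		_delay_ms(1000);
		LED_PORT = LED_MOSI_BIT;	_delay_ms(1000);
		LED_PORT = LED_MISO_BIT;	_delay_ms(1000);
		LED_PORT = LED_SCK_BIT;		_delay_ms(1000);
		LED_PORT = 0;
		FET_PORT = FET_MAINS;		_delay_ms(1000);
		FET_PORT = FET_SOLAR;		_delay_ms(1000);
		FET_PORT = FET_FAN;		_delay_ms(1000);
		FET_PORT = 0;
	}
	return 0;
}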

That seems to prove the hardware is alive, and now I just have to get the software working. Now to try out the toolchain I built!

 

Solar Cluster: Working around build errors in crossdev

I’ve been doing further work on my charge controller. Last weekend, I managed to fix the issues that were stopping the MOSFETs from working, and the “faulty” MOSFET turned out to be fine: the fault was a glitch in my home-fabricated PCB. (No, commercial PCB makers, this is not an invitation for you to advertise as you have done on other projects! The job is done.)

In the midst of doing this, I was also having problems getting the toolchain to generate correct code. The code would fail to run if I had an interrupt service routine defined; even with interrupts not enabled, it still failed to do anything. Something in the interrupt vector table was off.

So it’s back to crossdev to see if I can build a newer toolchain that works. crossdev -t avr would get as far as building the full C/C++ compiler, then crap out with a missing ldscript. (I don’t have the log handy to show you the exact error, but it happens in a ./configure script, so you’ll see “C compiler cannot generate executables” or something like that, and you’ll find the ldscript error in the offending config.log.)

The kludge?

# cd /usr/avr/lib
# ln -s ../../lib64/binutils/avr/${BINUTILS_VERSION}/ldscripts/

Substitute ${BINUTILS_VERSION} with the active binutils version. Voilà: having done that kludge, I now have this:

$ avr-gcc --version
avr-gcc (Gentoo 5.3.0 p1.1, pie-0.6.5) 5.3.0
Copyright (C) 2015 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

Not sure if it works or not, but I’m documenting the above as a note to myself next time I hit this error.

Solar Cluster: ATTiny24A charge controller: Fail so far

Well, last weekend I had finally acquired the bits after some delays getting everything together. This included some ATTiny24As, some P-channel MOSFETs and some 5V DC-DC PSUs.

Last weekend I got around to building the PCB, and yes, I’m not as handy with a dalo pen as I used to be. That, and the ferric chloride I was using had seen better days (the bottle has “Dick Smith Electronics” printed on it — enough said).

So I spent much time last weekend finding shorts between tracks and attacking those with a sharp knife. Last Sunday I managed to get something built, but not tested.

Today, I got around to testing it. At first I plugged it into a bank of 6 AA cells and got 0.8V across the power input. WTF? Okay, maybe the DC-DC converter needs a little more current. So I wire up a cable loom with a 5A blade fuse and a 30A Anderson connector. Plug everything together: *BOOM*, a tantalum capacitor blows up!

The tantalum was a 330µF one scavenged from an old computer motherboard, probably only rated for 10V, and when you over-voltage a tantalum, it does throw a tantrum!

Lesson learned: don’t use those scavenged parts for 12V! I swap that out for an electrolytic (rated at 16V).

The reason for my high voltage drop, though? A faulty MOSFET. I found it by de-soldering the tabs on both input MOSFETs, at which point the problem disappeared; pressing down on the faulty one then made the short reappear. Strange, as the MOSFET has never seen use.

I proceeded minus a MOSFET for a bit, to see if I could get some code into the MCU. I was able to program it okay, but then the fun came when I tried to use the timer interrupt: including the timer ISR would cause the MCU to not boot. It’d just sit there.

The problem disappeared if I compiled my code without optimisation or at -O1, but returned at -O2 or -Os (I was using -Os). So something in my toolchain was broken for the ATTiny24A.

Whilst waiting for a toolchain re-build, I decided to tackle the faulty MOSFET. I had one spare, so I carefully soldered that into place, only for the original issue to re-appear, so now I’m completely lost.

I guess next weekend, I’ll take a closer look and the cause will become obvious, but right now I’m more confused than a moth in a light shop!

Solar cluster: Software stack beginning to take shape

So, after putting aside the charge controller for now, I’ve taken some time to see if I can get the software side of things into shape.

In the midst of my development, I found a small wiring fault that was responsible for blowing a couple of fuses. A small nick in the sheath of the positive wire in a power cable was letting the crimp part of a DC barrel connector contact +12V. A tweak of that crimp and things are back to normal. I’ve swapped all the 10A fuses for 5A ones, since the regulators are only rated at 7.5A.

The VLANs are assigned now, and I have bonding going between the two pairs of Ethernet devices. In spite of the switch only supporting 4 LAGs, it seems fine with me doing LACP on effectively 10 LAGs. I’ll see how it goes.

The switch has 5 ports spare after plugging in all 5 nodes and a 16-port switch for the IPMI subnet. One will be used for a management interface so I can plug a laptop in, and the others will be paired with LACP for linking to my two existing Cisco SG200-8s.

One of the goals of this project is to try and push the performance of Ceph. In the office, we tried bare Ceph and found that, while it’s fine for sequential I/O, it suffers a bit with random reads/writes, and Windows-based Hyper-V images like to do a lot of random reads/writes.

Putting FlashCache in the mix really helped, but I note now that it’s no longer maintained. EnhanceIO had only just forked when I tried FlashCache; now it seems that’s the official successor.

There are two alternatives to FlashCache/EnhanceIO: bcache and dm-cache.

I’ll rule out bcache now, as it requires the backing image be “formatted” for use. In other words, the backing image is not a raw image, but some proprietary (to bcache) format. This isn’t unworkable, but it raises concerns for me about portability: if I migrate a VM, do I need to migrate its cache too, or is it sufficient to cleanly shut down and detach the bcache device before re-assembling it on the new host?

By contrast, dm-cache and EnhanceIO/FlashCache work with raw backing images, making them much more attractive. Flush the cache before migration or use writethru mode, and all should be fine. dm-cache does however require a separate metadata device: messy, but not unworkable. We can provision the cache-related devices we need using LVM2, and use the kernel-mode Rados block device as our backing image.

So I think my caching subsystem is a two-horse race: dm-cache or EnhanceIO. I guess we’ll give them a try and see how they go.

For those following along at home, if you’re running kernel >4.3, you might want to use this fork of EnhanceIO due to changes in the kernel block I/O layer.

To manage the OpenNebula master node, I’ve installed corosync/pacemaker. Normally these are used with DRBD; however, I figure Ceph can fulfil that role. The concepts are similar: it’s a shared block device. I’m not sure if it’ll be LXC, Docker or a VM at this point that “contains” the server, but whatever it is, it should be possible for it to have its root FS and data on Ceph.

I’m leaning towards LXC for this. Time for some more experimentation.

Solar cluster: Trying out the analogue controller: FAIL

Well, not sure what went wrong, but the controller I built on Monday evening, dead-bug style, is one big fail.

There’s no output from the LM311s: even after adding pull-ups, they still don’t seem to respond to the battery voltage falling below the threshold. Add to that a faulty IRF540N MOSFET (drain-source resistance of ~40Ω), and you’ve got all the makings of things going wrong.

So it’s time for a U-turn. I had previously decided against a microcontroller-based solution on the grounds that I had the parts on hand for an analogue comparator solution, but I’ve decided I’ll do it with an ATTiny24A after all. I can get these for about $12 for a pack of 5 from a local supplier.

I have also placed an order for two 5V switchmode PSU modules and four P-channel MOSFETs: we’ll drop the relay as well and make it all solid-state.

The MCU doesn’t have to do much, just take an ADC reading every 100msec of the battery voltage, compare it to a threshold then either turn on or turn off the power.

The MCU has up to 8 ADC channels, embeds a small temperature sensor, and has PWM-capable timer outputs and a number of GPIOs. Reserving the reset and SPI lines for ISP work leaves us 3 digital outputs and one PWM output for controlling things, plus 4 ADC channels.

I can use the PWM channel to drive a MOSFET for the fans, one of the outputs to drive NPN transistors for controlling the ACPI power buttons on the nodes, and two MOSFETs for the mains and solar inputs. 3 ADCs can monitor the battery, mains and solar inputs, so decisions can be made on whether to switch between solar/mains or to turn off all inputs and let the battery drain for a bit.
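The decision logic described above might look something like the following. This is a minimal sketch, not the actual firmware. The enum and function name are hypothetical, as is the 12.5V “resume” threshold; the 11.5V and 14V figures are borrowed from the thresholds discussed elsewhere in these logs. Voltages are assumed to have already been converted to millivolts. A source counts as usable only if it sits above the battery, since that’s the only way it can push charge in.

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum { INPUT_NONE, INPUT_SOLAR, INPUT_MAINS } charge_input_t;

/* Hypothetical thresholds, in millivolts. */
#define BATT_FULL_MV    14000u  /* stop charging at or above this */
#define BATT_RESUME_MV  12500u  /* when idle, don't restart above this */
#define BATT_LOW_MV     11500u  /* only fall back to mains at or below this */

/* Decide which input to connect, given the current choice and fresh
 * readings. Intended to be called once per sample (e.g. every 100ms). */
static charge_input_t choose_input(charge_input_t current, uint16_t battery_mv,
		uint16_t solar_mv, uint16_t mains_mv)
{
	/* A source can only charge the battery if it sits above it. */
	bool solar_ok = solar_mv > battery_mv;
	bool mains_ok = mains_mv > battery_mv;

	if (battery_mv >= BATT_FULL_MV)
		return INPUT_NONE;	/* full: disconnect, let it drain a bit */

	/* Hysteresis: once idle, wait for the battery to sag before
	 * reconnecting, so we don't toggle around the "full" point. */
	if (current == INPUT_NONE && battery_mv > BATT_RESUME_MV)
		return INPUT_NONE;

	if (solar_ok)
		return INPUT_SOLAR;	/* prefer free energy */

	/* Mains only once the battery is getting low, then stay on it
	 * until the battery is full. */
	if (mains_ok && (current == INPUT_MAINS || battery_mv <= BATT_LOW_MV))
		return INPUT_MAINS;

	return INPUT_NONE;
}
```

On the board, the result would then drive FET_SOLAR or FET_MAINS. One caveat of this sketch is that a connected source reads close to the battery voltage. A real version would probably want a small margin or some averaging on the solar_ok/mains_ok tests.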

The internal temperature sensor can be used for fan control. The internal 8MHz oscillator will be “good enough”, I think. As for the temperature sensor, it mainly needs to tell the difference between hot and cold: if things are >25°C, then we should run the fans, and the hotter it is, the faster they should run.
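Here is a minimal sketch of that fan curve. It assumes the temperature has already been converted to whole °C; the internal sensor needs per-chip calibration to do that, which isn’t shown. The output is an 8-bit PWM duty, which is what the ATTiny24A’s 8-bit Timer/Counter0 compare registers take. Only the 25°C start point comes from the text; the 45°C “full speed” point and the function name are hypothetical.

```c
#include <stdint.h>

/* Fan curve: off at or below 25°C, full speed at a hypothetical 45°C,
 * linear in between. */
#define FAN_START_C   25
#define FAN_FULL_C    45

/* Map a temperature in whole °C to an 8-bit PWM duty (0 = off, 255 = full). */
static uint8_t fan_duty(int16_t temp_c)
{
	if (temp_c <= FAN_START_C)
		return 0;
	if (temp_c >= FAN_FULL_C)
		return 255;
	/* Scale the distance above the start point onto 0..255. */
	return (uint8_t)(((int32_t)(temp_c - FAN_START_C) * 255)
			/ (FAN_FULL_C - FAN_START_C));
}
```

Real fans often won’t spin up at low duty cycles, so a practical version may need a minimum duty or a kick-start; that’s left out here.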

This isn’t rocket-science, and should be achievable via a simple while loop in C.

Solar cluster: Charge controllers

Well, having gotten the output of the battery sorted out, now it’s time to turn my attention to the input side, namely managing the battery voltage and two possible charge sources.

Now, I have a second-hand Xantrex 20A charger kicking around that I plan to use for when the sun isn’t around and my battery is getting low. When the sun’s out, though, I plan to let it charge the battery. I could do this with a small MCU, and I did briefly consider whether to use an ATTiny24A or one of my spare ATMega8Ls.

I have a beefy 30A relay that can connect and disconnect the charger as needed; it’s just a matter of having a controller that decides when it’s needed. I’m not looking for PWM control, as the charger will do that itself.

There are two thresholds I want to consider:

  • The low threshold: about 11.5V or so.
  • The high threshold: about 14V.

We don’t want to let the battery get much below 11.5V, as the regulators on the compute nodes will drop up to 700mV and the IPMI BMCs will start to get grumpy. Likewise, they complain when they see more than 13.5V. The regulators should look after it, but let’s not stress them too hard.

I could use a single comparator with hysteresis to do the above, by selecting a reference voltage mid-way between 11.5V and 14V and choosing resistors to set the threshold gap. Instead, I’ve decided to just use two comparators, so I can use an LM393, or the LM339 I have kicking around. I also dug around in the junk box and found a stack of MM74C76s and some MM74C221Ns.

Some tinkering with a breadboard, and I came up with this:

Now, the beauty of this set-up is that I’m using half of each IC, so I effectively have two independent controllers on the one board. Thresholds can be tweaked on each one so that one charger starts sooner than the other. Maybe I kick the solar in when the battery drops below 12V and let it go to 14V, and the mains charger kicks in when we get to 11.5V and stops when we reach 13V.
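The behaviour of each half of the circuit can be modelled in a few lines of C. The low-threshold comparator sets the flip-flop (charger on) and the high-threshold comparator resets it (charger off). In between, the flip-flop simply holds its state, which is where the hysteresis comes from. This illustrates the logic only; it is not a description of the actual breadboard wiring. The struct and function names are hypothetical, and the example thresholds are the ones suggested above.

```c
#include <stdbool.h>
#include <stdint.h>

/* One comparator pair plus a flip-flop behaves like a set/reset latch. */
struct charger_ctl {
	uint16_t on_below_mv;   /* low comparator threshold: start charging */
	uint16_t off_above_mv;  /* high comparator threshold: stop charging */
	bool on;                /* flip-flop output: relay energised */
};

/* Feed in a new battery reading; returns whether the charger is on. */
static bool charger_update(struct charger_ctl *c, uint16_t battery_mv)
{
	if (battery_mv < c->on_below_mv)
		c->on = true;           /* low comparator trips: "set" */
	else if (battery_mv > c->off_above_mv)
		c->on = false;          /* high comparator trips: "reset" */
	/* otherwise the flip-flop holds its previous state */
	return c->on;
}
```

Running two instances, one with 12V/14V for solar and one with 11.5V/13V for mains, gives the staggered behaviour described. Solar comes in first, and mains only joins if the battery keeps falling.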

I haven’t decided on a regulator yet. Yes, I could use an LM78C05, the low-power version of the LM7805, as the power drain of this is going to be tiny and the headroom enormous for 5V. There are probably better options and I’ll have to shop around, although for a quick prototype I might just use the LM78C05s since they’re on hand.

Solar cluster: 12V regulator installation

Well, I finally got busy with the soldering iron again. This time, installing the regulators in the cluster nodes and in the 26-port switch.

I had a puzzle as to where to put the regulator. I didn’t want it exposed, as they’re static-sensitive devices, so it’s better to keep them enclosed. It needed to be somewhere the air would be flowing, and looking around, I found the perfect spot just behind the CPU heatsink. There’s a small gap where the air flows past to cool the CPU, and it’s sufficiently near the ATX PSU to feed the power cabling past.

I found I was able to tap M3 threads into the tops of the heatsinks and fix them to the “front” of the case near where the DIN rail brackets fit in. So from the outside, it looks all neat and tidy.

After installing those, I turned my attention to the switch. Now I had an educated guess that the switch would be stepping down from 12V, so being close to that was not so critical, however going above it would stretch the friendship.

Rather than feeding it 13.1V like the compute nodes, I decided I’d find some alternate resistor values that’d be closer to 12V. Those wound up being R1=3.3kΩ and R2=390Ω, which gave about 11.8V. Close enough. It was then a matter of polarity. The wiring inside this switch uses a non-standard colour code, and as I suspected, the conductors are just paralleled, it’s the one feed of 12V.

Probing with a multimeter revealed the pin pairs were shorted, and removing the PSU confirmed this. I pulled out the switch mainboard and probed around the electrolytics which had their negative sides marked. Sure enough, it’s the Australian Olympic team colours that give away the 0V side.

I’ve shown the original colour code here as coloured dots, but essentially, green and yellow are the 0V side, and red and black are the +12V side. So I had everything necessary. I grabbed a bit of scrap PCB, used the old PSU as a template for drilling out the holes, used a hacksaw to divide the PCB surface up, then dead-bugged the rest. To position the heatsink, I drilled a 3mm hole in the bottom of the case and screwed a 10mm M3 stand-off there. Yes, this means there’s an annoying lump on the bottom; I should use a countersunk M3 screw. I’ll fix that later if it bothers me, but I’ll be rack-mounting it anyway.

On the input to the regulator, I have a 330µF electrolytic capacitor and a 100nF monolithic capacitor in parallel; on the output, it’s a 470µF and a 100nF. A third 100nF hooks the adjust pin to 0V to reduce noise. I de-soldered the original PSU’s socket and used that on the new board. It fits beautifully. 100-240V? Not any more, Linksys.

So now, the whole lot runs off a single 12V battery supply. The remainder of this project is the charging of that battery and the software configuration of the cluster.

At present, the whole cluster’s doing an `emerge @system`, with distcc running, and drawing about 7.5A with the battery sitting at 12.74V (~95W). Edit: Now that they’ve properly fired up, I’m seeing a drain of 10.3A (126W). Looks like that’s going to be the “worst case scenario”.

Solar cluster: Trying out the MIC29712s

I figured that, rather than letting these loose directly on the nodes themselves, I’d give them a try with a throwaway dummy load. For this, I grabbed an old Philips car cassette player that’s probably older than I am, hooked that up and shoved an old cassette in.

The datasheet for the regulators defines the output voltage as: $V_{OUT} = 1.240\left(\frac{R_1}{R_2} + 1\right)$

Playing with some numbers, I came up with R1 being a 2.7kΩ and a 560Ω resistor in series, and R2 being a 330Ω. So I scratched around for those resistors, grabbed one of the MIC29712s and hooked it all up for a test.
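To check resistor choices against the datasheet formula, here’s a small C helper that works in integer millivolts. It assumes R1 runs from the output to the adjust pin and R2 from the adjust pin to 0V, which is what the formula implies; the function names are hypothetical. For the 2.7kΩ + 560Ω (3260Ω) and 330Ω combination, it gives a nominal 13.49V. The 3.3kΩ/390Ω pair used for the switch works out to about 11.73V. Resistor and reference tolerances mean a real regulator can land a little way off these figures.

```c
#include <stdint.h>

#define VADJ_MV 1240UL  /* reference voltage from the datasheet formula */

/* Nominal output in mV: Vout = 1.240 * (R1/R2 + 1) = 1.240 * (R1 + R2) / R2 */
static uint32_t vout_mv(uint32_t r1_ohms, uint32_t r2_ohms)
{
	return VADJ_MV * (r1_ohms + r2_ohms) / r2_ohms;
}

/* Rearranged for R1: R1 = R2 * (Vout - 1.240) / 1.240, rounded down.
 * Only meaningful for targets above the 1.240V reference. */
static uint32_t r1_for_vout(uint32_t target_mv, uint32_t r2_ohms)
{
	return r2_ohms * (target_mv - VADJ_MV) / VADJ_MV;
}
```

For example, hitting exactly 13.1V with R2 = 330Ω would call for an R1 of about 3.16kΩ.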

The battery here is not the one I’ll ultimately use in the cluster; I have a 100Ah AGM battery for that. The charger seen there is what will be used, initially as the sole power source, then in combination with solar when I get the panels. It’s capable of producing 20A and runs off mains power.

This is the power drain from the battery, with the charger turned on.

Not as much as I thought it’d be, but still a moderate amount.

This is what the output side of the regulator looked like:

So from 14.8V down to 13.1V. It also showed 13.1V when I had the charger unplugged, so it’s doing its job pretty well, I think. That’s a drop of 1.7V, so it’s dissipating about 600mW. Efficiency is therefore about 88% (13.1V ÷ 14.8V), not bad for a linear regulator.
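For a linear regulator, the arithmetic behind figures like these is simple. Everything between input and output is burnt as heat, so dissipation is the voltage drop times the load current. The best-case efficiency is just Vout/Vin regardless of load, ignoring the regulator’s small quiescent current. Here is a minimal C sketch; the function names are hypothetical.

```c
/* Heat dissipated by a linear regulator, in watts: the whole
 * input-output difference times the load current. */
static double ldo_dissipation_w(double vin, double vout, double iload_a)
{
	return (vin - vout) * iload_a;
}

/* Best-case efficiency (0..1): output power over input power, which for
 * a linear regulator reduces to Vout / Vin since the current is the same. */
static double ldo_efficiency(double vin, double vout)
{
	return vout / vin;
}
```

The same helper reproduces the estimates in the PicoPSU log: a 1.5V drop at 5A gives 7.5W, and a 0.5V drop at 5A gives 2.5W.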

Solar cluster: Measuring for a cabinet

One elephant in the room is how I’m going to store the system whilst in operation.

The obvious solution is some sort of metal cabinet with provision for 19″ rack mounting and DIN rail equipment. Question is, how big?

A big consideration here is thermal. When going flat out, there will be 100W-150W worth of heat being dissipated in there, so room for convection currents is a must!

Some decent fans on the top to suck the hot air out would also be a good idea, blowing upwards so that dust doesn’t get sucked down into the works.

I figured I’d sit everything sort-of in situ. It turns out the DIN rail mounts don’t have to go on the bottom: with these cases, if you remove the front panel, there are four holes for mounting those same DIN rail mounts on the front. So that’s what I’ve done, and I’ve now got a DIN rail spare for future expansion.

If I try to pack everything up as densely as possible (not wise), this is what it looks like:

There’s room for possibly one more node to squeeze in there, though I think that’d be pushing it. 5 is probably a good number, meaning we can space the units out a bit to allow them to draw air in via the gaps.

On top of the units I have my two switches. The old Netcomm 24-port switch was retired from our network when a lightning strike to a neighbour’s tree took out an 8-port switch, my Yaesu FT-897D radio transceiver, some ports on a wireless 3G router/switch, and an ADSL router. It also damaged some ports on the big Netcomm switch, so in short, I know it has issues.

Replacing its 3.3V PSU with one that steps down from 12V would cost me the price of a 16-port 10/100Mbps switch brand new.

When we replaced the switch (paid for by insurance), we decided to buy an 8-port and a 16-port switch. The 16-port switch, since retired due to an upgrade to gigabit, is sitting on top and takes 12V 1A input. It’ll be perfect for the IPMI VLAN, where speed is not important. It also accepts the DC plugs I bought by mistake.

The 8-port one takes 7.5V 1A, so it’s a little less convenient for this task; I’d need to make a DC-DC converter for it. Maybe later if this works.

So considering a cabinet for this, we have:

  • 5 nodes measuring 190mm in height: ~5 RU
  • A 24-port switch: 1 RU
  • A 16-port switch: 1 RU
  • Some power distribution electronics: 3 RU

Yes, the battery and its charger are external to the cabinet.

Judging from this, the cabinet probably needs to be a 10RU or 12RU cabinet to give us space for mounting everything cleanly and to ensure good ventilation. Using 8-port IPMI switches and 24+2-port comms switches, that leaves us with sufficient port space for the 5 nodes and gives us one port left for a small in-chassis monitoring device and 4 ports left on the main switch for an uplink trunk.
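The rack-unit figures in the list above are based on the standard rack unit of 44.45mm (1.75in). Here is a quick C helper that rounds a height up to whole units; the names are hypothetical. It confirms that 190mm needs 5 RU and that the list adds up to 10 RU before allowing any extra space for airflow.

```c
/* One rack unit is 1.75in = 44.45mm; work in hundredths of a mm
 * to stay in integers. */
#define RU_HMM 4445u

/* Whole rack units needed for a given height in mm, rounded up. */
static unsigned rack_units(unsigned height_mm)
{
	return (height_mm * 100u + RU_HMM - 1u) / RU_HMM;
}
```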

You could conceptually then consider these as homogeneous building blocks for larger networks, using Ceph’s CRUSH maps to ensure copies get distributed amongst these “cabinets”.

Solar cluster: Alternate solution to the PicoPSU

So, I’ve been doing a bit of research about how I can stabilise the battery voltage, which will drift between around 11V and 14.6V. It’s a deep-cycle battery, so it’s actually capable of going down to 10V, but I really don’t want to push it that far.

Once I get below 12V, that’s the time to signal to the VM hosts to start hibernating the VMs and preparing for a blackout, until such time as the voltage picks back up again.

The rise above 13.5V is a challenge due to the PicoPSU limitations. @Vlad Conut rightly pointed out that the M3-ATX-HV PSUs sold by the same company would have been a better choice. For about $20 more, or an extra $100 for the cluster, I’d have something that’d work 6-30V. I’d still have to solve the problem with the switch, but it’d just be that one device, not 6.

Maybe it was because they were out of stock that I went the PicoPSU route. I also wasn’t sure about power demands: I knew the CPU needed 20W, but wasn’t sure about everything else, so I over-dimensioned everything. Hindsight is 20:20.

One option I considered was a regulator in front of each node. I had mentioned the LM7812 because I knew of it, but I also knew it was a crap choice for this task: the 1.5V drop with a 5A load would result in about 7.5W dissipated as heat, so 20W would jump to nearly 28W. Not good.

That of course assumes a 7812 would handle 5A, which it won’t.

LDOs were the way to go for anything linear; otherwise I was looking at a switchmode power supply. The LM2576 has similar requirements to the LM7812, but being a buck converter, it is much more efficient. It’s only rated for 3A, though, so if a 1.5V drop was fine, I’d be looking for a 5A-capable equivalent.

The other option would be to have one single power supply regulate power for all nodes. I mentioned the Redarc DC-DC power supply in my previous log, and that is certainly still worthy of consideration. It is a single point of failure, however; then again, Redarc aren’t known for making crap, and the unit has a 2-year warranty anyway. I’d have downtime, but would not lose data if this went down.

@K.C. Lee pointed me to one LDO that would be a good candidate, though, and is cheap enough to be worth the experiment: the Micrel MIC29750. It’s rated for 7.5A and comes in an adjustable form up to 16V. I’d imagine if I set this near 13.5V, it’d dissipate maybe 2.5W max at 5A, or 1W at 2A. Much better.

Not as good as Redarc’s solution of course, and that’s still an option, but cheap enough to try out.