Computing

Solar Cluster: ADSL powered by the sun

So, I’ve now moved the ADSL and router onto the battery supply. This has added an extra amp of load, but really, the solar panel handles this easily.

I dug up one of my spare switchmode PSU modules and then got to thinking about how I’d mount the thing. In the end, I stuck it with double-sided tape (which also keeps the terminals of the adjustment pot from shorting) to a piece of old copper-clad PCB from the project graveyard, with some wires soldered on.

The donor PCB already had regions cut out for terminals around the edge, so I could use those for drilling mounting holes. I just made additional terminal pads for soldering the input and output supply rails. Initially I tried putting a 1mF capacitor across the output, but evidently the one I grabbed was crook as it presented a 10Ω load. I don’t think the cause was due to it charging. The PSU has a 220µF there already, so let’s see how it fares.

Fairly simple: +12V comes in via the orange wire into IN+, the “LM2596” steps that down to 5V, and that comes out the red wire. Screw terminals allow me to swap input and output.

Before hooking it up to the ADSL modem, I made sure to dial it in to 5V.

Meh… who’s going to care about 3mV. 🙂

As it happens, the original PSU puts out 5.3V. I think I’m closer. I can always dial it up if needed.

I put the lid on the case and made up the rest of my wiring harness. One 5A blade fuse, a bit of work around the back of the rack, and it was installed.

In the meantime, I have my old server busy pushing its last daily back-up across to a newly provisioned virtual machine on the cluster.

One problem this presents is that this one VM occupies about 70% of my usable storage cluster capacity. The cases can take one 2.5″ HDD, and those, unless you’re willing to risk it with Seagate (I’ve had too many of them fail), top out at 2TB.

There are SSDs too, but I’m not made of money, and I’ve already spent the cost of a small car on this cluster as it is. My thinking is I might look at modifying the cases with a new lid to accept a 3.5″ HDD. If I make the case a wee bit taller, a 3.5″ HDD would fit in the lid, and I could add fans around it to cool it.

The other option is to make external eSATA 3.5″ DIN-rail mounted cases. I did look online, but didn’t see any for sale. That said, space is getting squeezy on that DIN rail, and I do have to be mindful of cooling.

Solar Cluster: Re-locating the ADSL service

So last week, I came home to no power, which of course meant no Internet because the ADSL service is still on mains power.

This is something that’s been on my TO-DO list for a while now, and I’ve been considering how to go about it.

One way was to run 12V from the server rack to the study where the ADSL is. I’d power the study switch (a Cisco SG-208), the ADSL modem/router (a TP-Link TD-8817) and the border router (an Advantech UNO-1150G).

The border router, being a proper industrial PC, is happy with any voltage between 9 and 32V, but will want up to 24W, so that’s 2A at 12V. The ADSL modem needs 5V 1A… easy enough, and the switch needs 12V, though I’m not sure what its power rating is. I’m not sure if it’ll take 15V, so I’d be more comfortable putting it on an LDO like I did for the Linksys switch and the cluster nodes. (Thanks to @K.C. Lee for the suggestion on those LDOs.)

With all that, we’re looking at 3-4A of current at 12V, over a distance of about 5 metres. The 6 AWG cable I used to hook the panels to the solar controller is obviously massive overkill here, but CAT5e is not going to cut it… it needs to be something around the realm of 12 AWG… 20 at the smallest.
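A rough way to sanity-check those cable sizes is to estimate the voltage dropped along the run. The sketch below is only an approximation: it assumes plain copper at room temperature, a two-conductor run (so the current travels 5 metres out and 5 metres back), the worst-case 4A load, and 24 AWG as a stand-in for a single CAT5e conductor. The function names are made up for illustration.

```python
import math

COPPER_RESISTIVITY = 1.72e-8  # ohm-metres, annealed copper at about 20 degC (assumed)


def awg_area_m2(gauge: int) -> float:
    """Cross-sectional area of a solid AWG conductor, in square metres."""
    # Standard AWG definition: 36 AWG is 0.127 mm, ratio of 92 over 39 steps.
    diameter_mm = 0.127 * 92 ** ((36 - gauge) / 39)
    return math.pi / 4 * (diameter_mm / 1000) ** 2


def voltage_drop(gauge: int, run_m: float, amps: float) -> float:
    """Volts lost over the out-and-back length of a two-conductor run."""
    resistance = COPPER_RESISTIVITY * (2 * run_m) / awg_area_m2(gauge)
    return amps * resistance


if __name__ == "__main__":
    # 24 AWG is an assumed figure for one CAT5e conductor.
    for gauge in (6, 12, 14, 20, 24):
        drop = voltage_drop(gauge, run_m=5.0, amps=4.0)
        print(f"{gauge:>2} AWG: {drop:.2f} V dropped ({drop / 12 * 100:.1f}% of 12 V)")
```

Under these assumptions 12 AWG loses roughly 0.2V, 20 AWG a bit over 1V, and a single 24 AWG conductor pair more than 3V, which is why CAT5e is out of the running for this job.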

I have some ~14AWG speaker cable that could do it, but that sounds nasty.

The other approach is to move the ADSL. After finding a CAT3 6P4C keystone insert, I dug out some CAT5e (from a box that literally fell off the back of a truck), slapped my headlamp onto my hard hat, plonked that on my head and got to work.

It took me about an hour to install the new cable. I started by leaving the network-end unterminated, but with enough loose cable to make the distance… worked my way back to the socket location, cut my cable to length, fitted the keystone insert, then went back to the ADSL splitter and terminated the new run.

There was a momentary blip on the ADSL (or maybe that was coincidence), then all was good.

After confirming I still had ADSL on the old socket, I shut down the router and ADSL modem, and re-located those to sit on top of the rack. Rather than cut new cables, I just grabbed a power board and plugged that in behind the rack, and plugged the router and modem into it. I rummaged around and found a suitably long telephone cable (with 6P6C terminations), and plugged that in. Lo and behold, after a minute or two, I had Internet.

The ugly bit though is that the keystone insert didn’t fit the panel I had, so for now, it’s just dangling in the air. No, not happy about that, but for now, it’ll do. At worst, it only has to last another 3 years before we’ll be ripping it out for the NBN.

The other 3 pairs on that CAT5e are spare.  If I want a 56kbps PSTN modem port, I can wire up one of those to the voice side of the ADSL splitter and terminate it here.

I think tomorrow, I’ll make up a lead that can power the border router directly from the battery.  I have two of these “LM2596HV” DC-DC converter modules.  I’m thinking put an assortment of capacitors (a few beefy electrolytics and some ceramics) to smooth out the DC output, and I can rummage around for a plug that fits the ADSL modem/router and adjust the supply for 5V.  I’ll daisy-chain this off the supply for the border router.

We’re slated for Hybrid Fibre Coax for NBN, when that finally arrives.  I’ll admit I am nowhere near as keen as I was on optic fibre.  Largely because the coax isn’t anywhere near as future-proofed, plus in the event of a lightning strike hitting the ground, optic fibre does not conduct said lightning strike into your equipment; anything metallic, will.

By moving the ADSL to here though, switching to the NBN in the next 12-24 months should be dead easy.  We just need to run it from the junction box outside, nailing it to the joists under the floor boards in our garage through to where the rack is.  No ceiling/wall cavities or confined spaces to worry about.  If the NBN modem needs a different voltage or connector, we just give that DC-DC converter a tweak and replace the output cable to suit.

We of course wait before switching the DC supply until after we’ve proven it working from mains power in the presence of the installer.  Keep the original PSU handy and intact for “debugging” purposes. 😉

There is an existing Foxtel cable, from the days when Foxtel was an analogue service, and I remember the ol’ tug-o-war the installer had with that cable.  It is installed in the lounge room, which is an utterly useless location for the socket, and given the abuse the cable suffered (a few channels were a bit marginal after install), I have no faith in it for an Internet connection.  Thus, a new cable would be best.  I’ll worry about that when the time comes.

On the power supply front… I have my replacement.  The big hold-up with installing it though is I’ll need to get a suicide lead wired up to the mains end, then I need to figure out some way to protect that from accidental contact.  There’s a little clear plastic cover that slips over the contacts, but it is minimal at best.

I’m thinking a 3D printed or molded two-part cover, one part which is glued to the terminal block and provides the anchor point for the second part which can house a grommet and screw into the first block.  That will make the mains end pretty much as idiot-resistant as it’s possible to be.  We’ll give that some thought over the weekend.

The other end, is 15V at most, I’m not nearly so worried about that, as it won’t kill you unless you do something incredibly stupid.

Bad news travels fast, unless you’re Atlassian

So, on Friday, I had a job to update some documentation.  Specifically, I had to update the code examples on a Confluence document.

No problem… or so I thought.  The issue I faced was that it seems the Confluence application is getting too clever for its own good.  Honestly, I’d be happier with a plain textarea which took some Wiki syntax such as Markdown… or heck… plain HTML!  I use WordPress on this blog here, and while the editor here isn’t bad, I’m thankful that going to the source editor is just a click away, as there’s some things the WYSIWYG editor can’t do well (inline code), or even at all (tables).

The editor in Confluence is much less polished.  Navigating with the arrow keys is an unpredictable experience, sometimes it moves by single lines, sometimes it jumps a page.  Sometimes, starting several lines deep in a code block, a single up-arrow will move you to the line above, sometimes it moves you to some line in a paragraph above the code block.  It’s an exercise in frustration.

Fine, I thought, I’ll just copy and paste the code into qvim.  Highlight… copy… paste… ohh brilliant, it’s now all stuffed onto one line!  Thankfully what I was editing, was JSON, so it’s real easy to re-format that, vim makes it real easy to pipe the buffer contents through an arbitrary external program such as python -m json.tool.  This lacked the flexibility to auto-format the JSON the way the code examples were formatted though, so I made a work-alike that made use of Python’s OrderedDict to sort the keys a bit more logically, and told json.dump to indent the code with 2-space indentation (this is how the existing examples were formatted).
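The work-alike itself isn’t shown here, so the following is only a guess at its general shape: parse the JSON, re-order the keys with an OrderedDict, and dump it back out with 2-space indentation. The PREFERRED_KEYS list is entirely hypothetical; the real “logical” ordering would depend on the documents being edited.

```python
import json
from collections import OrderedDict

# Hypothetical ordering: these keys come first (in this order) when present,
# everything else follows alphabetically.
PREFERRED_KEYS = ["id", "name", "type"]


def reorder(value):
    """Recursively rebuild dicts as OrderedDicts with the preferred key order."""
    if isinstance(value, dict):
        def rank(key):
            if key in PREFERRED_KEYS:
                return (0, PREFERRED_KEYS.index(key), key)
            return (1, 0, key)
        return OrderedDict((k, reorder(value[k])) for k in sorted(value, key=rank))
    if isinstance(value, list):
        return [reorder(item) for item in value]
    return value


def reformat(text: str) -> str:
    """Re-indent a JSON document with 2 spaces and the preferred key order."""
    return json.dumps(reorder(json.loads(text)), indent=2)


if __name__ == "__main__":
    squashed = '{"b": 1, "name": "x", "a": {"z": 1, "id": 2}}'
    print(reformat(squashed))
```

Wrapped in a script that reads standard input and writes standard output, something like this can be used as a filter from vim in the same way as python -m json.tool.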

Having done this, I thought I’d make mention to Atlassian about the issues with their editor.  I hit the Feedback link up the top of the page.  I pointed out the issues I was having.  In closing I also pointed out how sluggish their system was.  The desktop PC at work is an 8-core AMD Ryzen 7 1700 with 16GB of DDR4.  Not a slow machine.  Maybe it’s rose-coloured glasses, but I recall having a smoother editing experience with Microsoft Word for Windows 6.0 on my 33MHz 486/DX, which sported a whopping 8MB RAM.  Hot stuff back in 1994.  My present desktop does fine with LibreOffice, and this WordPress blog works fine in it, so I know it’s not my browser or hardware.  Yet Confluence struggles, on a PC that has 8 times the CPU cores, each running at nearly 100 times the clock speed, and with 2048 times the amount of RAM to boot.

I composed my feedback and sent it Friday afternoon.  I left the browser window open while I submitted the feedback, and went home.  This morning, I get in, enter my password to unlock the workstation, and see this:

Atlassian feedback … *still* sending after a whole week-end!

Yep, about 2kB of plain text has taken more than 50 hours to make its way from my desktop to their back-end servers.  Did a feral cat interrupt their RFC-1149 based Internet link?

Solar Cluster: Beware, I’m ARMed

Last night, I got home, having made a detour on my way into work past Jaycar Woolloongabba to replace the faulty PSU.
It was a pretty open-and-shut case, we took it out of the box, plugged it in, and sure enough, no fan.  After the saleswoman asked the advice of a co-worker, it was confirmed that the fan should be running.
It took some digging, but they found a replacement, and so it was boxed up (in the box I supplied, they didn’t have one), and I walked out the door with PSU No. 3.
I had to go straight to work, so took the PSU with me, and that evening, I loaded it into the top box to transport home on the bicycle.
I get home, and it’s first thing on my mind.  I unlock the top box, get it out, and still decked out in my cycling gear, helmet and all (needed the headlight to see down the back of the rack anyway), I get to work.
I put the ring lugs on, plug it into the wall socket and flick the switch.
Nothing.
Toggle the switch on the front, still nothing.
Tried the other socket on the outlet, unplugging the load, still nothing.  Did the 10km trip from Milton to The Gap kill it?
Frustrated, I figure I’ll switch a light on.  Funny… no lights.
I wander into the study… sure enough, the router, modem and switch are dead as doornails.  Wander out to the MDB outside, saw the main breaker was still on, and tried hitting the test button.  Nothing.
I wander back inside, switching the bike helmet for my old hard hat, since it looks as if I’ll need the headlight a bit longer, then take a sticky beak down the road to see if anyone else is facing the same issue.
Sure enough, I look down the street, everyone’s out.
So there goes my second attempt at bootstrapping Gentoo, and my old server’s uptime.
The power did return about an hour or so later.  The PSU was fine; you just don’t think of the mains being out as the cause of your problems.
I’ll re-start my build, but I’m not going to lose another build to failing power.  Nope, had enough of that for a joke.
I could have rigged up a UPS to the TS-7670, but I already have one, and it’s in the very rack where it’ll get installed anyway.  Thus, no time like the present to install it.
I’ll have to configure the switch to present the right VLANs to the TS-7670, but once I do that, it’ll be able to take over the role of routing between the management VLAN and the main network.
I didn’t want to do this in a VM because that means exposing the hosts and the VMs to the management VLAN, meaning anyone who managed to compromise a host would have direct access to the BMCs on the other nodes.
This is not a network with high bandwidth demands, and so the TS-7670 with its 100Mbps Ethernet (built into the SoC; not via USB) is an ideal machine for this task.
Having done this, all that’s left to do is to create a 2GB dual-core VM which will receive the contents of the old server, then that server can be shut down, after 8 years of good service.  I’ll keep it around for storing the on-site backups, but now I can keep it asleep and just wake it up with Wake-on-LAN when I want to make a back-up.
This should make a dint in our electricity bill!
Other changes…

  • Looks like we’ll be upgrading the solar with the addition of another 120W panel.
  • I will be hooking up my other network switches, the ADSL router and ADSL modem up to the battery bank on the cluster, just got to get some suitable cable for doing so.
  • I have no faith in this third PSU, so already, I have a MeanWell HEP-600C coming.  We’ll wire up a suicide lead to it, and that can replace the Powertech MP-3089 + Redarc BCDC1225, as the MeanWell has a remote on/off feature I can use to control it.

Solar Cluster: History repeats: another PSU fan bites the dust

Perhaps literally… it has bitten the dust.  Although I wouldn’t call its installed location, dusty.  Once again, the fan in the mains power supply has carked it.

Long-term followers of this project may remember that the last PSU failed the same way.

The reason has me miffed.  All I did with the replacement, was take the PSU out of its box, loosen the two nuts for the terminals, slip the ring lugs for my power lead over the terminals, returned the nuts, plugged it in and turned it on.

While it is running 24×7, there is nothing in the documentation to say this PSU can’t run that way.  This is what the installation looks like.

If it were dusty, I’d expect to be seeing hardware failures in my nodes.

This PSU is barely 4 months old, and earlier this week, the fan started making noises, and requiring percussive maintenance to get started. Tonight, it failed. Completely, no taps on the case will convince it to go.

Now, I need to keep things running until the weekend. I need it to run without burning the house down.

Many moons ago, my father bought a 12V fan for the caravan. Cheap and nasty. It has a slider switch to select between two speeds; “fast” and “slow”, which would be better named “scream like a banshee” and “scream slightly less like a banshee”. The speed reduction is achieved by passing current through a 10W resistor, and achieves maybe a 2% reduction in motor RPM. As you can gather, it proved to be a rather unwelcome room mate, and has seen its last day in the caravan.

This fan, given it runs off 12V, has proven quite handy with the cluster. I’ve got my SB-50 “load” socket hanging out the front of the cluster. A little adaptor to bring that out to a cigarette lighter socket, and I can run it off the cluster batteries. When a build job has gotten a node hot and bothered, sitting this down the bottom of the cluster and aiming it at a node has cooled things down well.

Tonight, it has another task … to try and suck the hot air out of the PSU.

That’s the offending power supply.  A PowerTech MP-3089.  It powers the RedARC BCDC-1225 right above it.  And you can see my kludge around the cooling problem.  Not great, but it should hold for the next 24 hours.

Tomorrow, I think we’ll call past Aspley and pick up another replacement.  I’m leery of another now, but I literally have no choice … I need it now.  Sadly, >250W 12V switchmode PSUs are somewhat rare beasts here in Brisbane.  Altronics don’t sell them that big.  The grinning glasses are no more, and I’m not risking it with the Xantrex charger again.

Long term, I’m already looking at the MeanWell SP-480-12.  This is a PSU module, and will need its own case and mains wiring… but I have no faith in the MP-3089 to not fail and cremate my home of 34 years.

The nice feature of the SP-480-12 is that it does have a remote +12V power-off feature.  Presumably I can drive this with a comparator/output MOSFET, so that when the battery voltage drops below some critical threshold, it kicks in, and when it rises above a high set-point, it drops out.  Simple control, with no MCU involved.  I don’t see a reason to get more fancy than that on the control side, anything more is a liability.
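The comparator idea is just two set-points with hysteresis between them. As a sketch of the logic only (the real thing would be analogue, and the 12.0V and 13.8V thresholds below are placeholder values, not figures from the PSU or battery datasheets):

```python
def charger_should_run(running: bool, battery_volts: float,
                       low_threshold: float = 12.0,
                       high_threshold: float = 13.8) -> bool:
    """Decide whether the mains PSU should be on, with hysteresis.

    Below the low threshold the PSU kicks in; above the high set-point it
    drops out; in between it simply stays in whatever state it was in.
    """
    if battery_volts < low_threshold:
        return True
    if battery_volts > high_threshold:
        return False
    return running


if __name__ == "__main__":
    running = False
    for volts in (13.0, 12.4, 11.9, 12.5, 13.5, 13.9, 13.0):
        running = charger_should_run(running, volts)
        print(f"{volts:.1f} V -> mains PSU {'on' if running else 'off'}")
```

The gap between the two thresholds is what stops the supply chattering on and off when the battery voltage hovers around a single trip point. How the resulting on/off signal maps onto the PSU’s remote control pin is not modelled here.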

On other news, my gcc build on the TS-7670 failed … so much for the wait.  We’ll try another version and see how we go.

Solar Cluster: Resuming a build

So the house got momentarily power-cycled this morning… I’m at work, minding my own business, next thing the access point emails me this:

Mar 13 09:04:23 Syslogd start up

Now, it only does that for two reasons.  Either someone told it to reboot (not I), or it got hard reset.  Sure enough, log into the old server, and it’s reporting an uptime of 15 minutes.  I get home this evening, and clocks all around are on the blink … literally.

The cluster, of course, is still going… power outage?  What power outage?

I did consider wiring up the ADSL modem, router, study switch, and the TS-7670 up to the cluster’s power rails, but haven’t gotten around to doing that.  Alas, I’m not quite there yet.

In any case, even if the TS-7670 had been powered from the solar, I’d still have temporarily lost the build, as the HDD dock I have the hard drive sitting in is mains powered.  It also doesn’t remember its state after a power cycle.  I’d have re-started the build from work, but the HDD remained off when the power came back on.

Never mind.  The downside is now I get to re-start a multi-day build.  The good news though, is that knowing the ebuild file that Portage picked out for compiling gcc; I can resume where it left off.  In this case, it’s using an ebuild from the musl overlay; /root/musl/sys-devel/gcc/gcc-6.4.0-r1.ebuild.

ebuild /root/musl/sys-devel/gcc/gcc-6.4.0-r1.ebuild package will preserve the current working tree and will resume where it was, hopefully without incident.  I’ll be left with a .tbz2; which will be picked up when I run emerge --keep-going -ekv @system.

Solar Cluster: Bootstrapping Gentoo

Well, in my last post I discussed getting OpenADK to build a dev environment on the TS-7670.  I had gotten Gentoo’s Portage installed, and started building packages.

The original plan was to build everything into /tmp/seed, but that requires that all the dependencies are present in the chroot.  They aren’t.  In the end, I decided to go the ill-advised route of compiling Gentoo over the top of OpenADK.

This is an ugly way to do things, but it so far is bearing fruit.  Initially there were some hiccups, and I had to restore some binaries from my OpenADK build tree.  When Gentoo installed python-exec; that broke Portage and I found I had to unpack a Python 2.7 binary I had built earlier then use that to re-install Portage.  I could then continue.

Right now, it’s grinding away at gcc; which was my nemesis from the beginning.  This time though, it successfully built xgcc and xg++; which means it has compiled itself using the OpenADK-supplied gcc; and now is building itself using its self-built binaries.  I think it does two or three passes at this.

If it gets through this, there’s about 65 packages to go after that.  Mostly small ones.  I should be able to do a ROOT=/tmp/seed emerge -ek @system then tar up /tmp/seed and emerge catalyst.  I have some wrapper scripts around Catalyst that I developed back when I was responsible for doing the MIPS stages.  These have been tweaked to do musl builds, and were used to produce these x86 stages.  The same will work for ARMv5.

It might be another week of grinding away, but we should get there. 🙂

Solar Cluster: arm-unknown-linux-musleabi… saga part III

So, after a longish wait… my laptop finally coughed up an image with a C/C++ compiler and almost all the bits necessary to make Gentoo Portage tick.

Almost everything… wget built, but it segfaults on start-up.  No matter, it seems curl works.  We do have an issue though: Portage no longer supports customising the downloader like it used to, or at least I couldn’t see how to do it; it used to be a matter of settings in make.conf.

Thankfully, I know shell scripts, and can make my own wget using the working curl:

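bash-4.4# cat > /usr/bin/wget
#!/bin/bash
# Minimal wget stand-in: maps the handful of wget options seen here onto curl.

OUT=
while [ $# -gt 0 ]; do
    case "$1" in
        -O) OUT="$2"; shift;;   # output file: remember it, skip its argument
        -t) shift;;             # retry count: ignored, skip its argument
        -T) shift;;             # timeout: ignored, skip its argument
        --passive-ftp) : ;;     # accepted and ignored
        *) break ;;             # first unrecognised argument is taken as the URL
    esac
    shift
done

# Echo the curl command and bail out if it fails.
set -ex
curl --progress-bar -o "${OUT}" "$1"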

Okay, it’s a little (a lot) braindead, but it beats downloading the lot by hand!

I was able to get Gentoo installed by hand using these instructions.  I have an old 1TB HDD plugged into a USB dock, formatted with a 10GB swap partition and the rest btrfs.  Sure, it’s only USB 2.0, but I’d sooner just put up with some CPU overhead than wear out my eMMC.

Next step; ROOT=/tmp/seed emerge -ev system

Modern web design, WTF is going on?

So, over the last few years we’ve seen a big shift in the way websites operate.

Once upon a time, JavaScript was a nice-to-have, and you as a web developer better be prepared for it to not be functional; the DOM was non-existent, and we were ooohing and ahhing over the de facto standard in Internet multimedia; Macromedia Flash.  The engine we now call WebKit was still a primitive and quite basic renderer called KHTML in a little-known browser called Konqueror.  Mozilla didn’t exist as an open-source project yet; it was Netscape and Microsoft duelling it out together.

Back then, XMLHttpRequest was so new, it wasn’t a standard yet; Microsoft had implemented the idea as an ActiveX control in IE5, no one else had it yet.  So if you wanted to update a page, you had to re-load the whole lot and render it server-side.  We had just shaken off our FONT tags for CSS (thank god!), but if you wanted to make an image change as the mouse cursor hovered over it, you still needed those onmouseover/onmouseout event handlers to swap the image.  Ohh, and scalable graphics?  Forget it.  Render as a GIF or JPEG and hope you picked the resolution right.

And bear in mind, the expectation was that, a user running an 800×600 pixel screen resolution, and connected via a 28.8kbps dial-up modem, should be able to load your page up within about 30 seconds, and navigate without needing to resort to horizontal scroll bars.  That meant images had to be compressed to be no bigger than 30kB.
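To put numbers on that budget, here is a back-of-the-envelope calculation. It assumes the modem delivers its full 28.8kbps with no protocol overhead or modem compression, and that 1kB means 1000 bytes; real-world figures would be somewhat worse.

```python
MODEM_BPS = 28_800  # bits per second, the dial-up speed quoted above


def transfer_seconds(size_bytes: float, link_bps: float = MODEM_BPS) -> float:
    """Time to move size_bytes over a link, ignoring all overheads."""
    return size_bytes * 8 / link_bps


def budget_bytes(seconds: float, link_bps: float = MODEM_BPS) -> float:
    """How many bytes fit down the link in the given time."""
    return link_bps * seconds / 8


if __name__ == "__main__":
    print(f"30 kB image: {transfer_seconds(30_000):.1f} s")
    print(f"30 second page budget: {budget_bytes(30) / 1000:.0f} kB")
```

In other words, a single 30kB image eats a bit over 8 of the 30 seconds, and the whole page (markup, styles and every image) has to fit in roughly 108kB.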

That was 17 years ago.  Man I feel old!

This gets me thinking… today, the expectation is that your Internet connection is at least 256kbps.  Why then do websites take so long to load?

It seems our modern web designers have forgotten the art of how to pack down a website to minimise the amount of data needed to be transmitted so that the page is functional.  In this modern age of “pretty” web design, we’ve forgotten how to make a page practical.

Today, if you want to show an icon on a page, and have it fill the entire browser window, you can fire up Inkscape or Adobe Illustrator, let the creative juices flow and voilà, out pops a scalable vector graphic, which can be dropped straight into your HTML.  Turn on gzip compression on the web server, and that graphic will be on that 28.8kbps user’s screen in under 3 seconds, and can still be as big as they want.
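As a quick illustration of how well SVG markup compresses, the sketch below generates a made-up graphic (a couple of hundred circles, purely an assumption standing in for a real drawing), gzips it the way a web server would, and works out how long each version takes at 28.8kbps with overheads ignored.

```python
import gzip

MODEM_BPS = 28_800  # bits per second


def sample_svg(shapes: int = 200) -> bytes:
    """Build a hypothetical SVG made of repeated circle elements."""
    circles = "".join(
        f'<circle cx="{(i * 37) % 800}" cy="{(i * 53) % 600}" '
        f'r="{5 + i % 20}" fill="#3366cc"/>'
        for i in range(shapes)
    )
    return (
        '<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 800 600">'
        f"{circles}</svg>"
    ).encode("utf-8")


def seconds_on_link(payload: bytes, link_bps: int = MODEM_BPS) -> float:
    """Transfer time for the payload, ignoring protocol overhead."""
    return len(payload) * 8 / link_bps


if __name__ == "__main__":
    raw = sample_svg()
    packed = gzip.compress(raw)
    for label, data in (("raw", raw), ("gzip", packed)):
        print(f"{label:>4}: {len(data)} bytes, {seconds_on_link(data):.2f} s")
```

Because the vector markup is so repetitive, the compressed copy is a fraction of the original, and either way the result scales to any screen size without sending a single extra byte.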

If you want to make a page interactive, there’s no need to reload the entire page; XMLHttpRequest is now a W3C standard, and implemented in all the major browsers.  Websockets means an end to any kind of polling; you can get updates as they happen.

It seems silly, but in spite of all the advancements, website page loads are not getting faster, they’re getting slower.  The “everybody has broadband” and “everybody has full-HD screens” argument is being used as an excuse for bloat and sloppy design practices.

More than once I’ve had to point someone to the horizontal scroll bar because the web designer failed to test their website at the rather common 1366×768 screen resolution of a typical laptop.  If I had a dollar for every time that’s happened in the last 12 months, I’d be able to buy the offending companies out and sack the web designers responsible!

One of the most annoying, from a security perspective, is the proliferation of “content distribution networks”.  It seems they’ve realised these big bulky blobs of JavaScript take a long time to load even on fast links.  So, what do the bright sparks do?  “I know… instead of loading it from one server, I’ll put it on 10 and increase my upload capacity 10-fold!”  Yes, they might have 1Gbps on each host.  1Gbps × 10 = 10Gbps, so the page will load at 10Gbps, right?

Cue sad tuba sound effect.

At my workplace, we have a 20Mbps Ethernet (not ADSL[2], fibre or cable; Ethernet) link to the Internet.  On that link, I’ve been watching the web get slower and slower… and I do not think our ISP is completely to blame, as I see the same issue at home too.  One where we feel the pain a lot is Atlassian’s system, particularly Jira and Confluence.  To give you an idea of how badly they drink the CDN Kool-Aid, check out the number of sites I have to whitelist in order to get the page functional:

Atlassian’s JIRA… failing in spite of a crapton of scripts being loaded.

That’s 17 different hosts my web browser must make contact with, and download content from, before the page will function.  17 separate HTTP connections, which must fight with all other IP traffic on that 20Mbps Ethernet link for bandwidth.  20Mbps is the maximum that any one connection will do, and I can guarantee it will not reach even half that!
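The flaw in the “more servers, more speed” reasoning is that the slowest hop sets the pace. A deliberately crude model (it ignores latency, connection set-up and competing traffic, all of which only make matters worse) looks like this:

```python
def page_throughput_bps(client_link_bps: float,
                        server_link_bps: float,
                        hosts: int) -> float:
    """Best-case aggregate download rate: capped by the narrower side."""
    return min(client_link_bps, server_link_bps * hosts)


if __name__ == "__main__":
    client = 20e6   # the 20Mbps office link
    server = 1e9    # 1Gbps per CDN host, as in the example above
    for hosts in (1, 10, 17):
        rate = page_throughput_bps(client, server, hosts)
        print(f"{hosts:>2} hosts: {rate / 1e6:.0f} Mbps at best")
```

However many hosts are added on the far side, the answer stays pinned at the client’s 20Mbps; the extra hosts just add more connections to set up.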

Interestingly, despite allowing all those scripts to load, they still failed to come up with the goods after a pregnant pause.  So the extra trashing of the link was for naught.  Then there’s the security implications.

At least 3 of those are sites that Atlassian do not control.  If someone compromised ravenjs.com, for example, they could inject any JavaScript they want on the JIRA site, and take control of a user’s account.  Atlassian are relying on these third parties’ promises and security practices to ensure their site stays secure, and stays in their (the third party’s) control.  Suppose someone forgets to renew the domain subscription; the result could be highly embarrassing!

So, I’m left wondering what they teach these days.  For a multitude of reasons, sites should be blazingly quick to load: partly because modern techniques ought to permit vastly improved efficiency of content representation and delivery, and partly because network link speeds are steadily improving.  However it seems the reverse is true… why are we failing so badly?

Going low and slow with bochs

So, today I had a problem… I needed to solve a race condition in a test case for my workplace’s WideSky system.  The test case was meant to ensure that, if the AMQP broker crashed or was restarted, it would re-connect and resume operations as quickly as possible.

On my desktop (an 8-core AMD Ryzen 7), the test case always passed.  On the CI server (a VM running on a dual-core Core i3), it failed.  I figured the desktop here was running too quickly for me to see the problem.  I needed a machine that ran more like the CI server to see the problem.

Looking around, I couldn’t see any way to reliably slow down QEMU, KVM or VirtualBox… but I do remember one old project from the mid-late 90s that could: Bochs.

Bochs in action… emulating a P4 Prescott on a Ryzen 7

Turns out, far from what it could do back in 1998 when it was strictly a 386 emulator (and a slow one at that!) it now has AMD64 emulation capabilities.  Thus, I can run the software stack inside this VM, and have it throttle the CPU speed down so that hopefully, the problem arises.  The first problem I needed to solve was trying to get the network running.  We have a PXE boot server which can serve up Ubuntu, no problem.  I just needed to bridge the Bochs VM onto the network somehow.

I already have bridge interfaces configured on my two physical network interfaces, and these work great with KVM.  Sadly, Bochs is rather primitive in what it supports… tap-mode networking just did not work, it complained that tap0 was not “running” even if created beforehand by iproute2, but I did find I could bind it directly to one of the enslaved network interfaces (enp36s0.200; yes, a VLAN interface).

e1000 worked for network booting, but then Ubuntu couldn’t retrieve an IP address for whatever reason. ne2k is working fine, and presently, I have the VM installing.  To make it network bootable, you need a boot ROM image, which you can download from the iPXE rom-o-matic service.  The magic PCI IDs you need are 10ec 8029 for ne2k, or (if it gets fixed) 8086 100e for e1000.

The following is my Bochs config file:

# configuration file generated by Bochs
plugin_ctrl: unmapped=1, biosdev=1, speaker=1, extfpuirq=1, parallel=1, serial=1, gameport=1, ne2k=1
config_interface: textconfig
display_library: x
debug: action=report
memory: host=2048, guest=2048
romimage: file="/usr/share/bochs/BIOS-bochs-latest", address=0x0, options=none
vgaromimage: file="/usr/share/bochs/VGABIOS-lgpl-latest"
boot: disk, network
floppy_bootsig_check: disabled=0
# no floppya
# no floppyb
ata0: enabled=1, ioaddr1=0x1f0, ioaddr2=0x3f0, irq=14
ata0-master: type=disk, path="/tmp/wstest.raw", mode=flat, cylinders=0, heads=0, spt=0, model="Generic 1234", biosdetect=auto, translation=auto
ata0-slave: type=none
ata1: enabled=1, ioaddr1=0x170, ioaddr2=0x370, irq=15
ata1-master: type=none
ata1-slave: type=none
ata2: enabled=0
ata3: enabled=0
optromimage1: file=none
optromimage2: file=none
optromimage3: file=none
optromimage4: file=none
optramimage1: file=none
optramimage2: file=none
optramimage3: file=none
optramimage4: file=none
pci: enabled=1, chipset=i440fx, slot1=ne2k, slot2=cirrus
vga: extension=cirrus, update_freq=5, realtime=1
cpu: count=1:1:1, ips=40000000, quantum=16, model=p4_prescott_celeron_336, reset_on_triple_fault=1, cpuid_limit_winnt=0, ignore_bad_msrs=1, mwait_is_nop=0
print_timestamps: enabled=0
port_e9_hack: enabled=0
private_colormap: enabled=0
clock: sync=none, time0=local, rtc_sync=0
# no cmosimage
# no loader
log: -
logprefix: %t%e%d
debug: action=ignore
info: action=report
error: action=report
panic: action=ask
keyboard: type=mf, serial_delay=250, paste_delay=100000, user_shortcut=none
mouse: type=ps2, enabled=0, toggle=ctrl+mbutton
speaker: enabled=1, mode=system
parport1: enabled=1, file=none
parport2: enabled=0
com1: enabled=1, mode=null
com2: enabled=0
com3: enabled=0
com4: enabled=0
ne2k: enabled=1, mac=fe:fd:de:ad:be:ef, ethmod=linux, ethdev=enp36s0.200, script=/bin/true, bootrom="/tmp/10ec8029.rom"

Create your hard drive image using qemu-img, then run bochs -f yourfile.cfg and it should, hopefully, work.
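If qemu-img isn’t to hand, a raw image of the kind the config above points at with mode=flat is just a file of the right length, so it can be produced with a few lines of Python. This is a stand-in sketch rather than what was actually used here; the 64MB size and the temporary path are only for demonstration, and a real install would want several gigabytes at the path named in the config.

```python
import os
import tempfile


def create_flat_image(path: str, size_bytes: int) -> None:
    """Create a raw disk image of exactly size_bytes.

    Any existing file at path is overwritten. On filesystems that support
    sparse files, the image takes up almost no space until written to.
    """
    with open(path, "wb") as image:
        image.truncate(size_bytes)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as scratch:
        image_path = os.path.join(scratch, "wstest.raw")
        create_flat_image(image_path, 64 * 1024 * 1024)
        print(image_path, os.path.getsize(image_path), "bytes")
```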