Computing

Batch-downloading artifacts from Atlassian Bamboo

Atlassian Bamboo is a CI/CD server intended for large organisations to run internal builds. One thing that the API designers forgot with this system was a way to enumerate and download artifacts from the CI server. Not easily anyway.

The Bamboo server has a reasonable REST API that does about 60% of the work for us: it’ll let us query what the last successful build was and give us that build’s metadata as JSON, which we can pick through to get out what we need. Mostly this is just the build number.

With that, we can hit up the artifacts endpoint and download the rest, but there are two flies in the ointment:

1. While the REST API supports Bearer authentication using a personal access token, the artifacts endpoint does not: one must use HTTP basic authentication.
2. To download directory trees, one must scrape the HTML and walk the tree.

Thankfully (2) can be achieved using wget --mirror.

#!/bin/bash
# (bash rather than plain sh: `read -s` below is not POSIX)

# Bamboo server
: ${BAMBOO_URI:=https://bamboo.example.net}

# Output directory
: ${OUTPUT_DIR:=/tmp/output}

# Plan keys (with project key prefix)
: ${PLAN_KEYS:=PK-PROJ01 PK-PROJ02}

if [ -z "${BAMBOO_USER}" ]; then
	read -p "Bamboo username: " BAMBOO_USER
fi
if [ -z "${BAMBOO_PASSWORD}" ]; then
	read -s -p "Bamboo password: " BAMBOO_PASSWORD
fi

bamboo_wget() {
	wget 	--http-user "${BAMBOO_USER}" \
		--http-password "${BAMBOO_PASSWORD}" \
		"${@}"
}

# Download the latest artifacts from the following projects
mkdir -p "${OUTPUT_DIR}"
for plankey in ${PLAN_KEYS}; do
	# Quote the URLs so the shell does not treat ? and & specially
	bamboo_wget	--header "Accept: application/json" \
			-O "${OUTPUT_DIR}/${plankey}-result.json" \
		"${BAMBOO_URI}/rest/api/latest/result/${plankey}?buildstate=Successful&os_authType=basic"

	# Build number of the first result returned
	buildno=$( jq -r '.results.result[0].buildNumber' \
			"${OUTPUT_DIR}/${plankey}-result.json" )

	bamboo_wget -P "${OUTPUT_DIR}/${plankey}" --mirror -nd -np \
		"${BAMBOO_URI}/artifact/${plankey}/shared/build-${buildno}/?os_authType=basic"
done

# ${OUTPUT_DIR} will accumulate some cruft from `wget --mirror` but all the files
# should be there.
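
One gap worth guarding against: if the result list comes back empty, `jq -r` prints the literal string `null`, and the loop would then go looking for a `build-null` directory. A minimal sketch of a check that could sit between the `jq` call and the second `bamboo_wget`; the function name `is_build_number` is made up for illustration and is not anything Bamboo provides:

```shell
# Hypothetical helper (not part of Bamboo or the script above):
# succeeds only if its argument is a non-empty string of decimal digits.
is_build_number() {
	case "${1}" in
		''|*[!0-9]*) return 1 ;;
		*) return 0 ;;
	esac
}

# Intended use inside the loop, after buildno has been set:
#   if ! is_build_number "${buildno}"; then
#   	echo "No successful build found for ${plankey}" >&2
#   	continue
#   fi
```

Skipping the plan with a message seems friendlier than letting `wget --mirror` chase a URL that cannot exist, but aborting the whole run would be an equally reasonable choice.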

Hopefully I won’t have to deal with this house of horrors for much longer, but for others who are stuck with it, here’s something that may help.

Using image resources with Maven Java projects in Netbeans

Last Easter, I was running a checkpoint at Imbil as I’ve done before… operating a checkpoint at Derrier Hill Grid with horses passing through from three different events simultaneously, coming from two different directions, and getting more confused than a moth in a light shop. At that time I thought it’d be really handy to have a software program that could “sort ’em all out”. I punch in the competitor numbers, it tells me what division they’re in and records the time… I then assign the check-points and update the paperwork.

We have such a program, a VisualBASIC 6 application written by one of the other amateurs, however I use Linux. My current tablet, a Panasonic FZ-G1 Mk1, won’t run any supported version of Windows well (Windows 10 on 4GB RAM is agonisingly slow… and it goes out of support in October anyway), but otherwise would be an ideal workhorse for this, if I could write a program.

So I rolled up my sleeves, and wrote a checkpoint reporting application. Java was used because then it can be used on Windows too, as well as any Linux distribution with OpenJDK’s JRE. I wanted a single “distribution package” that would run on any system with the appropriate runtime, that way I wouldn’t need a build environment for each OS/architecture I wanted to support.

One thing that troubled me through this process… was getting image resources working. I used the Netbeans IDE to try and make it easier for others to contribute later on if desired: it has a GUI form builder that can help with all the GUI creation boilerplate, and helps keep the project structure more-or-less resembling a “standard” structure. (This is something that Python’s tkinter seriously lacks: a RAD tool for producing the UIs. The author of the aforementioned VB6 software calls it “T-stinker”, and I find it hard to disagree!)

Netbeans defaults to using the Maven build system for Java projects, although Ant and Gradle are both supported as well. (Not sure which one of the three is “preferred”; I know Android often uses Gradle… thoughts, Java people?) It also supports adding bitmap resources to a project for things like icons. I used some icons from the GTK+ v3 (LGPLv2) and Gnome Adwaita Legacy (CCBYSA3) projects.

The problem I faced was actually using them in the UI. I was getting a NullPointerException every time I tried setting one, and Netbeans’ documentation was no help at all. It just wasn’t finding the .png files no matter what I did:

2025-06-15T06:32:54.461Z [FINEST] com.vk4msl.checkpointreporter.ui.ReporterForm: Choose nav tree node test
Exception in thread "AWT-EventQueue-0" java.lang.NullPointerException
        at javax.swing.ImageIcon.<init>(ImageIcon.java:217)
        at com.vk4msl.checkpointreporter.ui.event.EventPanel.initComponents(EventPanel.java:239)
        at com.vk4msl.checkpointreporter.ui.event.EventPanel.<init>(EventPanel.java:63)
        at com.vk4msl.checkpointreporter.ui.ReporterForm.showEvent(ReporterForm.java:895)
        at com.vk4msl.checkpointreporter.CheckpointReporter.showEntity(CheckpointReporter.java:532)
        at com.vk4msl.checkpointreporter.ui.ReporterForm.navTreeValueChanged(ReporterForm.java:480)
        at com.vk4msl.checkpointreporter.ui.ReporterForm.access$100(ReporterForm.java:70)
        at com.vk4msl.checkpointreporter.ui.ReporterForm$2.valueChanged(ReporterForm.java:182)

Maybe it’s my search skills, or the degradation of search, but I could not put my finger on why it kept failing… the file was where it should be, the path in the code was correct according to the docs, why was it failing?

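Turns out, when Maven does a build, it puts all the compiled objects in a target/classes directory. When Netbeans runs your project, it does so out of that directory. By default Maven only copies resources from src/main/resources; my .png files sat alongside the source under src/main/java, so Maven did not bother to copy them across, because Netbeans never told it to. With nothing at the expected path, the resource lookup presumably came back null, and ImageIcon’s constructor threw the NullPointerException seen above.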

I needed the following bit of code in the <build> section of my pom.xml file:

       <resources>
         <resource>
           <targetPath>com/vk4msl/checkpointreporter/ui/components/icons</targetPath>
           <directory>${project.basedir}/src/main/java/com/vk4msl/checkpointreporter/ui/components/icons</directory>
           <includes>
             <include>**/README.md</include>
             <include>**/*.png</include>
           </includes>
         </resource>
       </resources>

That tells Maven to pick up those .png files (in the com.vk4msl.checkpointreporter.ui.components.icons package) and put them, along with the README.md, in the staging directory for the application. Then Java would be able to find those resources, and they’d be in the .jar file in the right place.
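
A quick way to confirm the change took effect is to look for the icons in the build output after a build. The sketch below assumes the target/classes output directory mentioned earlier; the helper name `count_pngs` is made up for illustration:

```shell
# Hypothetical helper: count the .png files under a directory tree.
# A result of 0 for target/classes would mean Maven did not copy the icons.
count_pngs() {
	find "${1}" -type f -name '*.png' | wc -l | tr -d '[:space:]'
}

# Example, run from the project root after a build:
#   count_pngs target/classes
```

The same question can be asked of the packaged .jar by listing its contents with `jar tf` and looking for the .png entries.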

Other suggestions have been to move the project to using Ant (which was the old way Java projects were built, but seems to be out of favour now?)… not sure if Gradle has this problem… maybe some people more familiar with Java’s build systems can comment. This is probably the most serious Java stuff I’ve done in the last 20 years.

I used Java because it produced a single platform-independent binary that could run anywhere with the appropriate runtime, and featured a runtime that had everything I needed in a format that was easy to pick back up. C# I’ve used for command-line applications at university, but I’ve never done anything with Windows Forms, so I’d have to learn that from scratch as well as wrestling MSBuild (yuck!). Python was almost it, but as I say, dealing with tkinter and trying to map that to all the Tk docs out there that assume you’re using Tcl made it a nightmare to use. I didn’t want to bring in third-party libraries like Qt or wxWidgets as that’d complicate deployment, and other options like C++, Rust and Go all produce native binaries, meaning I’d have to compile for each platform. (Or force people to build it themselves.)

Java did the job nicely. It’s not the prettiest application, but the end result is that I have a basic Java program, using the classic Swing UI toolkit, that should be a big help at Southbrook later this month. I’ll probably build on this further, but this should go a long way towards scratching the itch I had.

Blog news: user registrations now disabled

Today, I noticed I had an extremely large number of “new” users, about 206 accounts created in total, with an odd username pattern:

www.XXXXX.blogspot.YY - Z.ZZZ BINANCE

XXXXX was a randomly chosen set of letters, YY was one of many TLDs that Blogspot uses, and Z.ZZZ was some kind of price value. Clearly, spammers have found a new way to send spam: using the registration email that WordPress uses to confirm your email address is valid.

This blog has always allowed comments in one form or another. Originally I allowed anonymous user comments and ping-backs, but when those got abused, I disabled them, requiring that a user be registered with the site to comment.

That has been fine up until today. Some “BINANCE” arsehole decided that the username field was a perfect way to spray shite from one end of the Internet to the other. In total, 230 accounts were created in the past few hours, mostly to gmail.com email addresses.

Maybe the username field can be stripped on such emails so the only thing they can supply is the email address (basically making sending mass spam this way very difficult; they’d have to “encode” it in a sub-address … not all providers support this and they can do it different ways).

We’ll see what damage that’s done to the sender score on this site. I have a second route I can use for sending outbound email that’s got a clean reputation so not all is lost.

Comments for the time being will now be exclusively through ActivityPub.

2025/05/31 by Redhatter (VK4MSL) Computing Public Service Announcements Public Syndication Rants 0

lvmcache fun and games

This blog has never been on what I’d call a high-performance server. In fact, things are a little on the slow side. I try to be frugal with my system resource allocation, with the assumption that my little site does not get a lot of traffic (much less since it’s no longer syndicated on Gentoo Planet). However, I think I managed to get the performance up a notch…

The site runs on my solar powered server cluster, with a couple of Ceph RBDs, one for the root OS and one for the data (MariaDB / www root / /home). The VM runs AlpineLinux. The VM host was over-provisioned with a larger SSD than required, allowing me to dedicate some space for local cache.

I had thought I could set something up that would organise the cache on the VM host, and abstract it from the VM, but so far, I’ve not gotten around to doing that. (I did have something sort-of working in OpenNebula with flashcache at work, but it was flaky.)

In libvirt, I provisioned a new RBD to serve as the backing store (thus keeping a pristine copy to roll back to should things go pear shaped), and a new LVM volume for the cache. For the time being, I moved the existing volume to be the last device. So I had:

/dev/vda: OS
/dev/vdb: Data volume
/dev/vdc: Cache volume
/dev/vdd: Old /dev/vdb (temporary, for data migration)

Failed approaches

First, what didn’t work for me: bcachefs and bcache.

`bcachefs`

bcachefs wanted to fight me every step of the way, making formatting the volumes difficult with sketchy documentation (especially as I wanted a write-through cache to facilitate VM migration).

bcachefs format gives some very cryptic error messages, and has a somewhat quirky argument syntax for formatting. The command I figured out through trial-and-error was this:

bcachefs format \
    --replicas=1 \
    --durability=0 /dev/vdc1 \
    --durability=1 /dev/vdb1 \
    --foreground_target /dev/vdc1 \
    --promote_target /dev/vdc1 \
    --background_target /dev/vdb1 \
    --metadata_target /dev/vdc1

The problem was convincing mount to actually mount it. I was supposed to specify every device, but each time it flatly refused: no matter what order I used, it told me “no such device”.

`bcache`

This is the underlying caching logic that bcachefs was built on, so I figured I’d try that. This worked better, but I found AlpineLinux had no real knowledge of bcache, and thus did not provide any means for me to bring up /dev/bcache0 before localmount mounted it.

I could have written an OpenRC init script to do this, but I wasn’t certain about this path, so I decided to put the idea aside.

Winning approach: `lvmcache`

Luckily lvm2 has a built-in method: lvmcache. After installing the lvm2 package in AlpineLinux, I blatted the partition tables on my two virtual disks, formatted them as LVM physical volumes, and added them to a volume group.

~ # pvcreate /dev/vdb /dev/vdc
  Physical volume "/dev/vdb" successfully created.
  Physical volume "/dev/vdc" successfully created.
~ # vgcreate data /dev/vdb
  Volume group "data" successfully created
~ # vgextend data /dev/vdc 
  Volume group "data" successfully extended

Next, the logical volumes. This wound up being a little tricky because I wanted to use all the available space on each physical volume. I had tried specifying -L ${SZ}G, but this ignored the fact that LVM uses a bit of header space on each physical volume. It complained, but in doing so told me the size in extents that was available, so I was able to use -l ${SZ} to specify that number of extents:

~ # lvcreate --size 8G --name datavol data /dev/vdb
 Insufficient free space: 2048 extents needed, but only 2047 available
~ # lvcreate -l 2047 --name datavol data /dev/vdb  
 Logical volume "datavol" created.
~ # lvcreate -n cachevol -l 4095 data /dev/vdc
 Volume group "data" has insufficient free space (1023 extents): 4095 required.
~ # lvcreate -n cachevol -l 1023 data /dev/vdc
 Logical volume "cachevol" created.
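
The off-by-one in those error messages can be predicted ahead of time. The arithmetic below is only a sketch: it assumes LVM's usual defaults of 4 MiB extents and 1 MiB set aside at the start of each physical volume, neither of which is stated in the output above, and `pv_free_extents` is a made-up helper name:

```shell
# Hypothetical helper: estimate the usable extents on a physical volume.
#   $1 = device size in MiB
#   $2 = extent size in MiB (assumed default: 4)
#   $3 = space reserved for LVM metadata in MiB (assumed default: 1)
pv_free_extents() {
	size_mib="${1}"
	extent_mib="${2:-4}"
	reserved_mib="${3:-1}"
	# Integer division discards the partial extent left over
	echo $(( (size_mib - reserved_mib) / extent_mib ))
}

# 8 GiB data disk:                   pv_free_extents 8192   # prints 2047
# Cache disk, if it is 4 GiB:        pv_free_extents 4096   # prints 1023
```

lvcreate also accepts percentage forms for -l, such as `-l 100%PVS`, which should sidestep the arithmetic altogether; that variant is untested here.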

Now I had two separate LVM volumes, one on each physical device. Now to link them:

~ # lvconvert --type cache --cachevol cachevol data/datavol
Erase all existing data on data/cachevol? [y/n]: y
 Logical volume data/datavol is now cached.

Great, except I forgot to specify the write mode. Turns out, this is an lvchange away:

~ # lvchange --cachemode writethrough /dev/data/datavol  
 Logical volume data/datavol changed.
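
According to lvmcache(7), the cache mode can also be given at the time the cache is attached, which would fold the lvconvert and lvchange steps into one. A sketch of that, with a made-up wrapper name `attach_cache` and a `DRY_RUN` switch (also just a convention invented here) so the command can be inspected before it touches anything:

```shell
# Hypothetical wrapper, not an LVM command: attach a cache volume to an LV
# with the cache mode chosen up front.
#   $1 = volume group, $2 = main LV, $3 = cache LV
#   $4 = cache mode (defaults to writethrough)
# With DRY_RUN set to a non-empty value the command is printed, not run.
attach_cache() {
	set -- lvconvert --type cache --cachevol "${3}" \
		--cachemode "${4:-writethrough}" "${1}/${2}"
	if [ -n "${DRY_RUN:-}" ]; then
		echo "${*}"
	else
		"${@}"
	fi
}

# Example (prints the command only):
#   DRY_RUN=1 attach_cache data datavol cachevol writethrough
```

Afterwards, `lvs -o+cache_mode data/datavol` should report which mode is actually in effect.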

I could now format /dev/data/datavol with a filesystem, and migrate the data across. rsync here we come. An update to /etc/fstab and we were in business.

So far, things seem to be more snappy, so we’ll keep an eye on things. It’s survived a couple of reboots; the question is what happens when I boost a post on Mastodon. Do all the ActivityPub instances out there cause problems? Guess I’ll find out in a moment.

2024/09/28 by Redhatter (VK4MSL) Computing Projects Public Syndication Solar-powered Cloud Computing 0

New laptop: StarBook Mk VI

I rarely replace computers… I’ll replace something when it is no longer able to perform its usual duty, or if I feel it might decide to abruptly resign anyway. For the last 10 years, I’ve been running a Panasonic CF-53 MkII as my workhorse, and it continues to be a reliable machine.

I just replaced the battery in it, so I now have two batteries: the original, which now has about 1.5-2 hours of capacity, and a new one, which gives me about 6 hours. A nice thing about that particular machine is it still implements legacy interfaces like RS-232 and Cardbus/PCMCIA. I’ve upgraded the internal storage to a 2TB SSD and replaced the DVD burner with a Blu-ray burner. There is one thing it does lack, though, which didn’t matter much prior to 2020: an internal microphone. I can plug a headset in, and that works okay for joining in on work meetings, but if there’s a group of us, that doesn’t work so well.

The machine is also a hefty lump to lug around due to being a “semi-rugged”. There’s also no webcam, not a deal breaker, but again, a reflection of how we communicate in 2023 vs what was typical in 2013.

Given I figured it “didn’t owe me anything”… it was time to look at a replacement and get that up and running before the old faithful decided to quit working and leave me stranded. I wanted something designed for open-source software from the ground up this time around. The Panasonic worked great for that because it was quite conservative on specs: despite being purchased new in 2013, it sported an Intel Ivy Bridge-class Core i5, whereas the latest and greatest was the Haswell generation. Linux worked well, and still does, but it did so because of conservatism rather than explicit design.

Enter the StarBook Mk VI. This machine was built for Linux first and foremost. Windows is an option that you pay extra for on this system. You can also choose your preferred CPU option, and even your preferred boot firmware, with AMI UEFI and coreboot (Intel models only for now) available.

Figuring I’ll probably be using this for the better part of 10 years from now, I aimed for the stars:

CPU: AMD Ryzen 7 5800U 8-core CPU with SMT (16 threads)
RAM: 64GiB DDR4
SSD: 1.8TB NVMe
Boot firmware: coreboot
OS: Ubuntu 22.04 LTS (used to test the machine then install Gentoo)
Keyboard Layout: US
Power adapter: AU with 2m USB-C cable

         -/oyddmdhs+:.                stuartl@vk4msl-sb 
     -odNMMMMMMMMNNmhy+-`             ----------------- 
   -yNMMMMMMMMMMMNNNmmdhy+-           OS: Gentoo Linux x86_64 
 `omMMMMMMMMMMMMNmdmmmmddhhy/`        Host: StarBook Version 1.0 
 omMMMMMMMMMMMNhhyyyohmdddhhhdo`      Kernel: 6.5.7-vk4msl-sb-… 
.ydMMMMMMMMMMdhs++so/smdddhhhhdm+`    Uptime: 1 hour, 15 mins 
 oyhdmNMMMMMMMNdyooydmddddhhhhyhNd.   Packages: 2497 (emerge) 
  :oyhhdNNMMMMMMMNNNmmdddhhhhhyymMh   Shell: bash 5.1.16 
    .:+sydNMMMMMNNNmmmdddhhhhhhmMmy   Resolution: 1920x1080 
       /mMMMMMMNNNmmmdddhhhhhmMNhs:   WM: fvwm3 
    `oNMMMMMMMNNNmmmddddhhdmMNhs+`    Theme: Adwaita [GTK2/3] 
  `sNMMMMMMMMNNNmmmdddddmNMmhs/.      Icons: oxygen [GTK2/3] 
 /NMMMMMMMMNNNNmmmdddmNMNdso:`        Terminal: konsole 
+MMMMMMMNNNNNmmmmdmNMNdso/-           Terminal Font: Terminus (TTF) 16 
yMMNNNNNNNmmmmmNNMmhs+/-`             CPU: AMD Ryzen 7 5800U (16) @ 4.507GHz 
/hMMNNNNNNNNMNdhs++/-`                GPU: AMD ATI Radeon Vega Series / Radeon Vega Mobile Series 
`/ohdmmddhys+++/:.`                   Memory: 4685MiB / 63703MiB 
  `-//////:--.

First impressions

The machine arrived on Thursday, and I’ve spent much of the last few days setting it up. I first checked it out with the stock Ubuntu install: the machine boots up into an installer of sorts, which is good as it means you set up the user account yourself — there’s no credentials loose in the box. Downside is you don’t get to pick the partition layout.

The machine, despite being ordered with coreboot boot firmware, actually arrived with AMI boot firmware instead. Apparently the port of coreboot for AMD systems is still under active development, and I’m told there will be a guide published describing the procedure for installing coreboot. Minor irritation, I was looking forward to trying out coreboot on this machine — but not a show-stopper… I look forward to trying the guide when it becomes available.

The machine itself felt quite zippy… but then again, when you’re used to a ~12-year-old CPU, 8GB RAM and a 2TB SATA-II SSD for storage, it isn’t much of a surprise that the performance would be a big jump.

Installing Gentoo

After trying the machine out, I booted up a SysRescueCD USB stick and used gparted to shove the Ubuntu install over into the last 32GiB of the disk, then proceeded to create a set of partitions: Gentoo’s root, an 80GiB swap partition (seems a lot, but it’s 64GiB for suspend-to-disk plus 16GiB for contingencies), some space for a /home partition, some LVM space for VMs, and my Ubuntu install right at the end.

I booted back into Ubuntu, and used it as my environment for bootstrapping Gentoo; that way I could experience how the machine behaved under a heavy load. Firefox was not bad under the circumstances. My only gripe was the tug-o-war between Ubuntu insisting that I use their Snap package, and me preferring a native install due to the former’s inability to respect my existing profile settings. This is a weekly battle I have with the office workstation.

In discussion, Starlabs Systems mentioned two possible gremlins to watch out for: WiFi (important since this machine has no Ethernet) and the touch pad.

I used a self-built Gentoo stage 3. Unwittingly, I used one built against the still-experimental 23.0 profiles, which meant it used a merged /usr base layout… but we’ll see how that goes, since it’s the direction that Debian and others are going anyway. So far, the only issue has been the inability to install openrc and minicom together since both install a runscript binary in the same place.

Once I had enough installed to be able to boot the Gentoo install, including building a kernel, I got the boot-loader installed, re-configured UEFI to boot that in preference to Ubuntu, then booted the new OS.

First boot under Gentoo

OS boot-up was near instantaneous. I’m used to about 10-15 seconds spent, but this took no time at all.

WiFi worked out-of-the-box with kernel 6.5.7, but the touch pad was not detected. Actually, under X11 the keyboard was unresponsive too because I forgot to install the various drivers for X.org. Oops! I sorted out the drivers easily enough, but the touch pad was still an issue.

Troubleshooting the touch pad

To get the touch pad working, I ended up taking the Ubuntu kernel config, setting NVMe and btrfs to be built-in, and re-building the whole thing… it took a long time, but success: I had the touch pad working.

The tricky bit is the touch pad is an I²C device connected via the AMD chipset, and described in the ACPI tables. Not quite sure how this will work under coreboot, but we’ll cross that bridge later. I spent a little time today refining the kernel down a little from the everything kernel that Ubuntu use… to something a little more specific. Notably, things you can’t directly plug into this machine (like ISA/PCI/PCIe cards, CardBus/PCMCIA, etc) or interfaces the machine did not have (e.g. floppy drive, 8250 serial), I took out. Things that could conceivably be plugged in like USB devices were left in.

It took several tries, but I got something that’s workable for this hardware in the end.

Final kernel configuration

The end result is this kernel config. Intel StarBook users might be better off starting with the Ubuntu kernel config like I did and pare it back, but that file may still give you some clues.

Thoughts

Whilst compiling, this machine does not muck around… being an 8-core SMT machine it actually builds things quite rapidly, although on this occasion I gave the machine a helping hand on some bigger packages like Chromium by using a pre-built binary built for my other machines.

Everything I use seems to work just fine under Gentoo… and of course, having copied my /home from the Panasonic (you never realise how much crap you’ve got until you move house!), it was just a little tweaking of settings to suit the newer software installed.

I’m yet to try running it on battery for a full day to see how that fares. Going flat-chat doing builds it only lasted about 2 hours, but that’s to be expected when you’ve got the machine under a heavy load.

Zoom sees the webcam and can pick up the microphone just fine. I expect Slack will too, but I’ll find that out when I return to work (ugh!) in a fortnight.

My only gripe right now is that my right pinkie finger keeps finding the SysRq/PrintScreen button when moving around with the arrow keys… been used to that arrow cluster being on the far-right of the keyboard not one row back like this one. Other than that, the keyboard seems reasonable for typing on. The touch pad not being recessed sometimes picks up stray movements when typing, but one can disable/enable it pretty easily via Fn+F10 (yes, I have Fn-lock enabled). The keyboard backlight is a nice touch too.

The lack of an Ethernet port is my other gripe, but not hard to work-around, I have a USB-C “dock” that I bought to use with my tablet that gives me 3×USB-3A, full-size SD, microSD, 2×HDMI, Ethernet and audio out and pass-through USB-C for charging. The Ethernet port on that works and the laptop happily charges through it, so that works well enough.

The power supply for this thing is tiny, 65W with USB-A and USB-C ports. I also tried charging this laptop with a conventional USB-A charger but it did not want to know (the PSU probably doesn’t do USB PD). Should be possible to find a 12V-powered USB-C charger that will work though.

The Toughbook will likely be my go-to on camping trips and WICEN events, despite being a heavier and bigger unit, as usually I’m not lugging the thing around, it’s better ruggedised for outdoor activities, and it also looks about 10 years older than it really is, so not attractive to steal.

2023/10/21 by Redhatter (VK4MSL) Computing Open Source Public Syndication 0

Leave at last

So… I’ve been busy at work lately, and that for the last few months has been my primary focus. A big reason why I’ve been keeping my head low is because a few years ago, it was pointed out that I had been physically with the company I’m working for for about 10 years.

Here in Australia, that milestone grants long-service leave: a bonus 8⅔ weeks of leave. This is something that’s automatic for full-time employees of a company, and harks back to the days when people used to travel to Australia from England by ship to work; it gave that person an opportunity to travel back and visit family “back home”.

But at the time, I wasn’t there yet! See, for the first few years I was a contractor, so not a full-time employee. I didn’t become full-time until 2013, meaning my 10 years would not tick up until this year.

While the milestone is 99% symbolic, the thing is at my age (nearing 40), I’m unlikely to ever see that milestone come up again. If I did something that blew it or put it in jeopardy in any way, it’d be up in smoke.

There are some select cases where such leave may be granted early (between 7-10 years):

if the person dies, suffers total physical disability or serious illness
the person’s position becomes untenable
the person’s domestic situation forces them to leave (e.g. dropping out of work to become a carer for a family member)
the employer dismisses the person for reasons other than that person’s performance, conduct or capacity
unfair dismissal

I thought it was worth sticking it out… after 10 years, it’s a done deal, the person is entitled to the full amount. If they booted me out after that, they’d still have to pay that out, plus the holiday leave (of which I still have lots because I haven’t taken much since 2018).

Employment plans

Right now, I’m not going anywhere, I’ve got nowhere to go anyway. While doing work on things like electricity billing brings me no joy whatsoever (“bill shock as-a-service” is what it feels like), it pays the bills, and I’m not quite at the point where I can safely let it all go and coast into an early retirement.

Work has actually changed a lot in the past few years. Years ago, we did a lot of Python work, I also did some work in C. Today, it’s lots of JavaScript, which is an idiosyncratic language to say the least, and don’t get me started on the moving target that is UI frameworks for web pages!

Dealing with the disaster that is Office365 (which is threatening to invade even more into my work) doesn’t make this any easier, but sadly that piece of malware has infected most organisations now. (Just do a dig -t MX <employer-domain>, or just look at the headers of an email from their employees, many show Office365 today). I’ve so far dodged Microsoft Teams which I now flatly refuse to use as I do not consent to my likeness/voice being used in AI models and Microsoft isn’t being open about how it uses AI.

Most people my age get shepherded into management positions, really not my scene. In a new job I’d be competing with 20-somethings that are more familiar with the current software stacks. Non-technical jobs exist, but many assume you own a motor vehicle and possess the requisite license to operate it.

This pretty much means I’m unemployable if I leave or am booted out, so whatever I have in the bank balance needs to make it through until my time on this planet is done.

Thus, I must stick it out a bit longer… I might not get the 15-year bonus (4⅓ weeks), but at least I can’t lose what I have now. If excrement does meet a rotary cooling device though, simulations suggest with some creative accounting, I may just scrape through. I don’t plan on setting up a donations page and talking to Centrelink is a waste of time, I’ll die a pauper before they answer the phone.

Plans for this month

So I have holiday leave off until November. Unlike previous times I’ve taken big amounts off, I won’t be travelling far this time around. Instead, it’s a project month.

Financial work

I need to plan ahead for the possibility that I wind up in long-term unemployment. I don’t expect to live long (the planet cannot sustain everyone living to >100 years), but I do need to be around to finalise the estates of my parents and see my cat out.

That suggests I need to keep the lights on for another 20~30 years. Presently my annual expenditure has been around the $30k mark, but much of that is discretionary (most of it has been on music), and I could possibly reduce that to around the $10k mark.

I have some shares, but need to expand on this further. David Rowe posted some ideas in a two part series which provides some food for thought here.

At the moment, I’m nowhere near that 10% yield figure mentioned… that post was written in 2015 and a lot has changed in 8 years. Interest rates are currently at ~5% for term deposits.

I do plan to start one though all the same. After Suncorp closed both The Gap and Ashgrove branches (forcing me all the way to Mitchelton), I set up an account at BOQ who have branches in both Ashgrove and The Gap… so I can do a term deposit with either, and they’re both offering a 5% 12-month term deposit.

I have a year’s worth sitting at BOQ in an interest bearing account… so that’s money that’s readily accessible. The remainder I have, I plan to split — some going into the aforementioned term deposit, the other will go into that interest bearing account in case I decide to buy more shares.

That should start building the reserves up.

Hardware refurbishment and replacement

Some of my equipment is getting a bit long in the tooth. The old desktop I bought back in 2010 is playing silly-buggers at the moment, and even the laptop I’m typing this on is nearing 10 years old. I have one desktop which used to be my office workstation before the pandemic, so between it and the old desktop, I have decent processing capacity.

The server rack needs work though. One compute node is down, and I’m actually running out of space. I also need to greatly expand the battery bank. I bought a full-height open-frame rack to replace the old one, and was gifted a new solar controller, so some time during this break, I’ll be assembling that, moving the old servers into it… and getting the replacement compute node up and running.

Software updates

I’ve been doing this to critical servers… I recently replaced the mail server with a new VM instance which made the maintenance work-load a lot lower… but there’s still some machines that need my attention.

I’m already working on getting my Mastodon instance up to release 4.2.0 (I bumped it to 4.1.9 to at least get a security patch off my back), there are a couple of OpenBSD routers that need updates and some similar remedial work.

Projects

Already mentioned is the server cluster (hardware and software), but there are some other projects that need my attention.

setuptools-pyproject-migration is a project that David Zaslavsky and I have been working on that is intended to help projects migrate from the old setup.py scripts in Python projects to the new pyproject.toml structure. Work has kept me busy, but the project is nearly ready for the first release. I need to help finish up the bits that are missing, and get that out there.
aioax25 could use some love, connected mode nearly works, plus it could do with a modernisation.
Brisbane WICEN’s RFID tracking project is something I have not posted much about, but it nonetheless got a lot of attention at the Tom Quilty this year. This needs further work.

Self-Training

Some things I’d like to try and get my head around, if possible…

Work uses NodeJS for a lot of things, but we’re butting up against its limits a lot. We use a lot of projects that are written in GoLang (e.g. InfluxDB, Grafana, Terraform, Vault), and while I did manage to hack some features into s3sync needed for work, I should get to know GoLang properly.
Rust interests me a lot. I should at least have a closer look at this and learn a little. It has been getting a mention around the office in the context of writing NodeJS extensions. Definitely worth looking into further.
I need to properly get to understand OAuth2, as I don’t think I completely understand it as it stands now. I’m not sure I’m doing it “right”.
COSE would have applications in both the WideSky Hub (end-to-end encryption) and in Brisbane WICEN’s RFID tracking system (digital signatures).

Physical exercise

I have not been out on the bike for some time, and it shows! I need to get out more. I intend to do quite a bit of that over the next few weeks.

Maybe I might do the odd over-nighter, but we’ll see.

2023/09/30 by Redhatter (VK4MSL) Computing Projects Public Syndication Solar-powered Cloud Computing 0

Generating ball tickets/programmes using LaTeX

My father does a lot of Scottish Country dancing. He was treasurer for the Clan MacKenzie association for quite a while, and president there for about 10 years too. He was given the task of making some ball tickets, with each one uniquely numbered.

After hearing him swear at LibreOffice for a bit, then at Avery’s label making software, I decided to take matters into my own hands.

The first step was to come up with a template. The programmes were to be A6-size booklets, made up of A5 pages folded in half. For ease of manufacture, they would be printed two to a page on A4 pages.

The template would serve as the basis for the outer and inner pages. The outer page would have a placeholder that we’d substitute.

The outer pages of the programme/ticket booklet… yes there is a typo in the last line of the “back” page.

\documentclass[a5paper,landscape,16pt]{minimal}
\usepackage{multicol}
\setlength{\columnsep}{0cm}
\usepackage[top=1cm, left=0cm, right=0cm, bottom=1cm]{geometry}
\linespread{2}
\begin{document}
\begin{multicols}{2}[]

\vspace*{1cm}

\begin{center}
\begin{em}
We thank you for your company today\linebreak
and helping to celebrate 50 years of friendship\linebreak
fun and learning in the Redlands.
\end{em}
\end{center}

\begin{center}
\begin{em}
May the road rise to greet you,\linebreak
may the wind always be at your back,\linebreak
may the sun shine warm upon your face,\linebreak
the rains fall soft upon your fields\linebreak
and until we meet again,\linebreak
may God gold you in the palm of his hand.
\end{em}
\end{center}

\vspace*{1cm}

\columnbreak
\begin{center}
\begin{em}
\textbf{CLEVELAND SCOTTISH COUNTRY DANCERS\linebreak
50th GOLD n' TARTAN ANNIVERSARY TEA DANCE}\linebreak
\linebreak
1973 - 2023\linebreak
Saturday 20th May 2023\linebreak
1.00pm for 1.30pm - 5pm\linebreak
Redlands Memorial Hall\linebreak
South Street\linebreak
Cleveland\linebreak
\end{em}
\end{center}

\begin{center}
\begin{em}
Live Music by Emma Nixon \& Iain Mckenzie\linebreak
Black Bear Duo
\end{em}
\end{center}

\vspace{1cm}

\begin{center}
\begin{em}
Cost \$25 per person, non-dancer \$15\linebreak
\textbf{Ticket No \${NUM}}
\end{em}
\end{center}
\end{multicols}
\end{document}

The inner pages were the same for all booklets, so we just came up with one file that was used for all. I won’t put the code here, but suffice to say, it was similar to the above.

The inner pages, no placeholders needed here.

So we had two files; ticket-outer.tex and ticket-inner.tex. What next? Well, we needed to make 100 versions of ticket-outer.tex, each with a different number substituted for $NUM, and rendered as PDF. Similarly, we needed the inner pages rendered as a PDF (which we can do just once, since they’re all the same).

#!/bin/bash
NUM_TICKETS=100

set -ex

pdflatex ticket-inner.tex
for n in $( seq 1 ${NUM_TICKETS} ); do
	sed -e 's:\\\${NUM}:'${n}':' \
            < ticket-outer.tex \
            > ticket-outer-${n}.tex
	pdflatex ticket-outer-${n}.tex
done

This gives us a single ticket-inner.pdf, and 100 different ticket-outer-NN.pdf files that look like this:

A ticket outer pages document with substituted placeholder

Now, we just need to put everything together. The final document should have no margins, and should just import the relevant PDF files in-place. So naturally, we just script it; this time stepping every 2 tickets, so we can assemble the A4 PDF document with our A5 tickets: outer pages of the odd-numbered ticket, outer pages of the even-numbered ticket, followed by two copies of the inner pages. Repeat for all tickets. We also need to ensure that initial paragraph lines are not indented, so setting \parindent solves that.

This is the rest of my quick-and-dirty shell script:

cat > tickets.tex <<EOF
\documentclass[a4paper]{minimal}
\usepackage[top=0cm, left=0cm, right=0cm, bottom=0cm]{geometry}
\usepackage{pdfpages}
\setlength{\parindent}{0pt}
\begin{document}
EOF
for n in $( seq 1 2 ${NUM_TICKETS} ); do
	m=$(( ${n} + 1 ))
	cat >> tickets.tex <<EOF
\includegraphics[width=21cm]{ticket-outer-${n}.pdf}
\includegraphics[width=21cm]{ticket-outer-${m}.pdf}
\includegraphics[width=21cm]{ticket-inner.pdf}
\includegraphics[width=21cm]{ticket-inner.pdf}
EOF
done
cat >> tickets.tex <<EOF
\end{document}
EOF
pdflatex tickets.tex

The result is a 100-page PDF, which when printed double-sided, will yield a stack of tickets that are uniquely numbered and serve as programmes.

2023/03/24 by Redhatter (VK4MSL) Computing Open Source Public Syndication 0

A crude attempt at memory management

The other day I had a bit of a challenge to deal with. My workplace makes embedded data collection devices which are built around the Texas Instruments CC2538 SoC (internal photos visible here) and run OpenThread. To date, everything we’ve made has been an externally-powered device, running off either DC power (9-30V) or mains (120/240V 50/60Hz AC). CC2592 range extender support was added to OpenThread for this device.

The CC2538, although very light on RAM (32KiB), gets the job done with some constraints. Necessity threw us a curve-ball the other day: we wanted a device that ran off a battery. That meant going into sleep mode periodically, deep sleep! The CC2538 has a number of operating modes:

running mode (pretty much everything turned on)
light sleep mode (clocks, CPU and power stays on, but we pause a few peripherals)
deep sleep mode — this comes in four flavours
- PM0: Much like light-sleep, but we’ve got the option to pause clocks to more peripherals
- PM1: PM0, plus we halt the main system clock (32MHz crystal or 16MHz RC), halting the CPU
- PM2: PM1 plus we power down the bottom 16KiB of RAM and some other internal peripherals
- PM3: PM2 plus we turn off the 32kHz crystal used by the sleep timer and watchdog.

We wanted PM2, which meant while we could use the bottom 16KiB of RAM during run-time, the moment we went to sleep, we had to forget about whatever was kept in that bottom 16KiB RAM — since without power it would lose its state anyway.

The challenge

Managing RAM in a device like this is always a challenge. malloc() is generally frowned upon, however in some cases it’s a necessary evil. OpenThread internally uses mbedTLS, and that relies on having a heap. It can use one implemented by OpenThread, or one provided by you. Our code also uses malloc for some things, notably short-term tasks like downloading a new configuration file or for buffering serial traffic.

The big challenge is that OpenThread itself uses a little over 9KiB RAM. We have a 4KiB stack. We’ve got under 3KiB left. That’s bare-bones OpenThread. If you want JOINER support, for joining a mesh network, that pulls in DTLS, which by default, will tell OpenThread to static-allocate a 6KiB buffer.

9KiB becomes about 15KiB; plus the stack, that’s 19KiB. This is bigger than 16KiB — the linker gives up.

Using heap memory

There is a work-around that gets things linking; you can build OpenThread with the option OPENTHREAD_CONFIG_HEAP_EXTERNAL_ENABLE — if you set this to 1, OpenThread forgoes its own heap and just uses malloc / free instead, implemented by your toolchain.

OpenThread builds and links in 16KiB RAM, hooray… but then you try joining, and NoBufs is the response. We’re out of RAM. Moving things to the heap just kicked the can down the road: we still need that 6KiB, but we only have under 3KiB to give it. Not enough.

We have a problem in that the toolchain we use is built on newlib, and while it implements malloc / free / realloc, it does so with a primitive called _sbrk(). We define a pointer initialised to the top of our .bss, and whenever malloc needs more memory for the heap, it calls _sbrk(N); we grab the value of our pointer, add N to it, and return the old value. Easy.
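A minimal sketch of that bump-pointer scheme, assuming a fixed array stands in for the RAM above .bss. The names heap_area, HEAP_SIZE and sketch_sbrk are made up for illustration; on the real target the function would be called _sbrk and the bounds would come from linker-script symbols.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the RAM above .bss; real firmware would
 * take these bounds from linker-script symbols instead. */
#define HEAP_SIZE 1024
static uint8_t heap_area[HEAP_SIZE];

/* Current "program break": everything below it has been handed out. */
static uint8_t *heap_brk = heap_area;

/* Named sketch_sbrk so it does not clash with a hosted C library;
 * newlib would expect this to be called _sbrk. */
void *sketch_sbrk(ptrdiff_t incr)
{
        uint8_t *old_brk = heap_brk;
        ptrdiff_t used = heap_brk - heap_area;

        assert(used >= 0 && used <= HEAP_SIZE);

        /* Refuse to move the break outside the pool in either direction. */
        if ((incr > 0 && incr > HEAP_SIZE - used)
            || (incr < 0 && -incr > used)) {
                errno = ENOMEM;
                return (void *)-1;
        }

        heap_brk += incr;
        return old_brk;         /* start of the newly granted region */
}
```

Note that the break only ever moves within one contiguous region, which is why a second, separate pool cannot simply be bolted onto this primitive.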

Except… we don’t just have one memory pool now, we have two. One of them we cannot use all the time. OpenThread, via mbedTLS, also winds up calling on malloc() very early in the initialisation (as early as the otInstanceInitSingle() call to initialise OpenThread). We need that block of RAM to wind up in the upper 16KiB that stays powered on, so we can’t start at address 0x20000000 and just skip over .data/.bss when we run out.

malloc() will also get mighty confused if we suddenly hand it an address that’s lower than the one we handed out previously. We can’t go backwards.

I looked at replacing malloc() with a dual-pool-aware version, but newlib is hard-coded in a few places to use its own malloc() and not a third-party one. picolibc might let us swap it out, but getting that integrated looked like a lot of work.

So we’re stuck with newlib’s malloc() for better or worse.

The hybrid approach

One option: since we can’t control which malloc the newlib functions use, let newlib’s malloc, backed by _sbrk(), manage the upper heap. We then wrap that malloc with our own creation that we pass to OpenThread: we implement otPlatCAlloc and otPlatFree, which are essentially calloc and free wrappers.

The strategy is simple; first try the normal calloc, if that returns NULL, then use our own.
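A rough sketch of that fall-back strategy follows. It assumes a trivial second pool that never reuses memory, purely to show the dispatch; low_pool, hybrid_calloc, hybrid_free, upper_calloc and exhausted_calloc are all hypothetical names, and the function pointer exists only so that an exhausted upper heap can be simulated on a desktop machine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for the lower, switchable 16KiB of RAM: a
 * trivial pool that hands out zeroed memory and never reuses it. */
#define LOW_POOL_WORDS 32
static uint64_t low_pool[LOW_POOL_WORDS];
static size_t low_pool_next;    /* index of the next unused word */

static void *low_pool_calloc(size_t num, size_t size)
{
        assert(low_pool_next <= LOW_POOL_WORDS);
        if (num == 0 || size == 0 || size > sizeof(low_pool) / num)
                return NULL;

        /* Round up to whole 8-byte words so results stay aligned. */
        size_t words = (num * size + sizeof(uint64_t) - 1) / sizeof(uint64_t);
        if (words > LOW_POOL_WORDS - low_pool_next)
                return NULL;

        void *ptr = &low_pool[low_pool_next];
        low_pool_next += words;
        return ptr;             /* static storage starts out zeroed */
}

static int low_pool_owns(const void *ptr)
{
        uintptr_t addr = (uintptr_t)ptr;
        uintptr_t base = (uintptr_t)low_pool;
        return addr >= base && addr < base + sizeof(low_pool);
}

/* Normally the C library's calloc; replaceable so that exhaustion of
 * the upper heap can be simulated. */
static void *(*upper_calloc)(size_t, size_t) = calloc;

/* Simulation helper: behaves like an upper heap with nothing left. */
void *exhausted_calloc(size_t num, size_t size)
{
        (void)num;
        (void)size;
        return NULL;
}

/* These two would be exposed to OpenThread as otPlatCAlloc/otPlatFree. */
void *hybrid_calloc(size_t num, size_t size)
{
        void *ptr = upper_calloc(num, size);    /* normal heap first */
        if (ptr == NULL)
                ptr = low_pool_calloc(num, size);
        return ptr;
}

void hybrid_free(void *ptr)
{
        if (ptr == NULL || low_pool_owns(ptr))
                return;         /* this toy pool never reclaims memory */
        free(ptr);
}
```

The address-range check in hybrid_free is what lets a single free entry point serve both pools without any per-allocation header.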

Re-purposing an existing allocator

The first rule of software engineering: don’t write code you don’t have to. So naturally I went looking for options.

Page upon page of “No man don’t do it!!!”

jemalloc looked promising at first, since it is the FreeBSD malloc(), but therein lies a problem: it’s a pretty complicated piece of code aimed at x86 computers with megabytes of RAM minimum. It used uint64_ts in a lot of places and seemed like it would have a pretty high overhead on a little CC2538.

I tried avr-libc’s malloc. It’s far simpler, and actually is a free-list implementation like newlib’s version, but there is a snag. See, AVR microcontrollers are 8-bit beasts; they don’t care about memory alignment. But the Cortex M3 does! avrlibc_malloc did its job and handed back a pointer, but then I wound up in a HARDFAULT condition because mbedTLS tried to access a 32-bit word that was offset by a few bytes.

A simple memory allocator

The approach I took was a crude one. I would allocate memory in fixed-sized “blocks”. I first ran the OpenThread code under a debugger and set a break-point on malloc to see what sizes it was asking for — mostly blocks around the 128 byte mark, sometimes bigger, sometimes smaller. 64-byte blocks would work pretty well, although for initial testing, I went the lazy route and used 8-byte blocks: uint64_ts.

In my .bss, I made an array of uint8_ts; size equal to the number of 8-byte blocks in the lower heap divided by 4. This would be my usage bitmap — each block was allocated two bits, which I accessed using bit-banding: one bit I called used, and that simply reported the block was being used. The second was called chained, and that indicated that the data stored in this block spilled over to the next block.

To malloc some memory, I’d simply look for a string of free blocks big enough. When it came to freeing memory, I simply started at the block referenced, and cleared bits until I got to a block whose chained bit was already cleared. Because I was using 8-byte blocks, everything was guaranteed to be aligned.

8-byte blocks in 16KiB (2048 blocks) wound up with 512 bytes of usage data. As I say, using 64-byte blocks would be better (only 256 blocks, which fits in 64 bytes), but this was a quick test. The other trick would be to use the very first few blocks to store that bitmap (for 64-byte blocks, we only need to reserve the first block).
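The bitmap sizes quoted above can be checked with a few lines of C. This is just the arithmetic, with usage_bitmap_bytes being a name invented for the example.

```c
#include <assert.h>
#include <stddef.h>

#define USAGE_BITS_PER_BLOCK 2  /* one "used" bit, one "chained" bit */

/* Bytes of usage bitmap needed to track a heap of the given size when
 * it is carved into equal-sized blocks. */
size_t usage_bitmap_bytes(size_t heap_bytes, size_t block_bytes)
{
        assert(block_bytes != 0 && heap_bytes % block_bytes == 0);

        size_t blocks = heap_bytes / block_bytes;
        return (blocks * USAGE_BITS_PER_BLOCK + 7) / 8;
}
```

For the 16KiB lower heap, usage_bitmap_bytes(16384, 8) gives 512 and usage_bitmap_bytes(16384, 64) gives 64, matching the figures in the text.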

The scheme is somewhat inspired by the buddy allocator scheme, but simpler.

Bit banding was simple enough; I defined my struct for accessing the bits:

struct lowheap_usage_t {
        uint32_t used;
        uint32_t chained;
};

and in my code, I used a C macro to do the arithmetic:

#define LOWHEAP_USAGE                                                   \
        ((struct lowheap_usage_t*)(((((uint32_t)&lowheap_usage_bytes)   \
                                     - 0x20000000)                      \
                                    * 32)                               \
                                   + 0x22000000))

The magic numbers here are:

0x20000000: the start of SRAM on the CC2538
0x22000000: the start of the SRAM bit-band region
32: the number of bytes in the bit-band region taken up by each byte of SRAM (8 bits, each mapped to its own 32-bit word)
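To make the arithmetic concrete, here is a small sketch that computes alias addresses as plain integers, so it can be checked on any machine. The function names are invented for the example, and the bitmap address used in the usage note is arbitrary.

```c
#include <assert.h>
#include <stdint.h>

#define SRAM_BASE         UINT32_C(0x20000000)
#define SRAM_BITBAND_BASE UINT32_C(0x22000000)

/* Alias address of one bit: each SRAM byte expands to 32 alias bytes,
 * and each bit within that byte to one 4-byte word. */
uint32_t bitband_alias(uint32_t byte_addr, unsigned bit)
{
        assert(byte_addr >= SRAM_BASE && bit < 8);
        return SRAM_BITBAND_BASE + (byte_addr - SRAM_BASE) * 32u + bit * 4u;
}

/* Alias address of the "used" bit for a block, assuming two bits per
 * block packed from bit 0 of the bitmap upwards. */
uint32_t block_used_alias(uint32_t bitmap_addr, uint32_t block)
{
        uint32_t bit_index = block * 2u;
        return bitband_alias(bitmap_addr + bit_index / 8u, bit_index % 8u);
}
```

With a bitmap at the made-up address 0x20000100, block 5 lands at 0x22002028, which is the alias of the bitmap's first bit (0x22002000) plus 5 × 8 bytes. That is why a two-word struct can be indexed like an array over the alias region.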

Then, in my malloc, I could simply call…

struct lowheap_usage_t* usage = LOWHEAP_USAGE;

…and treat usage like an array; where element 0 was the usage data for the very first block down the bottom of SRAM.

To implement a memory allocator, I needed five routines:

one that scanned through, and told me where the first free block was after a given block number (returning the block number) — static uint16_t lowheap_first_free(uint16_t block)
one that, given the start of a run of free blocks, told me how many blocks following it were free — static uint16_t lowheap_chunk_free_length(uint16_t block, uint16_t required)
one that, given the start of a run of chained used blocks, told me how many blocks were chained together — static uint16_t lowheap_chunk_used_length(uint16_t block)
one that, given a block number and count, would claim that number of blocks starting at the given starting point — static void lowheap_chunk_claim(uint16_t block, uint16_t length)
one that, given a starting block, would clear the used bit for that block, and if chained was set; clear it and repeat the step on the following block (and keep going until all blocks were freed) — static void lowheap_chunk_release(uint16_t block)

From here, implementing calloc was simple:

first, try the newlib calloc and see if that succeeded. Return the pointer we’re given if it’s not NULL.
if we’re still looking for memory, round up the memory requirement to the block size.
initialise our starting block number (start_nr) by calling lowheap_first_free(0) to find the first block; then in a loop:
- find the size of the free block (chunk_len) by calling lowheap_chunk_free_length(start_nr, required_blocks).
- If the returned size is big enough, break out of the loop.
- If not big enough, increment start_nr by chunk_len plus the return value from lowheap_chunk_used_length(start_nr + chunk_len) to advance it past the too-small free block and the following used chunk.
- Stop iterating if start_nr is equal to or greater than the total number of blocks in the heap.
If start_nr winds up being past the end of the heap, fail with errno = ENOMEM and return NULL.
Otherwise, we’re safe, call lowheap_chunk_claim(start_nr, required_blocks); to reserve our space, zero out the actual blocks allocated, then return the address of the first block cast to void*.

Implementing free was not a challenge either. If the pointer was above our heap, we simply passed it to newlib’s free; if it was in our heap space, we did some arithmetic to figure out which block that address was in, and passed that to lowheap_chunk_release().
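What follows is a desktop-testable sketch of the scheme described in these notes, not the original code (which was never published). It assumes a tiny 32-block pool and one flag byte per block instead of the bit-banded bitmap, and it leaves out the fall-through to newlib's calloc and free to keep it short. The routine names follow the signatures listed above; lowheap_calloc and lowheap_free are names chosen for the example.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_SZ   8    /* bytes per block, as in the initial test */
#define NUM_BLOCKS 32   /* tiny pool, for illustration only */

static uint64_t lowheap[NUM_BLOCKS];    /* uint64_t keeps blocks aligned */
/* One flag byte each here; the original packed two bits per block. */
static struct { uint8_t used, chained; } usage[NUM_BLOCKS];

static uint16_t lowheap_first_free(uint16_t block)
{
        while (block < NUM_BLOCKS && usage[block].used)
                block++;
        return block;
}

static uint16_t lowheap_chunk_free_length(uint16_t block, uint16_t required)
{
        uint16_t len = 0;
        /* No need to count further than the caller asked for. */
        while (len < required && block + len < NUM_BLOCKS
               && !usage[block + len].used)
                len++;
        return len;
}

static uint16_t lowheap_chunk_used_length(uint16_t block)
{
        uint16_t len = 0;
        while (block + len < NUM_BLOCKS && usage[block + len].used) {
                uint8_t more = usage[block + len].chained;
                len++;
                if (!more)
                        break;  /* last block of this allocation */
        }
        return len;
}

static void lowheap_chunk_claim(uint16_t block, uint16_t length)
{
        for (uint16_t i = 0; i < length; i++) {
                usage[block + i].used = 1;
                usage[block + i].chained = (i + 1 < length);
        }
}

static void lowheap_chunk_release(uint16_t block)
{
        uint8_t more;
        do {
                more = usage[block].chained;
                usage[block].used = 0;
                usage[block].chained = 0;
                block++;
        } while (more && block < NUM_BLOCKS);
}

void *lowheap_calloc(size_t num, size_t size)
{
        if (num == 0 || size == 0 || size > sizeof(lowheap) / num) {
                errno = ENOMEM;
                return NULL;
        }
        uint16_t required = (uint16_t)((num * size + BLOCK_SZ - 1) / BLOCK_SZ);
        uint16_t start_nr = lowheap_first_free(0);

        while (start_nr < NUM_BLOCKS) {
                uint16_t chunk_len =
                        lowheap_chunk_free_length(start_nr, required);
                if (chunk_len >= required)
                        break;
                /* Skip the too-small gap, then the used chunk after it. */
                start_nr += chunk_len;
                start_nr += lowheap_chunk_used_length(start_nr);
        }
        if (start_nr >= NUM_BLOCKS) {
                errno = ENOMEM;
                return NULL;
        }
        lowheap_chunk_claim(start_nr, required);
        memset(&lowheap[start_nr], 0, (size_t)required * BLOCK_SZ);
        return &lowheap[start_nr];
}

void lowheap_free(void *ptr)
{
        if (ptr == NULL)
                return;
        uintptr_t offset = (uintptr_t)ptr - (uintptr_t)lowheap;
        assert(offset < sizeof(lowheap) && offset % BLOCK_SZ == 0);
        lowheap_chunk_release((uint16_t)(offset / BLOCK_SZ));
}
```

This is a first-fit search, so a freed one-block gap is skipped by a two-block request and picked up again by the next request small enough to fit it.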

I won’t publish the code because I didn’t get it working properly in the end, but I figured I’d put the notes here on how I put it together to re-visit in the future. Maybe the thoughts might inspire someone else. 🙂

2023/03/10 by Redhatter (VK4MSL) Computing Public Syndication Thinktank 0

Demise of classic hardware: the final act

So today I finally got around to the SGI kit in my possession. Not quite sure where all of it went, there’s a SGI PS/2 keyboard, Indy Presenter LCD and a SGI O2 R5000 180MHz CPU module that have gone AWOL, but this morning I took advantage of the Brisbane City Council kerb-side clean-up.

Screenshot of the post — yes I need to get Mastodon post embedding working

I rounded up some old Plextor 12× CD-ROM drives (SCSI interface) that took CD caddies (remember those?) as well to go onto the pile, and some SCSI HDDs I found lying around, since there’s a good chance the disks in the machines are duds. I did once boot the Indy off one of those CD-ROM drives, so I know they work with the SGI kit.

The machines themselves had gotten to the point where they no longer powered on. The O2 at first did, and I tried saving it, but I found:

it was unreliable, frequently freezing up — until one day it stopped powering on
the case had become ridiculously brittle

The Indy exploded last time I popped the cover, and fragments of the Indigo2 were falling off. The Octane is the only machine whose case seemed largely intact. I had gathered up what IRIX kit I had too, just in case the new owners wanted to experiment. archive.org actually has the images, and I had a crack at patching irixboot to be able to use them. Never got to test that though.

Today I made the final step of putting the machines out on the street to find a new home. It looks like exactly that has happened: someone grabbed the homebrew DB15 to 13W3 cable I used for interfacing to the Indy and Indigo2, then later in the day I found the lot had disappeared.

With more room, I went and lugged the old SGI monitor down, it’s still there, but suspect it’ll possibly go too. The Indy and Indigo2 looked to be pretty much maxxed-out on RAM, so should be handy for bits for restoring other similar-era SGI kit. I do wish the new owners well with their restoration journey, if that’s what they choose to do.

For me though, it’s the end of an era. Time to move on.

2023/01/22 by Redhatter (VK4MSL) Computing Open Source Public Syndication 0

Hashing the audio data in a file

So, I’ve got a big music collection…

RC=0 stuartl@rikishi ~ $ find /mnt/music-archive/.by-uuid/ -type f -name \*.flac | wc -l
7624

I keep a few copies of it. Between three of my machines and two USB drives (one HDD, one SSD), I keep a copy of the lossless archive. This is a recent addition since (1) I’ve got the space to do it, and (2) some experimentation with Ogg/Vorbis metadata corrupting files necessitated me re-ripping everything so I thought I’ll save future-me the hassle by keeping a lossless copy on-hand.

Actually, this is not the first time I’ve done a re-rip of the whole collection. The previous re-rip was done back in 2005 when I moved from MP3 to Ogg/Vorbis (and ditched a lot of illegally obtained MP3s while I was at it — leaving me with just the recordings that I had licenses for). But, back then, storing a lossless copy of every file as I re-ripped everything would have been prohibitively expensive in terms of required storage. When even my near-10-year-old laptop sports a 2TB SSD, this isn’t a problem.

The working copy that I generally do my listening from uses the Ogg/Vorbis format today. I haven’t quite re-ripped everything, there’s a stack of records that are waiting for me to put them back on the turntable … one day I’ll get to those … but every CD, DVD and digital download (which were FLAC to begin with) is losslessly stored in FLAC.

If I make a change to the files, I really want to synchronise my changes between the two copies. Notably, if I change the file data, I need to re-encode the FLAC file to Ogg/Vorbis — but if I simply change its metadata (i.e. cover art or tags), I merely need to re-write the metadata on the destination file and can save some processing cycles.

The thinking is, if I can “fingerprint” the various parts of the file, I can determine what bits changed and what to convert. Obviously when I transcode the audio data itself, the audio data bytes will bear little resemblance to the ones that were fed into the transcoder — that’s fine — I have other metadata which can link the two files. The aim of this exercise is to store the hashes for the audio data and tags, and detect when one of those things changes on the source side, so the change can be copied across to the destination.

Existing option: MD5 hash

FLAC actually does store a hash of its source input as part of the stream metadata. It uses the MD5 hashing algorithm which, while good enough for a rough check and certainly better than linear codes like CRC, is really quite dated as a cryptographic hash.

I’d prefer to use SHA-256 for this since it’s generally regarded as being a “secure” hash algorithm that is less vulnerable to collisions than MD5 or SHA-1.

Naïve approach: decode and compare

The naïve approach would be to just decode to raw audio data and compare the raw audio files. I could do this via a pipe to avoid writing the files out to disk just to delete them moments later. The following will output a raw file:

RC=0 stuartl@rikishi ~ $ time flac -d -f -o /tmp/test.raw /mnt/music-archive/by-album-artist/Traveling\ Wilburys/The\ Traveling\ Wilburys\ Collection/d1o001t001\ Traveling\ Wilburys\ -\ Handle\ With\ Care.flac 

flac 1.3.4
Copyright (C) 2000-2009  Josh Coalson, 2011-2016  Xiph.Org Foundation
flac comes with ABSOLUTELY NO WARRANTY.  This is free software, and you are
welcome to redistribute it under certain conditions.  Type `flac' for details.

d1o001t001 Traveling Wilburys - Handle With Care.flac: done         

real    0m0.457s
user    0m0.300s
sys     0m0.065s

On my laptop, it takes about 200~500ms to decode a single file to raw audio. Multiply that by 7624 and you get something that will take nearly an hour to complete. I think we can do better!

Alternate naïve approach: Copy the file then strip metadata

Making a copy of the file without the metadata is certainly an option. Something like this will do that:

RC=0 stuartl@rikishi ~ $ time ffmpeg -y -i \
    /mnt/music-archive/by-album-artist/Traveling\ Wilburys/The\ Traveling\ Wilburys\ Collection/d1o001t001\ Traveling\ Wilburys\ -\ Handle\ With\ Care.flac \
    -c:a copy -c:v copy -map_metadata -1 \
    /tmp/test.flac
… snip lots of output …
Output #0, flac, to '/tmp/test.flac':
  Metadata:
    encoder         : Lavf58.76.100
  Stream #0:0: Video: mjpeg (Progressive), yuvj420p(pc, bt470bg/unknown/unknown), 1000x1000 [SAR 72:72 DAR 1:1], q=2-31, 90k tbr, 90k tbn, 90k tbc (attached pic)
  Stream #0:1: Audio: flac, 44100 Hz, stereo, s16
    Side data:
      replaygain: track gain - -8.320000, track peak - 0.000023, album gain - -8.320000, album peak - 0.000023, 
Stream mapping:
  Stream #0:1 -> #0:0 (copy)
  Stream #0:0 -> #0:1 (copy)
Press [q] to stop, [?] for help
frame=    1 fps=0.0 q=-1.0 Lsize=   24671kB time=00:03:19.50 bitrate=1013.0kbits/s speed=4.77e+03x    
video:114kB audio:24549kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.033000%

real    0m0.139s
user    0m0.105s
sys     0m0.032s

This is a big improvement, but just because the audio blocks are the same does not mean the file itself won’t change in other ways — FLAC files can include “padding blocks” anywhere after the STREAMINFO block which will change the hash value without having any meaningful effect on the file content.

So this may not be as stable as I’d like. However, ffmpeg is on the right track…

Audio fingerprinting

MusicBrainz actually has an audio fingerprinting library that can be matched to a specific recording, and is reasonably “stable” across different audio compression formats. Great for the intended purpose, but in this case it’s likely going to be computationally expensive since it has to analyse the audio in terms of frequency components, try to extract tempo information, etc. I don’t need this level of detail.

It may also miss that one file might, for example, be preceded by lots of silence; multi-track exports out of Audacity are a prime example. Audacity used to just export the multiple tracks “as-is” so you could re-construct the full recording by concatenating the files, but some bright-spark thought it would be a good idea to prepend the exported tracks with silence by default so if re-imported, their relative positions were “preserved”. Consequently, I’ve got some record rips that I need to fix because of the extra “silence”!

Getting hashes out of `ffmpeg`

It turns out that ffmpeg can output any hash you’d like of whatever input data you give it:

RC=0 stuartl@rikishi ~ $ time ffmpeg \
    -loglevel quiet \
    -i /tmp/test.flac \
    -c:a copy -vn -map_metadata -1 -f hash -hash sha256 -
SHA256=31e38749daa1061e6a2008ea61e841e5bc05b8b9ec1f0dfc54d8cd70f18fee3f

real    0m0.248s
user    0m0.234s
sys     0m0.014s
RC=0 stuartl@rikishi ~ $ time ffmpeg \
    -loglevel quiet \
    -i /mnt/music-archive/by-album-artist/Traveling\ Wilburys/The\ Traveling\ Wilburys\ Collection/d1o001t001\ Traveling\ Wilburys\ -\ Handle\ With\ Care.flac \
    -c:a copy -vn -map_metadata -1 -f hash -hash sha256 -
SHA256=31e38749daa1061e6a2008ea61e841e5bc05b8b9ec1f0dfc54d8cd70f18fee3f

real    0m0.242s
user    0m0.226s
sys     0m0.016s

Notice the hashes are the same, yet the first copy of the file we hashed does not contain the tags or cover art present in the file it was generated from. Speed isn’t as good as just stripping the metadata, but on the flip-side, it’s not as expensive as decoding the file to raw format, and should be more stable than naïvely hashing the whole file after metadata stripping.

Where to from here?

Well, having a hash, I can store this elsewhere (I’m thinking SQLite3 or LMDB), then compare it later to know if the audio has changed. It’s not difficult or expensive to extract the tags and images using mutagen or similar; those can be hashed using conventional means to generate a hash of that information. I can also store the mtime and a complete file hash for an even faster “quick check”.

2022/12/10 by Redhatter (VK4MSL) Computing Music Public Syndication Thinktank 0