
lvmcache fun and games

This blog has never been on what I’d call a high-performance server. In fact, things are a little on the slow side. I try to be frugal with my system resource allocation, on the assumption that my little site does not get a lot of traffic (much less now that it’s no longer syndicated on Gentoo Planet). However, I think I managed to get the performance up a notch…

The site runs on my solar-powered server cluster, with a couple of Ceph RBDs: one for the root OS and one for the data (MariaDB / www root / /home). The VM runs AlpineLinux. The VM host was over-provisioned with a larger SSD than required, allowing me to dedicate some space to a local cache.

I had thought I could set something up that would organise the cache on the VM host, and abstract it from the VM, but so far, I’ve not gotten around to doing that. (I did have something sort-of working in OpenNebula with flashcache at work, but it was flaky.)

In libvirt, I provisioned a new RBD to serve as the backing store (thus keeping a pristine copy to roll back to should things go pear-shaped), and a new LVM volume for the cache. For the time being, I moved the existing data volume to be the last device. So I had:

  • /dev/vda: OS
  • /dev/vdb: Data volume
  • /dev/vdc: Cache volume
  • /dev/vdd: the old /dev/vdb, attached temporarily for data migration

Failed approaches

First, what didn’t work for me: bcachefs and bcache.

bcachefs

bcachefs fought me every step of the way. The documentation is sketchy, which made formatting the volumes difficult (especially as I wanted a write-through cache to facilitate VM migration).

bcachefs format gives some very cryptic error messages, and has a somewhat quirky argument syntax for formatting. The command I figured out through trial-and-error was this:

bcachefs format \
    --replicas=1 \
    --durability=0 /dev/vdc1 \
    --durability=1 /dev/vdb1 \
    --foreground_target /dev/vdc1 \
    --promote_target /dev/vdc1 \
    --background_target /dev/vdb1 \
    --metadata_target /dev/vdc1

The problem was convincing mount to actually mount it. I was supposed to specify every device, but no matter what order I used, it flatly refused each time and told me “no such device”.

bcache

This is the underlying caching logic that bcachefs was built on, so I figured I’d try that. This worked better, but I found AlpineLinux had no real knowledge of bcache, and thus did not provide any means for me to bring up /dev/bcache0 before localmount mounted it.

I could have written an OpenRC init script to do this, but I wasn’t certain about this path, so I decided to put the idea aside.
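For reference, the core of such a script would likely be small. The following is a minimal, untested sketch of the registration step only. It assumes the bcache kernel module is loaded and exposes /sys/fs/bcache/register; the function name register_bcache_devices and the BCACHE_REGISTER variable are made up for illustration. In an OpenRC service this would be called from start(), with depend() declaring `before localmount`.

```shell
# Path of the bcache registration file; overridable so the function can be
# tried against a scratch file instead of the real sysfs entry.
BCACHE_REGISTER="${BCACHE_REGISTER:-/sys/fs/bcache/register}"

# Hypothetical helper: ask the kernel to probe each device for a bcache
# superblock by writing its path to the register file.
register_bcache_devices() {
    for dev in "$@"; do
        echo "$dev" > "$BCACHE_REGISTER" || return 1
    done
}

# Example (as root): register_bcache_devices /dev/vdb1 /dev/vdc1
```

Once both the backing and cache devices are registered, the kernel should create /dev/bcache0, which localmount could then mount as usual.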

Winning approach: lvmcache

Luckily lvm2 has a built-in method: lvmcache. After installing the lvm2 package in AlpineLinux, I blatted the partition tables on my two virtual disks, formatted them as LVM physical volumes, and added them to a volume group.

~ # pvcreate /dev/vdb /dev/vdc
  Physical volume "/dev/vdb" successfully created.
  Physical volume "/dev/vdc" successfully created.
~ # vgcreate data /dev/vdb
  Volume group "data" successfully created
~ # vgextend data /dev/vdc 
  Volume group "data" successfully extended

Now to create the logical volumes. This wound up being a little tricky because I wanted to use all the available space on each physical volume. I had tried specifying -L ${SZ}G, but this ignored the fact that LVM uses a bit of header space on each physical volume. It complained, but in doing so told me how many extents were available, so I was able to use -l ${SZ} to specify that number of extents:

~ # lvcreate --size 8G --name datavol data /dev/vdb
 Insufficient free space: 2048 extents needed, but only 2047 available
~ # lvcreate -l 2047 --name datavol data /dev/vdb  
 Logical volume "datavol" created.
~ # lvcreate -n cachevol -l 4095 data /dev/vdc
 Volume group "data" has insufficient free space (1023 extents): 4095 required.
~ # lvcreate -n cachevol -l 1023 data /dev/vdc
 Logical volume "cachevol" created.
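
The off-by-one makes sense once the arithmetic is written out: at 4 MiB per extent, 8 GiB is exactly 2048 extents, and the header space at the start of the physical volume leaves room for only 2047 whole ones. Below is a small sketch of that calculation. The function name extents_for_gib is hypothetical, and the 4 MiB extent size is the vgcreate default, which is consistent with the numbers above but not shown in the output. The comments at the end list two ways to skip the hand-counting.

```shell
# Hypothetical helper: number of extents needed for a size given in GiB.
# Assumes the default LVM extent size of 4 MiB unless a second argument
# overrides it.
extents_for_gib() {
    gib="$1"
    extent_mib="${2:-4}"
    echo $(( gib * 1024 / extent_mib ))
}

extents_for_gib 8    # prints 2048, what lvcreate wanted for --size 8G
extents_for_gib 4    # prints 1024, one more than the cache PV could offer

# Alternatives that avoid counting extents by hand (run as root):
#   pvs -o pv_name,pv_pe_count,pv_pe_alloc_count   # extents per PV
#   lvcreate -l 100%PVS -n datavol data /dev/vdb   # all free space on that PV
```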

I now had two separate logical volumes, one on each physical device. Time to link them:

~ # lvconvert --type cache --cachevol cachevol data/datavol
Erase all existing data on data/cachevol? [y/n]: y
 Logical volume data/datavol is now cached.

Great, except I forgot to specify the write mode. Turns out, this is an lvchange away:

~ # lvchange --cachemode writethrough /dev/data/datavol  
 Logical volume data/datavol changed.
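
The separate lvchange step should be avoidable, since lvconvert accepts --cachemode when the cache is attached. The sketch below wraps that in a hypothetical helper, attach_cache, with a DRY_RUN switch (also invented for this example) that prints each command instead of running it. lvs can report the mode afterwards via its cache_mode field.

```shell
# Print the command when DRY_RUN is 1 (the default here), run it otherwise.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical helper: attach a cache volume to an origin LV with an
# explicit cache mode.
# usage: attach_cache ORIGIN_LV CACHE_LV MODE
attach_cache() {
    run lvconvert --type cache --cachevol "$2" --cachemode "$3" "$1"
}

# Hypothetical helper: show the cache mode currently set on an LV.
show_cache_mode() {
    run lvs -o+cache_mode "$1"
}

attach_cache data/datavol cachevol writethrough
show_cache_mode data/datavol
```

With DRY_RUN=0 and root privileges the same calls would run the real commands, and lvconvert would still ask for confirmation before erasing the cache volume.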

I could now format /dev/data/datavol with a filesystem, and migrate the data across. rsync here we come. An update to /etc/fstab and we were in business.
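
For anyone following along, the migration step might look roughly like this. It is a sketch under assumptions, not a record of the exact commands used: the ext4 filesystem, the /mnt/old and /mnt/new mount points, and the helper name migration_plan are all hypothetical, and the old volume may be a partition rather than the whole of /dev/vdd. The function only prints the plan so it can be reviewed before anything is run.

```shell
# Hypothetical helper: print (not run) the commands for copying data from
# the old device to the new cached logical volume.
# usage: migration_plan OLD_DEV NEW_DEV [FSTYPE]
migration_plan() {
    old_dev="$1"
    new_dev="$2"
    fstype="${3:-ext4}"    # assumption: the post does not name the filesystem
    cat <<EOF
mkfs.$fstype $new_dev
mkdir -p /mnt/old /mnt/new
mount -o ro $old_dev /mnt/old
mount $new_dev /mnt/new
rsync -aHAX --numeric-ids /mnt/old/ /mnt/new/
umount /mnt/old /mnt/new
EOF
}

migration_plan /dev/vdd /dev/data/datavol
```

The rsync flags preserve hard links, ACLs and extended attributes on top of the usual archive behaviour, which matters for a volume holding /home and a database directory.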

So far, things seem snappier, so we’ll keep an eye on things. It’s survived a couple of reboots. The question is what happens when I boost a post on Mastodon: do all the ActivityPub instances out there cause problems? Guess I’ll find out in a moment.