April 10, 2015

Day-by-day analysis of odometer readings

This afternoon, whilst waiting for a build job to complete I thought I’d do some further analysis on my annual mileage.

Now I don’t record my odometer readings daily (perhaps I should), but I do capture them every Sunday morning.  So I can possibly assume that the distance done for each day of a “run” is the total distance divided by the number of days.  I’m using a SQLite3 database to track this, question is, how do I extract this information?

This turned out to be the key to the answer.  I needed to enumerate all the days between two points.  SQLite3 has a julianday function, and with that I have been able to extract the information I need.

My database schema is simple. There are two tables:
CREATE TABLE bikes (id integer primary key not null, description varchar(64));
CREATE TABLE odometer (timestamp datetime not null default current_timestamp, action char(8) not null, bike_id integer not null, odometer real not null, constraint duplicate_log unique (timestamp, action, bike_id) on conflict replace);

Then there are the views.
CREATE VIEW run_id as select s.rowid as start_id, (select rowid from odometer where bike_id=s.bike_id and timestamp > s.timestamp and action='stop' order by timestamp asc limit 1) as stop_id from odometer as s where s.action='start';
CREATE VIEW "run" AS select start.timestamp as start_timestamp, stop.timestamp as stop_timestamp, start.bike_id as bike_id, start.odometer as start_odometer, stop.odometer as stop_odometer, stop.odometer-start.odometer as distance,julianday(start.timestamp) as start_day, julianday(stop.timestamp) as stop_day from (run_id join odometer as start on run_id.start_id=start.rowid) join odometer as stop on run_id.stop_id=stop.rowid;

The first view breaks up the start and stop events, and gives me row IDs for where each “run” starts and stops. I then use that in my run view to calculate distances and timestamps.

Here’s where the real voodoo lies, to enumerate days, I start at the very first timestamp in my dataset, find the Julian Day for that, then keep adding one day on until I get to the last timestamp. That gives me a list of Julian days that I can marry up to the data in the run view.

CREATE VIEW distance_by_day as
SELECT day_of_year, avg_distance FROM (
SELECT days.day - julianday(date(days.day,'start of year')) as day_of_year, sum(run.distance/max((run.stop_day-run.start_day),1))/count(*) as avg_distance
FROM run,
(WITH RECURSIVE
days(day) as (
SELECT julianday((select min(timestamp) from odometer))
union all
SELECT day+1 from days
limit cast(round(julianday((select max(timestamp) from odometer))-julianday((select min(timestamp) from odometer))) as int)
) SELECT day from days) as days
where
run.start_day < = days.day AND run.stop_day >= days.day
group by day_of_year) dist_by_doy;

This is the result.

Distance by Day Of Year

Distance by Day Of Year