CSV to ledger, revisited: matching, real dates, and a second bank

Back in 2022 I wrote about my ING DiBa CSV-to-ledger converter. The closing line was "no additional magic", and I noted that I ignored the difference between booking date and effective date.

Three things changed since then:

  1. the script learned to categorize transactions instead of emitting Expenses:FIXME for everything,

  2. it learned to recover the real transaction date out of the VISA booking text, and

  3. I finally wrote a second converter, for my Revolut account, which had been sitting un-automated for a few years.

Auto-categorizing with a match table

The 2022 version produced one account for every row: Expenses:FIXME. I still copy entries into a ledger file per month by hand, but now most rows arrive pre-booked.

The core is a small list of matches -- a substring to look for in the transaction text, plus the account, payee and optional tags to emit:

from dataclasses import dataclass
from decimal import Decimal

@dataclass
class Match:
    match: str
    description: str
    account: str
    amount: Decimal | None = None
    tags: str | None = None

def process_match(string, amount=None):
    for item in [
        Match("Hetzner Online GmbH", "Hetzner", "Expenses:Infrastructure:Hetzner"),
        Match("Rundfunk ARD, ZDF, DRadio", "GEZ", "Expenses:Media:GEZ"),
        Match("LIDL SAGT DANKE", "Einkaufen", "Expenses:Supermarket:Lidl"),
        Match("LOGPAY FIN", "VVS Ticket", "Expenses:PublicTransport:VVS"),
        # ... lots more ...
        Match("", "FIXME", "Expenses:FIXME"),  # default
    ]:
        if item.match in string:
            if item.amount is not None and amount is not None:
                if abs(amount) != abs(item.amount):
                    continue
            return {
                "description": item.description,
                "account": item.account,
                "tags": item.tags,
            }

Expenses:FIXME is now the default at the bottom of the list instead of the only output. The first hit wins, so the list goes from specific to generic.
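
Why the order matters can be seen in a stripped-down sketch. This is not the real script: the table is reduced to hypothetical (substring, account) pairs and first_hit is a made-up helper. Because the empty string is a substring of every text, the catch-all has to come last.

```python
# Hypothetical two-entry table as (substring, account) pairs.
SPECIFIC_FIRST = [
    ("LIDL SAGT DANKE", "Expenses:Supermarket:Lidl"),
    ("", "Expenses:FIXME"),  # "" is a substring of every string
]


def first_hit(table, text):
    """Return the account of the first entry whose substring occurs in text."""
    for needle, account in table:
        if needle in text:
            return account
    return None


# The booking text is invented for illustration.
text = "VISA LIDL SAGT DANKE 1234"
print(first_hit(SPECIFIC_FIRST, text))        # Expenses:Supermarket:Lidl
print(first_hit(SPECIFIC_FIRST[::-1], text))  # Expenses:FIXME (catch-all shadows Lidl)
```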

The amount field handles a wrinkle I did not anticipate: the same merchant string can mean different things depending on the amount. My Ionity charging is billed under one name, but €5.99 and €11.99 are two different subscription tiers, and anything else is an actual charging session:

Match(" IONITY", "Ionity Power Monthly", "Expenses:Car:Ionity:SubMotion",
      amount=Decimal("5.99"), tags="subscription-monthly:"),
Match(" IONITY", "Ionity Power Monthly", "Expenses:Car:Ionity:SubPower",
      amount=Decimal("11.99"), tags="subscription-monthly:"),
Match(" IONITY", "Ionity Charge", "Expenses:Car:Ionity:Charge39"),

The price per kWh can be €0.39, €0.49 or €0.65. I have not yet automated picking the rate based on the preceding monthly subscription, so for now I change the account name by hand.

The real transaction date

In 2022 I wrote "I chose to only use the effective date", but a few weeks ago this started to annoy me too much. Some subscription cycle calculations were off by a few days, because a card payment is booked a day or two after I actually swiped the card.

ING hides the real date in the VISA text as KAUFUMSATZ DD.MM (without the year). So I pull it back out:

import re
from datetime import date

KAUFUMSATZ_RE = re.compile(r"KAUFUMSATZ (\d{2})\.(\d{2})")

def kaufumsatz_date(comment: str, booking: date) -> date | None:
    m = KAUFUMSATZ_RE.search(comment)
    if not m:
        return None
    day, month = int(m.group(1)), int(m.group(2))
    try:
        d = date(booking.year, month, day)
    except ValueError:
        return None
    # KAUFUMSATZ always precedes booking; roll back a year across Jan/Dec wrap.
    return d.replace(year=booking.year - 1) if d > booking else d

The year is filled in from the booking date and rolled back by one year if that would put the transaction after its own booking (the December/January wrap).

I only apply this where it matters -- the car charging and subscription entries, whose cycle analysis is day-sensitive. Everything else keeps the booking date, emitted as a date: tag only when it actually differs, so the journal stays uncluttered.

The thing that makes that analysis possible is not a tag but the account names themselves. Look back at the Ionity matches: Expenses:Car:Ionity:SubPower marks a monthly subscription tier, and Expenses:Car:Ionity:Charge39 is a charging session where the 39 is the price -- €0.39/kWh, encoded in the account name. My analysis script just asks hledger for Expenses:Car:.*:(Sub|Charge).* and reads the rate straight out of the account string. That structured naming is exactly what feeds my post on Ionity subscription calculations. Now I have exact dates, which I did not have when I wrote the Ionity post.

Adding a second bank: Revolut

I have had a Revolut account for years. For the first few years I converted the CSV export to transactions by hand. I didn't use the card that much, so this was not an issue. Since then I have started to use virtual credit cards more often, and the manual process has kept me from updating my ledger files for quite a while.

The CSV is a completely different shape from ING's: ISO dates, international number format, and columns for Type, Product and Fee. The conversion itself is the same idea as before; the interesting part turned out to be deciding what not to book.

  • Interest accrues on the savings pot in hundreds of tiny rows. I ignore all of them (for now).

  • The file contains the savings account's own view of every transfer as Product = Deposit rows. Those mirror the Current side I already book, so booking both would double-count. I keep only Current.

  • A couple of net-zero "balance migration to another region or legal entity" transfers are pure noise and get dropped.
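
The three rules above fit in one small filter. This is only a sketch: of the column names, only Type, Product and Fee come from the export as described; Description, Amount, the values in the Type column and the sample rows are assumptions.

```python
import csv
import io

# Made-up sample rows. Description, Amount and the Type values are assumed.
SAMPLE = """\
Type,Product,Description,Amount,Fee
CARD_PAYMENT,Current,GitHub,-8.57,0.09
TRANSFER,Current,To savings,-100.00,0.00
TRANSFER,Deposit,To savings,100.00,0.00
INTEREST,Deposit,Interest earned,0.01,0.00
TRANSFER,Current,Balance migration to another region or legal entity,0.00,0.00
"""


def keep(row: dict[str, str]) -> bool:
    """Decide whether a CSV row should become a ledger entry."""
    if row["Type"] == "INTEREST":
        return False  # tiny interest rows, ignored for now
    if row["Product"] != "Current":
        return False  # savings-side mirror of a transfer already booked
    if "balance migration" in row["Description"].lower():
        return False  # net-zero noise
    return True


rows = [r for r in csv.DictReader(io.StringIO(SAMPLE)) if keep(r)]
for r in rows:
    print(r["Type"], r["Description"], r["Amount"])
```

Of the five sample rows, only the card payment and the Current side of the savings transfer survive.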

Card payments map through the same match table. Revolut reports its small FX fee in a separate column, so it becomes its own posting whenever it is non-zero:

# Card Payment | GitHub
2025-06-28 Github
    Expenses:edu:simonwillison                 €8.57  ; subscription-monthly:
    Expenses:misc:ExchangeFee                  €0.09
    Assets:Revolut:Euro
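
Emitting the fee posting only when it is non-zero might be sketched like this. card_entry and its parameters are hypothetical, and the sketch does not align amounts into columns the way the entry above does.

```python
from decimal import Decimal


def card_entry(day, payee, account, amount, fee, tags=None):
    """Format a card payment; the fee gets its own posting only if non-zero."""
    lines = [f"{day} {payee}"]
    posting = f"    {account}  €{amount}"
    if tags:
        posting += f"  ; {tags}"
    lines.append(posting)
    if fee != 0:
        lines.append(f"    Expenses:misc:ExchangeFee  €{fee}")
    # No amount on the last posting: the journal tool balances the entry.
    lines.append("    Assets:Revolut:Euro")
    return "\n".join(lines)


entry = card_entry("2025-06-28", "Github", "Expenses:edu:simonwillison",
                   Decimal("8.57"), Decimal("0.09"), "subscription-monthly:")
print(entry)
```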

Top-ups bridge to my ING account (Assets:Girokonto), and savings moves stay internal between Assets:Revolut:Euro and Assets:Revolut:InstantAccessSavings.

I am still not tracking the accounts of my investment depots, only the money I transfer there. That is a task for another day.

Tracking a hackerspace's open status in Home Assistant via SpaceAPI

A lot of hackerspaces publish their door state through SpaceAPI: a small JSON document with -- among other things -- a state.open boolean. My local space Essembly does too, at essembly.de/spaceapi.json:

{
  "api_compatibility": ["14", "15"],
  "space": "Essembly",
  "state": { "open": false }
}

I wanted that open/closed state in Home Assistant, using only built-in integrations.

There is an official SpaceAPI integration, but it goes the wrong way: it publishes your Home Assistant instance as a SpaceAPI endpoint. Cool for running a space, but no help for reading someone else's.

The core RESTful integration is all that is needed. My configuration.yaml already splits sensors into a folder:

sensor: !include_dir_merge_list includes/sensors

So the whole thing is one file, includes/sensors/essembly.yaml, as a list item (every file in a merge_list directory must be a list):

- platform: rest
  name: Essembly Space
  resource: https://essembly.de/spaceapi.json
  value_template: "{{ 'open' if value_json.state.open else 'closed' }}"
  scan_interval: 300

The value_template turns the boolean into a plain open/closed string; scan_interval: 300 polls every five minutes instead of the default 30 seconds.
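
To check the endpoint outside Home Assistant, the same mapping can be sketched in a few lines of Python. space_state is a hypothetical helper; the JSON is the document shown above, inlined so the snippet needs no network access.

```python
import json

# The SpaceAPI document from above as a string (no network needed).
SAMPLE = '{"api_compatibility": ["14", "15"], "space": "Essembly", "state": {"open": false}}'


def space_state(body: str) -> str:
    """Mirror the value_template: map state.open to 'open' or 'closed'."""
    doc = json.loads(body)
    return "open" if doc["state"]["open"] else "closed"


print(space_state(SAMPLE))  # closed
```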

A full reload was needed before the new YAML file was picked up.


Slack screenshare chooser on sway

Recently, screensharing in Slack huddles got annoying. The usual Slack share dialog still appeared, but in addition something popped up in the sway bar and stole the keyboard focus, and I had to press Escape (twice) before I could continue. Sharing the full screen still worked, but the popup blocked everything until dismissed.

Two different choosers are involved here. The share dialog with the "Entire screen" / "Window" tabs is Slack's own (Electron) picker and was always there. On Wayland, Electron cannot capture the screen itself: it requests a PipeWire stream via xdg-desktop-portal, and the portal backend decides which output backs that stream -- even the live preview inside Slack's dialog comes from such a stream.

The new thing in the bar turned out to be bemenu, started by xdg-desktop-portal-wlr as a source chooser. When an application requests a screencast, the portal walks through a list of chooser commands (wmenu, wofi, rofi, bemenu, ...) and runs the first one that exists. On my system only bemenu is installed, so that one won. The journal shows the chain:

/bin/sh: line 1: rofi: command not found
/bin/sh: line 1: wmenu: command not found
[ERROR] - wlroots: no output found

The no output found error is the result of pressing Escape: an empty chooser selection makes that capture request fail. Slack coped with it -- sharing the full screen still worked afterwards -- but having to dismiss a focus-stealing prompt twice per share is not a good workflow.
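
The "first one that exists" behaviour can be pictured with a small sketch. It is not the portal's actual code: the candidate list is just the names mentioned above, and judging by the "command not found" lines in the journal the portal seems to try each command through /bin/sh rather than checking PATH first.

```python
import shutil

# Names as listed in the text; the real list and its order live in
# xdg-desktop-portal-wlr and may differ between versions.
CANDIDATES = ["wmenu", "wofi", "rofi", "bemenu"]


def first_available(candidates, which=shutil.which):
    """Return the first command found on PATH, or None if none is installed."""
    for name in candidates:
        if which(name) is not None:
            return name
    return None


# Prints whichever chooser is installed on this machine, or None.
print(first_available(CANDIDATES))
```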

The trigger was probably the sway 1.11 to 1.12 upgrade -- before that, no chooser ever appeared. Slack itself (started with --enable-features=WebRTCPipeWireCapturer --ozone-platform=wayland) was unchanged.

Step 1: pin the chooser

xdg-desktop-portal-wlr reads ~/.config/xdg-desktop-portal-wlr/config. Pinning the chooser to slurp replaces the bar takeover with a crosshair where I click on the monitor I want to share:

[screencast]
chooser_type = simple
chooser_cmd = slurp -f %o -or

Then restart the portal: systemctl --user restart xdg-desktop-portal-wlr.

This worked, but now I had to click multiple times per share: Slack/Electron makes multiple portal requests (two for the live preview in the share dialog, one more when actually clicking "Share"), and the portal runs the chooser for every request. xdg-desktop-portal-wlr does not support the portal restore-token mechanism, so it cannot remember the previous answer.

Step 2: cache the answer

So I wrapped slurp in a small script that asks once and reuses the answer for repeated requests within 60 seconds -- ~/bin/xdpw-output-chooser:

#!/bin/sh
# Output chooser for xdg-desktop-portal-wlr: ask once via slurp,
# reuse the choice for repeated requests within 60 seconds
# (Slack/Electron fires several capture requests per share).

cache="${XDG_RUNTIME_DIR:-/tmp}/xdpw-output-choice"

if [ -f "$cache" ]; then
    age=$(( $(date +%s) - $(stat -c %Y "$cache") ))
    if [ "$age" -lt 60 ]; then
        cat "$cache"
        exit 0
    fi
fi

choice=$(slurp -f %o -or) || exit 1
printf '%s\n' "$choice" | tee "$cache"

And point the portal config to it:

[screencast]
chooser_type = simple
chooser_cmd = /home/mfa/bin/xdpw-output-chooser

Now starting a screenshare is: open the share dialog, click the monitor once in slurp, the preview appears, click "Share" -- done. One click instead of three, and nothing steals the keyboard focus anymore.