CSV to ledger, revisited: matching, real dates, and a second bank
Back in 2022 I wrote about my ING DiBa csv to ledger converter. The closing line was "no additional magic" and I noted that I ignored the difference between booking date and effective date.
Three things changed since then:
the script learned to categorize transactions instead of emitting Expenses:FIXME for everything,
it learned to recover the real transaction date out of the VISA booking text, and
I finally wrote a second converter, for my Revolut account, which had been sitting un-automated for a few years.
Auto-categorizing with a match table
The 2022 version produced one account for every row: Expenses:FIXME.
I still copy entries into a ledger file per month by hand, but now most rows arrive pre-booked.
The core is a small list of matches -- a substring to look for in the transaction text, plus the account, payee and optional tags to emit:
from dataclasses import dataclass
from decimal import Decimal


@dataclass
class Match:
    match: str
    description: str
    account: str
    amount: Decimal | None = None
    tags: str | None = None


def process_match(string, amount=None):
    for item in [
        Match("Hetzner Online GmbH", "Hetzner", "Expenses:Infrastructure:Hetzner"),
        Match("Rundfunk ARD, ZDF, DRadio", "GEZ", "Expenses:Media:GEZ"),
        Match("LIDL SAGT DANKE", "Einkaufen", "Expenses:Supermarket:Lidl"),
        Match("LOGPAY FIN", "VVS Ticket", "Expenses:PublicTransport:VVS"),
        # ... lots more ...
        Match("", "FIXME", "Expenses:FIXME"),  # default
    ]:
        if item.match in string:
            # An entry with an amount only matches rows of exactly that amount.
            if item.amount is not None and amount is not None:
                if abs(amount) != abs(item.amount):
                    continue
            return {
                "description": item.description,
                "account": item.account,
                "tags": item.tags,
            }
Expenses:FIXME is now the default at the bottom of the list instead of the only output. The first hit wins, so the list goes from specific to generic.
The amount field handles a wrinkle I did not anticipate: the same merchant string can mean different things depending on the amount. My Ionity charging is billed under one name, but €5.99 and €11.99 are two different subscription tiers, and anything else is an actual charging session:
Match(" IONITY", "Ionity Power Monthly", "Expenses:Car:Ionity:SubMotion",
      amount=Decimal("5.99"), tags="subscription-monthly:"),
Match(" IONITY", "Ionity Power Monthly", "Expenses:Car:Ionity:SubPower",
      amount=Decimal("11.99"), tags="subscription-monthly:"),
Match(" IONITY", "Ionity Charge", "Expenses:Car:Ionity:Charge39"),
The per-kWh price of a charge can be €0.39, €0.49 or €0.65. I have not yet automated picking the right one based on the preceding monthly subscription, so for now I change the account name by hand.
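To see the amount check in action, here is a minimal, self-contained sketch of the same idea. `Rule`, `RULES` and `categorize` are hypothetical stand-ins for the Match table above, and the transaction texts are invented samples; only the three Ionity accounts and the FIXME default come from the real list.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional


@dataclass
class Rule:
    """Hypothetical stand-in for the Match entries above."""
    needle: str
    account: str
    amount: Optional[Decimal] = None  # None means "any amount"


RULES = [
    Rule(" IONITY", "Expenses:Car:Ionity:SubMotion", Decimal("5.99")),
    Rule(" IONITY", "Expenses:Car:Ionity:SubPower", Decimal("11.99")),
    Rule(" IONITY", "Expenses:Car:Ionity:Charge39"),
    Rule("", "Expenses:FIXME"),  # the empty string is contained in every text
]


def categorize(text: str, amount: Decimal) -> str:
    """Return the account of the first rule whose text and amount both fit."""
    for rule in RULES:
        if rule.needle not in text:
            continue
        if rule.amount is not None and abs(amount) != rule.amount:
            continue
        return rule.account
    raise AssertionError("unreachable: the last rule matches everything")


# Invented sample rows; the sign of the amount does not matter.
print(categorize("VISA IONITY GMBH", Decimal("-5.99")))
print(categorize("VISA IONITY GMBH", Decimal("-23.40")))
print(categorize("SOME UNKNOWN SHOP", Decimal("-1.00")))
```

Because the two amount-specific rules come first, a row of any other amount falls through to the generic charging rule, and anything that is not Ionity at all ends up in Expenses:FIXME.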
The real transaction date
In 2022 I wrote "I chose to only use the effective date", but a few weeks ago this finally annoyed me too much. Some subscription cycle calculations were off by a few days, because a card payment is booked a day or two after I actually swiped the card.
ING hides the real date in the VISA text as KAUFUMSATZ DD.MM (without the year).
So I pull it back out:
import re
from datetime import date

KAUFUMSATZ_RE = re.compile(r"KAUFUMSATZ (\d{2})\.(\d{2})")


def kaufumsatz_date(comment: str, booking: date) -> date | None:
    m = KAUFUMSATZ_RE.search(comment)
    if not m:
        return None
    day, month = int(m.group(1)), int(m.group(2))
    try:
        d = date(booking.year, month, day)
    except ValueError:
        return None
    # KAUFUMSATZ always precedes booking; roll back a year across Jan/Dec wrap.
    return d.replace(year=booking.year - 1) if d > booking else d
The year is filled in from the booking date and rolled back by one year if that would put the transaction after its own booking (the December/January wrap).
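A quick check of the year-wrap rule, with the function repeated so the snippet runs on its own. The comment strings are invented samples; only the KAUFUMSATZ DD.MM pattern is taken from the text above.

```python
import re
from datetime import date
from typing import Optional

KAUFUMSATZ_RE = re.compile(r"KAUFUMSATZ (\d{2})\.(\d{2})")


def kaufumsatz_date(comment: str, booking: date) -> Optional[date]:
    m = KAUFUMSATZ_RE.search(comment)
    if not m:
        return None
    day, month = int(m.group(1)), int(m.group(2))
    try:
        d = date(booking.year, month, day)
    except ValueError:
        return None
    return d.replace(year=booking.year - 1) if d > booking else d


# Swiped on 30 December, booked on 2 January: the year rolls back.
wrapped = kaufumsatz_date("VISA SHOP KAUFUMSATZ 30.12", date(2025, 1, 2))
# Swiped two days before booking in the same month: the year is kept.
same_year = kaufumsatz_date("VISA SHOP KAUFUMSATZ 12.06", date(2025, 6, 14))
# No KAUFUMSATZ marker in the text: nothing to recover.
missing = kaufumsatz_date("SEPA Lastschrift", date(2025, 6, 14))

print(wrapped, same_year, missing)
```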
I only apply this where it matters -- the car charging and subscription entries, whose cycle analysis is day-sensitive.
Everything else keeps the booking date, emitted as a date: tag only when it actually differs, so the journal stays uncluttered.
The thing that makes that analysis possible is not a tag but the account names themselves.
Look back at the Ionity matches: Expenses:Car:Ionity:SubPower marks a monthly subscription tier, and Expenses:Car:Ionity:Charge39 is a charging session where the 39 is the price -- €0.39/kWh, encoded in the account name.
My analysis script just asks hledger for Expenses:Car:.*:(Sub|Charge).* and reads the rate straight out of the account string.
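Reading the rate out of the account name can be sketched roughly like this. It is an illustration, not the analysis script itself: `rate_from_account` is a hypothetical helper, and it assumes the digits after Charge are always the price in cents per kWh.

```python
import re
from decimal import Decimal
from typing import Optional

# Assumption: the suffix after "Charge" is the per-kWh price in cents.
CHARGE_RE = re.compile(r"^Expenses:Car:.*:Charge(\d+)$")


def rate_from_account(account: str) -> Optional[Decimal]:
    """Return the per-kWh rate in euros, or None for non-charge accounts."""
    m = CHARGE_RE.match(account)
    if not m:
        return None
    return Decimal(m.group(1)) / 100


print(rate_from_account("Expenses:Car:Ionity:Charge39"))
print(rate_from_account("Expenses:Car:Ionity:SubPower"))
```

Subscription accounts carry no rate, so they return None and can be handled separately as cycle boundaries.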
That structured naming is exactly what feeds my post on Ionity subscription calculations.
Now I have exact dates which I didn't have when writing the Ionity post.
Adding a second bank: Revolut
I have had a Revolut account for years. For the first few years I converted the CSV export to transactions by hand. I didn't use the card that much, so this was not an issue. But since then I have started to use virtual credit cards more often, and the manual process has kept me from updating my ledger files for quite a while.
The CSV is a completely different shape from ING's: ISO dates, international number format, and columns for Type, Product and Fee.
The conversion itself is the same idea as before; the interesting part turned out to be deciding what not to book.
Interest accrues on the savings pot in hundreds of tiny rows. I ignore all of them (for now).
The file contains the savings account's own view of every transfer as Product=Deposit rows. Those mirror the Current side I already book, so booking both would double-count. I keep only Current.
A couple of net-zero "balance migration to another region or legal entity" transfers are pure noise and get dropped.
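The three rules above boil down to a small row filter. The sketch below is only an approximation: `keep_row` is a hypothetical helper, the `Description` column and the `Interest` type label are guesses about the export, and the sample rows are made up. Only the Type and Product columns and the Current/Deposit values are taken from the text.

```python
import csv
import io

# Made-up sample in the assumed export layout.
SAMPLE = """Type,Product,Description,Amount
Card Payment,Current,GitHub,-8.57
Transfer,Deposit,To savings,50.00
Transfer,Current,To savings,-50.00
Interest,Deposit,Interest earned,0.01
Transfer,Current,Balance migration to another region or legal entity,0.00
"""


def keep_row(row: dict) -> bool:
    """Decide whether a CSV row should be booked at all."""
    if row["Product"] != "Current":
        return False  # savings-side mirror rows would double-count
    if row["Type"] == "Interest":
        return False  # assumed label for interest rows; ignored for now
    if "balance migration" in row["Description"].lower():
        return False  # net-zero noise
    return True


rows = [r for r in csv.DictReader(io.StringIO(SAMPLE)) if keep_row(r)]
for r in rows:
    print(r["Type"], r["Description"], r["Amount"])
```

Of the five sample rows, only the card payment and the Current side of the savings transfer survive.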
Card payments map through the same match table. Revolut reports its small FX fee in a separate column, so it becomes its own posting whenever it is non-zero:
# Card Payment | GitHub
2025-06-28 Github
    Expenses:edu:simonwillison  €8.57 ; subscription-monthly:
    Expenses:misc:ExchangeFee  €0.09
    Assets:Revolut:Euro
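Rendering such an entry could look roughly like this. `render_entry` is a hypothetical helper, not the converter's real code; it assumes the amount and fee arrive as positive Decimals and leaves the asset posting without an amount so that ledger balances the entry itself.

```python
from decimal import Decimal


def render_entry(day: str, payee: str, account: str,
                 amount: Decimal, fee: Decimal, tags: str = "") -> str:
    """Format one ledger entry; the fee posting appears only if non-zero."""
    first = f"    {account}  €{amount}"
    if tags:
        first += f" ; {tags}"
    lines = [f"{day} {payee}", first]
    if fee != 0:
        lines.append(f"    Expenses:misc:ExchangeFee  €{fee}")
    lines.append("    Assets:Revolut:Euro")  # amount left blank on purpose
    return "\n".join(lines)


with_fee = render_entry("2025-06-28", "Github", "Expenses:edu:simonwillison",
                        Decimal("8.57"), Decimal("0.09"), "subscription-monthly:")
without_fee = render_entry("2025-06-28", "Github", "Expenses:edu:simonwillison",
                           Decimal("8.57"), Decimal("0"))
print(with_fee)
print(without_fee)
```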
Top-ups bridge to my ING account (Assets:Girokonto), and savings moves stay internal between Assets:Revolut:Euro and Assets:Revolut:InstantAccessSavings.
I am still not tracking the accounts of my investment depots. Only the money I transfer there. This is a task for another day.
