Claude Skill to get Stuttgart Waste Dates

Last year I experimented with MCPs by building one for waste removal dates in Stuttgart. I have used this MCP in different settings since then to demonstrate what MCPs can do, and to experiment with different services. But day to day I actually use a Python script that pushes the next waste removal dates to my Home Assistant.

Still, I wanted to see what this MCP would look like as a Claude Skill.

The skill is saved in ~/.claude/skills/stuttgart-waste/SKILL.md, next to the CSV with all the streets of Stuttgart from the GitHub repository of the str-ical2json MCP.

The SKILL.md I ended up with looks like this:

---
allowed-tools:
  - Read(~/.claude/skills/stuttgart-waste/stuttgart_streets.csv)
  - Grep(path:~/.claude/skills/stuttgart-waste/stuttgart_streets.csv)
  - WebFetch(domain:service.stuttgart.de)

---

# Stuttgart Waste Collection Dates

1. Extract **street name** and **house number** from the user's query. Ask if missing.

2. Validate against `stuttgart_streets.csv` (in this skill's directory). Columns: `street_name`, `house_count`, `numbers`, `type`.
   - Normalize "straße"/"strasse" to "str." for matching.
   - Fuzzy-match if no exact hit. Verify house number exists in `numbers`.

3. Fetch iCal via WebFetch:
   - URL: `https://service.stuttgart.de/lhs-services/aws/api/ical?street={street}&streetnr={streetnr}`
   - Use the street name as it appears in the CSV, URL-encoded.
   - Prompt: "Extract every VEVENT. For each, list DTSTART date (YYYY-MM-DD) and SUMMARY. Format: DATE | SUMMARY. List ALL events."

4. Show only **future dates**, grouped by waste type. **Default: next 1-2 dates per type.** Full list only if explicitly requested.

The permissions allow reading and grepping the one CSV file, and the skill is only allowed to call the service.stuttgart.de domain to retrieve the iCal data.

Finally, one example call within an empty Claude Code shell:

❯ /stuttgart-waste for Katzenbachstrasse 58

● Searched for 1 pattern, read 1 file (ctrl+o to expand)

● Fetch(https://service.stuttgart.de/lhs-services/aws/api/ical?street=Katzenbachstr.&streetnr=58)
  ⎿  Received 33.8KB (200 OK)

● Katzenbachstr. 58 — next pickups:

  ┌─────────────┬───────────────┬───────────────┐
  │    Type     │     Next      │  After that   │
  ├─────────────┼───────────────┼───────────────┤
  │ Restmüll    │ Thu, 5 March  │ Thu, 19 March │
  ├─────────────┼───────────────┼───────────────┤
  │ Biomüll     │ Thu, 5 March  │ Thu, 12 March │
  ├─────────────┼───────────────┼───────────────┤
  │ Altpapier   │ Wed, 11 March │ Tue, 31 March │
  ├─────────────┼───────────────┼───────────────┤
  │ Gelber Sack │ Fri, 20 March │ Mon, 13 April │
  └─────────────┴───────────────┴───────────────┘

The clear downside of this setup is that you need to use Claude Code to get the answer. On the positive side: it is impressive how few instructions it takes for this to just work.

I have started to build more skills. For example, one that fetches screenshots from my pastebin and reads them from /tmp. That skill mainly exists to grant the exact permissions needed to get the image without asking me.

Wikidata cache

Roughly a year ago I used cachelib to cache Wikidata requests. But by now I have way too many requests to keep hitting live Wikidata. So I decided to use the Wikidata dump.

The issue with the dump is that it is one big bzipped JSONL file. Keeping the file compressed and jumping to a specific entry is hard. Processing the file with something like indexed-bzip2 could work, but for me it doesn't feel worth it.

So I decided on a different solution: converting the jsonl.bz2 file to SQLite. The data structure I decided on is

CREATE TABLE entities (
    entity_id TEXT,
    label_en TEXT,
    label_de TEXT,
    data BLOB NOT NULL,
    modified TEXT
);

The data field stores the bz2-compressed JSON of a Wikidata entry. The modified value is copied out of the entry itself, just like the entity_id and the labels. The two label columns are in there mainly for debugging purposes.
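What gets stored in data is just a compact-JSON dump run through bz2; a minimal round trip with a toy entity standing in for a full Wikidata record:

```python
import bz2
import json

# Toy entity with the fields the table pulls out; a real record has far more
entry = {
    "id": "Q42",
    "labels": {"en": {"value": "Douglas Adams"}},
    "modified": "2024-01-01T00:00:00Z",
}

# Compact separators to avoid wasting bytes before compression
blob = bz2.compress(json.dumps(entry, separators=(",", ":")).encode("utf-8"))
restored = json.loads(bz2.decompress(blob))
print(restored == entry)  # True
```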

The main bottleneck of the processing is the bzip2 decompression and compression. So the first speed improvement is to install lbzip2 for decompressing the Wikidata export. The other improvement is to split the processing across as many threads as there are cores available:

pv "$1" | lbzip2 -dc | parallel -j 8 --pipe --block 200M -N 100 uv run process.py

The interesting parts of the process.py code look like this:

# get the jobslot from parallel so each worker writes its own db
worker_id = os.environ.get("PARALLEL_JOBSLOT", "1")
db_path = f"wikidata-cache-worker-{worker_id}.db"

batch = []
for line in sys.stdin:
    line = line.strip()
    # the dump is a JSON array: strip trailing commas, skip the brackets
    if line.endswith(","):
        line = line[:-1]
    if line.startswith(("[", "]")) or not line:
        continue

    data = json.loads(line)
    data_compressed = bz2.compress(json.dumps(data, separators=(",", ":")).encode("utf-8"))
    batch.append({
        "entity_id": data["id"],
        "label_en": data.get("labels", {}).get("en", {}).get("value"),
        "label_de": data.get("labels", {}).get("de", {}).get("value"),
        "data": data_compressed,
        "modified": data.get("modified"),
    })

# save batch to db

I use sqlite-utils to insert a list of entities with .insert_all(). After the full processing of the Wikidata dump has finished, another Python script merges the databases. The fastest way here was to drop the indexes first and then insert like this:

conn = sqlite3.connect(main_db_path)
for worker_path in worker_dbs:
    conn.execute(f"ATTACH DATABASE '{worker_path}' AS worker")
    conn.execute(
        """
        INSERT INTO entities
        SELECT * FROM worker.entities
    """
    )
    conn.commit()
    conn.execute("DETACH DATABASE worker")
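The index dropping mentioned above is not shown in the snippet. The pattern is simply drop, bulk insert, recreate; here is a self-contained sketch against an in-memory database, where idx_entity_id is a hypothetical index name, not necessarily what the real database uses:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE entities (entity_id TEXT, label_en TEXT, "
    "label_de TEXT, data BLOB NOT NULL, modified TEXT)"
)
conn.execute("CREATE INDEX idx_entity_id ON entities(entity_id)")

# Drop before the bulk merge so SQLite doesn't update the index per row
conn.execute("DROP INDEX idx_entity_id")
conn.execute(
    "INSERT INTO entities VALUES "
    "('Q42', 'Douglas Adams', 'Douglas Adams', X'00', '2024-01-01')"
)
# Recreate once, after all worker databases are merged
conn.execute("CREATE INDEX idx_entity_id ON entities(entity_id)")
print(conn.execute("SELECT count(*) FROM entities").fetchone()[0])  # 1
```

Rebuilding an index once over the finished table is much cheaper than maintaining it across hundreds of millions of inserts.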

The whole conversion takes roughly 1.5 days on a 10-year-old i7 with 8 cores. There is obviously a tradeoff between using all cores for compression/decompression and the time sink of merging the databases. So I benchmarked this on the same machine using only a single SQLite database. I stopped that single-job experiment after 2 days with 40% finished.

Now I have a 400GB SQLite database generated out of a 100GB wikidata-*-all.json.bz2.

To retrieve the data I added a small FastAPI app:

import asyncio
import bz2
import json

from fastapi import FastAPI, HTTPException
from sqlite_utils import Database

app = FastAPI()
db = Database("wikidata-cache.db", check_same_thread=False)

def fetch_entity(entity_id: str) -> dict | None:
    rows = db["entities"].rows_where("entity_id = ?", [entity_id], limit=1)
    row = next(rows, None)
    if row is None:
        return None
    return json.loads(bz2.decompress(row["data"]))

@app.get("/{entity_id}.json")
async def get_entity(entity_id: str):
    entity = await asyncio.to_thread(fetch_entity, entity_id)
    if not entity:
        raise HTTPException(status_code=404, detail="Entity not found")
    return {"entities": {entity_id: entity}}

The format is on purpose the same as the Wikidata JSON Special:EntityData page, e.g. for Q42. Now I can process a lot of Wikidata entries without hitting the Wikidata servers.

Linux Thinkpad Learnings

A few weeks ago the USB-C power supply of my work notebook died. As a replacement I ordered a UGreen one that can power multiple USB-C devices, resulting in fewer power plugs on my desk at home.

After this I looked into not killing the battery of this ThinkPad by configuring its charging. Additionally, the tool I installed can manage the temperature by not pushing the notebook to its limit.

The ThinkPad I got from my employer is a Lenovo T14 Gen3 Ryzen 7. Because I work from home most of the time, the notebook is plugged in nearly all the time. It is not healthy for the battery to always charge up a tiny bit until it is full again.

To fix this I installed TLP. The default battery charging thresholds are perfect:

# Battery charge level below which charging will begin.
START_CHARGE_THRESH_BAT0=75
# Battery charge level above which charging will stop.
STOP_CHARGE_THRESH_BAT0=80

These values are the defaults, but I still uncommented them to make myself aware that I want them like that.

Additionally, I unplugged the notebook a few times and recharged it, to give the battery a bit of "normal" charging. I am aware that mistreated batteries need to be watched in case they inflate. Mine seems to be okay.

So as I already said, the ThinkPad is on AC most of the time. And when using a lot of CPU, the notebook gets warm and loud. I was not aware that there are multiple power profiles to manage this. When plugged into AC the performance profile is active, which results in more heat and fan noise when all cores are busy. But I actually don't need the notebook running at maximum performance when connected to AC. So I changed the platform profile for AC from performance to balanced with this line in the config:

PLATFORM_PROFILE_ON_AC=balanced

I could still change it back to performance when needed. Or even to low-power when I don't want the fan spinning.
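TLP also has a battery-side counterpart to this setting, so a tlp.conf fragment covering both cases could look like this (a sketch; whether PLATFORM_PROFILE_ON_BAT is available depends on the TLP version and the hardware):

# Platform profile when on AC power
PLATFORM_PROFILE_ON_AC=balanced
# Platform profile when on battery
PLATFORM_PROFILE_ON_BAT=low-power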

This resolved the two ThinkPad issues I wasn't aware I needed to fix. 🎉