Can agents win "The Game"? -- five strategies compared

The Game by Steffen Benndorf is a cooperative card game: 98 numbered cards (2--99), four piles (two going up from 1, two going down from 100), and the team wins when every card lands on a pile. The twist is that you may never tell teammates the exact number on a card you hold. You can hint -- "please don't play on the third pile" -- but no numbers.

I wanted to know whether agents constrained to the same information a human player has (their own hand, every pile's history) can actually finish all 98 cards. So I built a small Python engine for the game and five agents -- random, greedy, counting, expert and an MCTS-style flat Monte-Carlo agent -- and ran a few thousand games.

The setup

Each agent is a single function plan(snap) -> plays. snap is the info-set view (my hand, public pile tops + histories, public hand sizes, draw count) -- the same information a human gets across the table. Each agent picks an entire turn's sequence of plays at once: the planner enumerates every legal 2-card sequence the hand can make this turn, sums the per-play scores of each sequence, and picks the sequence with the lowest total. This sequence-aware planning is worth a lot: it lets the agent set up a backwards trick (a card exactly 10 below the top of an UP pile, or exactly 10 above the top of a DOWN pile) within its own turn -- e.g. play 60 on an UP pile, then play 50 as a -10 trick -- instead of waiting for the trick to happen by chance.
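To make the sequence enumeration concrete, here is a minimal sketch of such a planner. It is not the engine's actual code: the names legal, gap and best_sequence and the (direction, top) pile representation are assumptions made for illustration, and the score is just the summed gap with the backwards trick counted as -10.

```python
from itertools import permutations

UP, DOWN = "up", "down"

def legal(card, top, direction):
    """A card is legal if it moves the pile forward or is exactly 10 backwards."""
    if direction == UP:
        return card > top or card == top - 10
    return card < top or card == top + 10

def gap(card, top, direction):
    """Cost of one play: distance moved, or -10 for the backwards trick."""
    if direction == UP:
        return -10 if card == top - 10 else card - top
    return -10 if card == top + 10 else top - card

def best_sequence(hand, piles):
    """piles: list of (direction, top). Returns (cost, [(card, pile_index), ...])."""
    best = None
    for c1, c2 in permutations(hand, 2):
        for i, (d1, t1) in enumerate(piles):
            if not legal(c1, t1, d1):
                continue
            after = list(piles)
            after[i] = (d1, c1)          # the first play changes that pile's top
            for j, (d2, t2) in enumerate(after):
                if not legal(c2, t2, d2):
                    continue
                cost = gap(c1, t1, d1) + gap(c2, t2, d2)
                if best is None or cost < best[0]:
                    best = (cost, [(c1, i), (c2, j)])
    return best

piles = [(UP, 55), (UP, 1), (DOWN, 100), (DOWN, 100)]
print(best_sequence([60, 50, 12], piles))
```

With a hand of 60, 50 and 12 and an UP pile at 55, the cheapest sequence is 60 followed by 50 on the same pile. A planner that scored single plays in isolation could not see that 50 becomes a backwards trick only after 60 has been played.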

The five agents

In order of increasing cleverness:

  • random -- uniform random valid sequence. Baseline.

  • greedy -- score = gap: how far the play moves the pile top, with the backwards trick scoring -10. No memory.

  • counting -- greedy + a memory of every card already played + a tiny tiebreaker that prefers piles where my next backwards-trick complement is still alive (in my hand > unseen > dead).

  • expert -- counting + a hand-aware "leftover pain" penalty (don't strand my own worst cards), a one-turn lookahead (don't strand the next player) and an endgame DFS that, once the deck is empty, checks whether any legal order finishes the deck.

  • mcts -- the one this post is about.
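The counting tiebreaker can be sketched as follows. This is one reading of the description above, not the repo's code: I assume the "complement" is the card exactly 10 behind the card being played (the one that would later allow a backwards trick on that pile), the bonus values 2 / 1 / 0 are made up, and only their order (hand > unseen > dead) and the 0.001 weight come from the post.

```python
def complement_status(card, direction, hand, played):
    """Classify the backwards-trick partner that playing `card` would enable."""
    partner = card - 10 if direction == "up" else card + 10
    if not 2 <= partner <= 99:
        return "dead"          # no such card exists in the 2..99 deck
    if partner in hand:
        return "hand"
    if partner in played:
        return "dead"          # already on a pile, the trick can never happen
    return "unseen"            # still in the deck or in a teammate's hand

# Hypothetical bonus values; only the ordering is taken from the post.
BONUS = {"hand": 2, "unseen": 1, "dead": 0}

def counting_score(gap, card, direction, hand, played, weight=0.001):
    """Greedy gap minus a tiny bonus, so it only separates otherwise equal gaps."""
    return gap - weight * BONUS[complement_status(card, direction, hand, played)]
```

Because lower scores win and the weight is far smaller than any gap difference of one card, the memory of played cards can only break ties; it never overrides the greedy ordering.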

Results

200 games per configuration × 4 configs (2 / 3 / 4 / 5 players) = 800 games per strategy. MCTS is the slow one at ~150 s per game, so its full 800-game run takes a few hours; the heuristic batches finish in minutes. I ran the heavier batches with the same rsync / run / rsync-back pattern as in "Automate Compute on Hetzner", just with a uv-based job.sh instead of a Docker build.

Win rates by player count (each cell is wins out of 200 games):

strategy    2p       3p       4p       5p       total / 800
random      0.0 %    0.0 %    0.0 %    0.0 %    0 (0.0 %)
counting    5.5 %    1.5 %    3.5 %    3.0 %    27 (3.4 %)
greedy      8.0 %    3.0 %    2.0 %    2.5 %    31 (3.9 %)
expert      9.5 %    3.5 %    4.0 %    3.0 %    40 (5.0 %)
mcts        19.0 %   13.5 %   23.0 %   14.5 %   140 (17.5 %)


Random literally never wins, which confirms the engine isn't accidentally generous.

MCTS beats every heuristic at every player count -- by roughly 2× to 6× the best-heuristic win rate (2× at 2 players, almost 6× at 4) -- and the absolute lift over expert peaks at 4 players (+19 pp). That was the opposite of what I expected: small hands plus a wider team should be the hardest case for any agent, but it's where actually simulating the future pays off the most.
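The totals, ratios and percentage-point lifts can be recomputed from the table with a few lines of Python. The numbers are copied from the table; the helper names wins and lift are just for this check.

```python
# Win rates in percent per player count, copied from the results table.
RATES = {
    "random":   {2: 0.0, 3: 0.0, 4: 0.0, 5: 0.0},
    "counting": {2: 5.5, 3: 1.5, 4: 3.5, 5: 3.0},
    "greedy":   {2: 8.0, 3: 3.0, 4: 2.0, 5: 2.5},
    "expert":   {2: 9.5, 3: 3.5, 4: 4.0, 5: 3.0},
    "mcts":     {2: 19.0, 3: 13.5, 4: 23.0, 5: 14.5},
}
GAMES = 200  # games per player count

def wins(strategy):
    """Total wins out of 800, reconstructed from the per-config percentages."""
    return sum(round(rate * GAMES / 100) for rate in RATES[strategy].values())

def lift(players):
    """(ratio, percentage-point gain) of mcts over the best heuristic."""
    best = max(r[players] for name, r in RATES.items() if name != "mcts")
    mcts = RATES["mcts"][players]
    return mcts / best, mcts - best

for p in (2, 3, 4, 5):
    ratio, pp = lift(p)
    print(f"{p}p: {ratio:.1f}x the best heuristic, +{pp:.1f} pp")
```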

How MCTS works here

The MCTS agent in this repo is flat Monte-Carlo with determinisation -- a single layer of rollouts, no tree. Per turn:

  1. Candidates. Take the TOP_K = 5 best two-play sequences by greedy gap. This is the same enumeration the other agents do; MCTS just keeps several candidates instead of committing to one.

  2. Determinise. The agent doesn't know who holds which unseen card. So for each rollout it samples a random partition of the unseen pool (cards not in my hand and not in any pile history) into the other players' hands at their public sizes, with the leftover becoming the draw deck. Each rollout uses a fresh random guess of the hidden state.

  3. Roll out. Apply my candidate sequence to a Game rebuilt from that determinisation, then play the rest of the game with the counting agent as the policy for every player. Counting was picked as the rollout policy after a quick comparison: it gave 19 / 150 wins vs greedy-rollout's 13 / 150. Cheap to run and a decent simulator of the real future.

  4. Score. 1.0 if that rollout won, otherwise cards_played / 98.

  5. Pick. For each candidate, average the score over N_ROLLOUTS = 40 rollouts. Play the candidate with the highest average. That's 200 rollouts per turn (5 × 40).

TOP_K = 5
N_ROLLOUTS = 40

def plan(snap):
    candidates = top_k_by_greedy(snap, k=TOP_K)
    best_seq, best_mean = None, float("-inf")
    for seq in candidates:
        total = 0.0
        for _ in range(N_ROLLOUTS):
            g = determinise(snap)         # random partition of unseen
            apply_sequence(g, seq)
            playout(g, policy=counting)   # finish the game
            total += score(g)
        mean = total / N_ROLLOUTS
        if mean > best_mean:              # keep the best average so far
            best_seq, best_mean = seq, mean
    return best_seq
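The two helpers that carry most of the idea, determinise and score, could look roughly like this. It is a sketch under assumptions: the signatures here take plain sets and lists instead of the snap and Game objects used above, and the real engine may structure this differently.

```python
import random

def determinise(my_hand, played, hand_sizes, rng=random):
    """Sample one hidden state consistent with what I can see.

    hand_sizes: public hand sizes of the *other* players, in turn order.
    Returns (other_hands, deck).
    """
    unseen = [c for c in range(2, 100) if c not in my_hand and c not in played]
    rng.shuffle(unseen)
    hands, start = [], 0
    for size in hand_sizes:
        hands.append(unseen[start:start + size])
        start += size
    return hands, unseen[start:]   # the leftover becomes the draw deck

def score(won, cards_played, total=98):
    """1.0 for a win, otherwise the fraction of cards that got played."""
    return 1.0 if won else cards_played / total
```

Passing a seeded random.Random as rng makes a batch of rollouts reproducible, which helps when comparing rollout policies against each other.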

The interesting bit is that you don't need a "smart" rollout policy. Counting is a one-line heuristic with a 0.001-weight tiebreaker, yet wrapped in 40 determinised rollouts per candidate it lifts the overall win rate from expert's 5.0 % to MCTS's 17.5 % -- and from 4.0 % to 23.0 % at 4 players. Most of the lift comes from actually simulating the future under a fixed policy, instead of trying to encode that simulation as more heuristic terms in a single-turn score.

So: rule-respecting agents can win The Game -- 13.5--23 % of the time depending on player count, and 97 / 98 cards is a common almost-win.

Sunrise and sunset in Waybar with astral and uv

I wanted the next sunrise and sunset times in my waybar -- partly out of curiosity, partly because I cycle a lot and want to know how much daylight I still have. The astral Python library does the math; uv runs it without me having to manage a virtualenv.

[Image: waybar fragment showing the sun module with sunset 20:46 and sunrise 05:52]

The script

The interesting part is that the Python script is a PEP 723 single-file script. Its dependencies live in a comment block at the top, and the shebang tells uv to handle the rest:

#!/usr/bin/env -S uv run --script
# /// script
# requires-python = ">=3.11"
# dependencies = ["astral"]
# ///

When you chmod +x it and run it, uv reads the metadata, builds (or reuses) a cached environment under ~/.cache/uv/environments-v2/, and executes the script in it. Nothing to install globally, no requirements.txt, no project venv to ship.

Astral has a built-in geocoder, but only one of the German cities I cared about (Berlin) is in its database. So I hardcoded a small lat/lon table for the cities I look up:

CITIES: dict[str, tuple[float, float]] = {
    "Stuttgart": (48.7758, 9.1829),
    "Karlsruhe": (49.0069, 8.4037),
    "Berlin": (52.5200, 13.4050),
    "Munich": (48.1351, 11.5820),
}

The render function picks the next upcoming sunrise and the next upcoming sunset (today's if still ahead, tomorrow's otherwise) and orders them so that whichever happens sooner is shown first:

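from datetime import datetime, timedelta
from zoneinfo import ZoneInfo

from astral import LocationInfo
from astral.sun import sun

TZ = ZoneInfo("Europe/Berlin")  # assumption: the excerpt does not show how TZ is defined


def render(city: str, now: datetime) -> str:
    lat, lon = CITIES[city]
    loc = LocationInfo(city, "Germany", "Europe/Berlin", lat, lon)

    # Next upcoming event of each kind: today's if still ahead, else tomorrow's.
    today = sun(loc.observer, date=now.date(), tzinfo=TZ)
    tomorrow = sun(loc.observer, date=now.date() + timedelta(days=1), tzinfo=TZ)
    rise = today["sunrise"] if now < today["sunrise"] else tomorrow["sunrise"]
    set_ = today["sunset"]  if now < today["sunset"]  else tomorrow["sunset"]

    # Show whichever event happens sooner first.
    if rise < set_:
        return f"\U0001F305 {rise:%H:%M} \U0001F307 {set_:%H:%M}"
    return f"\U0001F307 {set_:%H:%M} \U0001F305 {rise:%H:%M}"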

So during daylight you see 🌇 20:48 🌅 05:50 (sunset is next, then tomorrow's sunrise), and at night or in the early morning hours you see 🌅 05:50 🌇 20:49.
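The selection and ordering logic does not depend on astral at all, so it can be shown in isolation. In this sketch next_events is a hypothetical helper, the datetimes are naive for brevity, and the times are the Stuttgart values that appear in this post's tests.

```python
from datetime import datetime

def next_events(now, today, tomorrow):
    """today/tomorrow: dicts with 'sunrise' and 'sunset' datetimes.

    Returns the next sunrise and the next sunset as (label, time) pairs,
    soonest first.
    """
    picked = []
    for label in ("sunrise", "sunset"):
        # Today's event if it is still ahead of us, tomorrow's otherwise.
        when = today[label] if now < today[label] else tomorrow[label]
        picked.append((label, when))
    return sorted(picked, key=lambda item: item[1])

today = {"sunrise": datetime(2026, 5, 8, 5, 52), "sunset": datetime(2026, 5, 8, 20, 48)}
tomorrow = {"sunrise": datetime(2026, 5, 9, 5, 50), "sunset": datetime(2026, 5, 9, 20, 49)}

for hour in (3, 14, 22):
    events = next_events(datetime(2026, 5, 8, hour, 0), today, tomorrow)
    print(hour, [(label, f"{when:%H:%M}") for label, when in events])
```

Before sunrise both of today's events are still ahead, in the afternoon the pair is today's sunset and tomorrow's sunrise, and after sunset both events come from tomorrow.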

Wiring it into waybar

A tiny shell wrapper makes the city configurable from the waybar config:

#!/bin/bash
exec ~/bin/sun/sun_status.py "${1:-Stuttgart}"

And in ~/.config/waybar/config:

"custom/sun": {
   "exec": "~/bin/sun/sun_status.sh",
   "interval": 3600
}

The other use case

In waybar I only ever show Stuttgart -- but in the shell I look up other cities:

$ ./sun_status.sh Munich
🌇 20:42 🌅 05:39

$ ./sun_status.sh Berlin
🌇 20:46 🌅 05:24

Useful before a cycling tour ("how late can I be on the road and still get back before dark?") or before an early start to catch a sunrise somewhere ("when do I actually have to get up?").

Tests

Because the rendering logic is a single pure function -- render(city, now) -> str -- the tests just pass a constructed datetime directly.

@pytest.mark.parametrize("city, now, expected", [
    ("Stuttgart", datetime(2026, 5, 8, 3, 0, tzinfo=TZ), "🌅 05:52 🌇 20:48"),
    ("Stuttgart", datetime(2026, 5, 8, 14, 0, tzinfo=TZ), "🌇 20:48 🌅 05:50"),
    ("Stuttgart", datetime(2026, 5, 8, 22, 0, tzinfo=TZ), "🌅 05:50 🌇 20:49"),
    # ...
])
def test_render(city, now, expected):
    assert render(city, now) == expected

A pyproject.toml next to the script declares the dev dependencies, so uv run pytest works:

[project]
name = "sun"
version = "0.1.0"
requires-python = ">=3.11"
dependencies = ["astral"]

[dependency-groups]
dev = ["pytest"]

The script keeps its PEP 723 header, so waybar still calls it directly via the uv run --script shebang. The project venv at ./.venv/ is only used for the tests; the waybar invocation uses the cached script env under ~/.cache/uv/. Two independent environments for the same file -- both managed by uv, neither requiring me to type pip install.

Watch Arch Linux package updates with a Forgejo Action

I wanted to know when specific Arch Linux packages got bumped -- for example when emacs moves from 30.2-2 to 30.2-3. I don't want to run pacman -Syu just to find out, and the Arch website has a JSON API that makes this easy to poll.

The API call for a single package looks like this:

curl -s 'https://archlinux.org/packages/search/json/?name=emacs&repo=Extra&arch=x86_64' \
    | jq -r '.results[0] | (.pkgver + "-" + (.pkgrel|tostring))'

This prints something like 30.2-2.
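For comparison, the same lookup in Python using only the standard library. This is a sketch, not part of the actual setup, which uses curl and jq; the function names are made up, and the response fields used (results, pkgver, pkgrel) are the ones the jq filter above reads.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE = "https://archlinux.org/packages/search/json/"

def build_url(name, repo="Extra", arch="x86_64"):
    """Build the search URL; urlencode also escapes names like 'gtk+'."""
    return BASE + "?" + urlencode({"name": name, "repo": repo, "arch": arch})

def version_from(payload):
    """Return 'pkgver-pkgrel' for the first result, or None if there is none."""
    results = payload.get("results") or []
    if not results:
        return None
    first = results[0]
    return f"{first['pkgver']}-{first['pkgrel']}"

def fetch_version(name, repo="Extra", arch="x86_64"):
    """Query the Arch package search API (performs a network request)."""
    with urlopen(build_url(name, repo, arch), timeout=10) as resp:
        return version_from(json.load(resp))
```

Keeping version_from separate from the network call means the parsing can be tested against a saved response without hitting archlinux.org.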

I wanted a generic solution, so the list of packages to watch lives in a packages.yaml file:

packages:
  - name: emacs
    repo: Extra
    arch: x86_64

Adding another package means appending three more lines.

The Forgejo Action runs once a day, queries each entry, and compares the result to a file in state/ that stores the last seen pkgver-pkgrel. If it differs, the Action sends a ntfy message (see previous post) and commits the new state file back to the repo. That way the git log of state/ becomes the bump history.

name: Check Arch package updates

on:
  schedule:
    - cron: '0 8 * * *'  # Daily at 8:00 AM UTC
  workflow_dispatch:

jobs:
  check:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v6
        with:
          token: ${{ secrets.GITHUB_TOKEN }}

      - name: Install tools
        run: |
          command -v jq >/dev/null || { apt-get update -qq && apt-get install -y -qq jq; }
          if ! command -v yq >/dev/null; then
            case "$(uname -m)" in
              x86_64)         arch=amd64 ;;
              aarch64|arm64)  arch=arm64 ;;
              armv7l)         arch=arm ;;
              *) echo "unsupported arch: $(uname -m)"; exit 1 ;;
            esac
            curl -fsSL -o /usr/local/bin/yq \
              "https://github.com/mikefarah/yq/releases/latest/download/yq_linux_${arch}"
            chmod +x /usr/local/bin/yq
          fi

      - name: Check packages
        env:
          NTFY_TOKEN: ${{ secrets.NTFY_TOKEN }}
        run: |
          set -euo pipefail
          mkdir -p state

          yq -o=json '.packages' packages.yaml | jq -c '.[]' | while read -r pkg; do
            name=$(printf '%s' "$pkg" | jq -r '.name')
            repo=$(printf '%s' "$pkg" | jq -r '.repo')
            arch=$(printf '%s' "$pkg" | jq -r '.arch')

            url="https://archlinux.org/packages/search/json/?name=${name}&repo=${repo}&arch=${arch}"
            current=$(curl -sf "$url" \
              | jq -r '.results[0] | if . == null then empty else (.pkgver + "-" + (.pkgrel|tostring)) end')

            if [ -z "${current:-}" ]; then
              echo "::warning::no result for ${name} (${repo}/${arch})"
              continue
            fi

            state_file="state/${repo}_${arch}_${name}.txt"
            previous=$(cat "$state_file" 2>/dev/null || true)

            if [ "$previous" != "$current" ]; then
              printf '%s\n' "$current" > "$state_file"
              if [ -n "$previous" ]; then
                curl -fsS \
                     -H "Authorization: Bearer ${NTFY_TOKEN}" \
                     -H "Title: Arch package updated: ${name}" \
                     -H "Priority: default" \
                     -H "Tags: package" \
                     -d "${repo}/${name} (${arch}): ${previous} -> ${current}" \
                     https://ntfy.madflex.de/forgejo
              else
                echo "initial state for ${name}, not notifying"
              fi
            fi
          done

      - name: Commit and push state changes
        run: |
          git config user.name "Automated"
          git config user.email "actions@forgejo.local"
          git remote set-url origin https://oauth2:${{ secrets.GITHUB_TOKEN }}@forgejo.tail07efb.ts.net/${{ github.repository }}
          git add state/
          git diff --staged --quiet || git commit -m "Update package state - $(date +'%Y-%m-%d')"
          git push

One note about the state:

First-seen packages are recorded silently: when the state file does not exist yet, the current version is written but no ntfy is sent. That avoids a flood of notifications after adding a batch of new packages to packages.yaml.
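The state handling, including the silent first-seen case, boils down to a small function. Here is a Python sketch of the same logic as the shell step in the workflow; update_state is a hypothetical name, and the file naming follows the state/${repo}_${arch}_${name}.txt pattern used there.

```python
from pathlib import Path

def update_state(state_dir, repo, arch, name, current):
    """Record `current`; return the previous version if a notification is due.

    Returns None when nothing changed or when the package is seen for the
    first time (no state file yet), so the caller stays silent in both cases.
    """
    path = Path(state_dir) / f"{repo}_{arch}_{name}.txt"
    previous = path.read_text().strip() if path.exists() else ""
    if previous == current:
        return None
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(current + "\n")
    return previous or None   # empty previous means first seen: no notification
```

A caller would send the ntfy message only when the return value is not None, using it as the "from" version in the "30.2-2 -> 30.2-3" style message.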

A few days after I wrote this post, the Emacs package was updated, and the notification looks like this:

[Image: the ntfy notification for the Emacs package update]

I am monitoring several other packages as well, but the problems with Emacs and Tree-sitter were the starting point.