Can agents win "The Game"? -- five strategies compared
The Game by Steffen Benndorf is a cooperative card game: 98 numbered cards (2--99), four piles (two going up from 1, two going down from 100), and the team wins when every card lands on a pile. The twist is that you may never tell teammates the exact number on a card you hold. You can hint -- "please don't play on the third pile" -- but no numbers.
I wanted to know whether agents constrained to the same information a human player has (their own hand, every pile's history) can actually finish all 98 cards. So I built a small Python engine for the game and five agents -- random, greedy, counting, expert and an MCTS-style flat Monte-Carlo agent -- and ran a few thousand games.
The setup
Each agent is a single function plan(snap) -> plays.
snap is the info-set view (my hand, public pile tops + histories, public hand sizes, draw count) -- the same information a human gets across the table.
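To make the info-set concrete, here is a minimal sketch of what such a snapshot could look like. The class name Snapshot, its field names and the pile order (piles 0-1 up, piles 2-3 down) are assumptions for illustration, not the engine's actual types.

```python
from dataclasses import dataclass

ALL_CARDS = frozenset(range(2, 100))  # the 98 playable cards, 2..99
PILE_STARTS = (1, 1, 100, 100)        # assumed order: two UP piles, then two DOWN piles


@dataclass(frozen=True)
class Snapshot:
    """Hypothetical info-set view: only what a player at the table can see."""
    hand: tuple[int, ...]                    # my own cards
    histories: tuple[tuple[int, ...], ...]   # cards played on each pile, oldest first
    hand_sizes: tuple[int, ...]              # public hand sizes of the other players
    draw_count: int                          # cards left in the draw deck

    def tops(self):
        # A pile with no cards yet shows its starting value (1 or 100).
        return tuple(h[-1] if h else s for h, s in zip(self.histories, PILE_STARTS))

    def unseen(self):
        # Everything not in my hand and not in any pile history.
        played = {c for h in self.histories for c in h}
        return ALL_CARDS - played - set(self.hand)
```

The unseen set is what a card-counting agent reasons about: it holds the other players' hands plus the draw deck, without saying which card is where.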
Each agent picks an entire turn's sequence of plays at once: the planner enumerates every legal 2-card sequence the hand can make this turn, scores each sequence as the sum of its per-play scores, and picks the lowest.
This single trick of sequence-aware planning is worth a lot: it lets the agent set up a backwards trick (card == top ∓ 10) within its own turn -- e.g. play 60 on an UP pile, then play 50 as a -10 trick -- instead of waiting for the trick to happen by chance.
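The enumeration described above can be sketched as follows. This is a minimal illustration rather than the engine's code: the names gap, plan_two and DIRECTIONS are hypothetical, piles 0-1 are assumed to go up and piles 2-3 down, and the per-play score is assumed to be the greedy gap.

```python
from itertools import permutations

UP, DOWN = "up", "down"
DIRECTIONS = (UP, UP, DOWN, DOWN)  # assumed pile order


def gap(card, top, direction):
    """Cost of playing `card` on a pile showing `top`, or None if illegal."""
    step = card - top if direction == UP else top - card
    if step == -10:  # backwards trick: exactly 10 against the pile's direction
        return -10
    return step if step > 0 else None


def plan_two(hand, tops):
    """Return (score, plays) for the lowest-scoring legal 2-card sequence, or None."""
    best = None
    for a, b in permutations(hand, 2):
        for pa in range(4):
            ga = gap(a, tops[pa], DIRECTIONS[pa])
            if ga is None:
                continue
            after = list(tops)
            after[pa] = a  # the second play sees the pile as the first play left it
            for pb in range(4):
                gb = gap(b, after[pb], DIRECTIONS[pb])
                if gb is None:
                    continue
                cand = (ga + gb, ((a, pa), (b, pb)))
                if best is None or cand < best:
                    best = cand
    return best
```

With hand [60, 50, 75] and pile tops [55, 30, 70, 45], plan_two picks 60 then 50 on the first UP pile for a summed score of -5. A planner that scored one card at a time would not see that pairing, because 50 is illegal on that pile until 60 has been played.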
The five agents
In order of increasing cleverness:
random -- uniform random valid sequence. Baseline.
greedy -- score = gap: how far the play moves the pile top, with the backwards trick scoring -10. No memory.
counting -- greedy + a memory of every card already played + a tiny tiebreaker that prefers piles where my next backwards-trick complement is still alive (in my hand > unseen > dead).
expert -- counting + a hand-aware "leftover pain" penalty (don't strand my own worst cards), a one-turn lookahead (don't strand the next player) and an endgame DFS that, once the deck is empty, checks whether any legal order finishes the deck.
mcts -- the one this post is about.
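The counting agent's tiebreaker can be sketched roughly like this. The function names and the 0 / 1 / 2 ranking are assumptions for illustration; the only details taken from the post are the ordering (in my hand > unseen > dead) and the 0.001 weight.

```python
def complement_rank(card, direction, hand, played):
    """Rank the backwards-trick partner of `card`: 0 in my hand, 1 unseen, 2 dead."""
    # After `card` lands on a pile, the card that would rewind that pile by 10.
    partner = card - 10 if direction == "up" else card + 10
    if partner in hand:
        return 0
    if not 2 <= partner <= 99 or partner in played:
        return 2  # already played, or not a real card
    return 1      # still out there: another hand or the draw deck


def counting_score(gap, card, direction, hand, played, weight=0.001):
    """Greedy gap plus a tiny penalty when the trick partner is less available."""
    return gap + weight * complement_rank(card, direction, hand, played)
```

Because the weight is so small, the tiebreaker only decides between plays whose gaps are equal; it never overrides a genuinely smaller gap.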
Results
200 games per configuration × 4 configs (2 / 3 / 4 / 5 players) = 800 games per strategy.
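MCTS is the slow one at ~150 s per game, so its full 800-game run is roughly 33 hours of single-core compute (800 × 150 s) and only comes down to a few hours of wall-clock time when games run in parallel; the heuristic batches finish in minutes.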
I ran the heavier batches with the same rsync / run / rsync-back pattern as in "Automate Compute on Hetzner", just with a uv-based job.sh instead of a Docker build.
Win rates by player count (each cell is wins out of 200 games):
| strategy | 2p | 3p | 4p | 5p | total / 800 |
|---|---|---|---|---|---|
| random | 0.0 % | 0.0 % | 0.0 % | 0.0 % | 0 (0.0 %) |
| counting | 5.5 % | 1.5 % | 3.5 % | 3.0 % | 27 (3.4 %) |
| greedy | 8.0 % | 3.0 % | 2.0 % | 2.5 % | 31 (3.9 %) |
| expert | 9.5 % | 3.5 % | 4.0 % | 3.0 % | 40 (5.0 %) |
| mcts | 19.0 % | 13.5 % | 23.0 % | 14.5 % | 140 (17.5 %) |
MCTS beats every heuristic at every player count -- by roughly 2-6× the best-heuristic win rate (2× at 2 players, almost 6× at 4) -- and the absolute lift over expert peaks at 4 players (+19 pp). That was the opposite of what I expected: small hands plus a wider team should be the hardest case for any agent, but it's where actually simulating the future pays off the most.
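The win counts, ratios and percentage-point lifts can be recomputed from the table with a few lines of Python. The rates below are copied from the table; the helper names wins and lift are only illustrative.

```python
# Win rate in % per player count, copied from the results table.
RATES = {
    "random":   {2: 0.0, 3: 0.0, 4: 0.0, 5: 0.0},
    "counting": {2: 5.5, 3: 1.5, 4: 3.5, 5: 3.0},
    "greedy":   {2: 8.0, 3: 3.0, 4: 2.0, 5: 2.5},
    "expert":   {2: 9.5, 3: 3.5, 4: 4.0, 5: 3.0},
    "mcts":     {2: 19.0, 3: 13.5, 4: 23.0, 5: 14.5},
}
GAMES = 200  # games per player-count configuration


def wins(strategy):
    """Total wins out of 800 games, reconstructed from the percentages."""
    return sum(round(rate * GAMES / 100) for rate in RATES[strategy].values())


def lift(players):
    """Return (mcts / best heuristic ratio, mcts - expert in percentage points)."""
    best = max(r[players] for name, r in RATES.items() if name != "mcts")
    mcts = RATES["mcts"][players]
    return mcts / best, mcts - RATES["expert"][players]
```

For example, lift(4) returns (5.75, 19.0), and wins("mcts") returns 140, matching the last column of the table.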
How MCTS works here
The MCTS agent in this repo is flat Monte-Carlo with determinisation -- a single layer of rollouts, no tree. Per turn:
Candidates. Take the TOP_K = 5 best 2-play sequences by greedy gap. This is the same enumeration the other agents do; MCTS just keeps several candidates instead of committing to one.
Determinise. The agent doesn't know who holds which unseen card. So for each rollout it samples a random partition of the unseen pool (cards not in my hand and not in any pile history) into the other players' hands at their public sizes, with the leftover becoming the draw deck. Each rollout uses a fresh random guess of the hidden state.
Roll out. Apply my candidate sequence to a Game rebuilt from that determinisation, then play the rest of the game with the counting agent as the policy for every player. Counting was picked as the rollout policy after a quick comparison: it gave 19 / 150 wins vs greedy-rollout's 13 / 150. Cheap to run and a decent simulator of the real future.
Score. 1.0 if that rollout won, otherwise cards_played / 98.
Pick. For each candidate, average the score over N_ROLLOUTS = 40 rollouts. Play the candidate with the highest average. That's 200 rollouts per turn (5 × 40).
    def plan(snap):
        # Shortlist the TOP_K = 5 sequences with the smallest greedy gap.
        candidates = top_k_by_greedy(snap, k=5)
        best_seq, best_mean = None, float("-inf")
        for seq in candidates:
            total = 0.0
            for _ in range(40):                 # N_ROLLOUTS
                g = determinise(snap)           # random partition of unseen
                apply_sequence(g, seq)
                playout(g, policy=counting)     # finish the game
                total += score(g)
            mean = total / 40
            if mean > best_mean:
                best_seq, best_mean = seq, mean
        return best_seq
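The determinise and score steps could look roughly like this. It is a sketch with hypothetical names and signatures (deal_unseen, rollout_score), not the repo's code: it works on a plain collection of unseen cards and a list of public hand sizes rather than on a full game object.

```python
import random


def deal_unseen(unseen, hand_sizes, rng=random):
    """Randomly split the unseen pool into other players' hands plus a draw deck."""
    pool = sorted(unseen)  # sort first so a seeded rng gives reproducible deals
    rng.shuffle(pool)
    hands, start = [], 0
    for size in hand_sizes:
        hands.append(pool[start:start + size])
        start += size
    return hands, pool[start:]  # whatever is left over becomes the draw deck


def rollout_score(won, cards_played, total_cards=98):
    """1.0 for a win, otherwise the fraction of the deck that got played."""
    return 1.0 if won else cards_played / total_cards
```

Calling deal_unseen once per rollout gives each rollout its own guess at the hidden state, so averaging over 40 rollouts averages over 40 different plausible deals rather than over one fixed guess.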
The interesting bit is that you don't need a "smart" rollout policy. Counting is a one-line heuristic with a 0.001-weight tiebreaker, yet wrapped in 40 determinised rollouts per candidate it gets from expert's 5.0 % overall to MCTS's 17.5 % -- and from 4.0 % to 23 % at 4 players. Most of the lift comes from actually simulating the future under a fixed policy, instead of trying to encode that simulation as more heuristic terms in a single-turn score.
So: rule-respecting agents can win The Game -- the MCTS agent does so in 13.5--23 % of games depending on player count, and 97 / 98 cards is a common almost-win.

