TopoBench: Benchmarking LLMs on Hard Topological Reasoning
Mayug Maniparambil, Nils Hoehing, Janak Kapuriya, Arjun Karuvally, Ellen Rushe, Anthony Ventresque, Noel O'Connor, Fergal Reid

TL;DR
This paper introduces TopoBench, a challenging benchmark for evaluating large language models on complex topological reasoning tasks, revealing significant limitations and exploring potential mitigation strategies.
Contribution
The paper presents TopoBench, a novel benchmark with detailed error analysis and interventions to understand and improve LLMs' topological reasoning capabilities.
Findings
Frontier models solve fewer than 25% of hard instances.
Certain error patterns, such as premature commitment, hinder problem solving.
Mitigation strategies improve constraint extraction but not reasoning.
Abstract
Solving topological grid puzzles requires reasoning over global spatial invariants such as connectivity, loop closure, and region symmetry, and remains challenging for even the most powerful large language models (LLMs). To study these abilities under controlled settings, we introduce TopoBench, a benchmark of six puzzle families across three difficulty levels. We evaluate strong reasoning LLMs on TopoBench and find that even frontier models solve fewer than one quarter of hard instances, with two families nearly unsolved. To investigate whether these failures stem from reasoning limitations or from difficulty extracting and maintaining spatial constraints, we annotate 750 chain-of-thought traces with an error taxonomy that surfaces four candidate causal failure modes, then test them with targeted interventions simulating each error type. These interventions show that certain error…
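To make "global spatial invariants" concrete, here is a minimal Python sketch of two of the invariants the abstract names, connectivity and loop closure, checked on a set of grid cells. It is an illustration only, not the benchmark's verifier: the function names, the `(row, col)` cell encoding, and the use of orthogonal adjacency are all assumptions, and the loop test is deliberately simplified (it treats a loop as a connected set in which every cell has exactly two orthogonal neighbours).

```python
from collections import deque

# Orthogonal (4-neighbour) adjacency; an assumption, not taken from the paper.
STEPS = ((1, 0), (-1, 0), (0, 1), (0, -1))


def neighbors(cell, cells):
    """Return the orthogonal neighbours of `cell` that are also in `cells`."""
    r, c = cell
    return [(r + dr, c + dc) for dr, dc in STEPS if (r + dr, c + dc) in cells]


def is_connected(cells):
    """True if every cell is reachable from every other (breadth-first search)."""
    cells = set(cells)
    if not cells:
        return True
    start = next(iter(cells))
    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for nxt in neighbors(current, cells):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen == cells


def is_single_closed_loop(cells):
    """Simplified loop-closure test: connected, and every cell has degree 2."""
    cells = set(cells)
    if not cells or not is_connected(cells):
        return False
    return all(len(neighbors(cell, cells)) == 2 for cell in cells)


# Hypothetical example: the border of a 3x3 grid forms one closed loop.
ring = {(r, c) for r in range(3) for c in range(3)} - {(1, 1)}
open_path = {(0, 0), (0, 1), (0, 2)}
```

Both checks depend on the whole cell set rather than on any single cell, which is what makes such constraints global: changing one cell far away can break connectivity or open the loop.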
Taxonomy
Topics: Constraint Satisfaction and Optimization · Topic Modeling · Natural Language Processing Techniques
