TopoBench: Benchmarking LLMs on Hard Topological Reasoning

Mayug Maniparambil; Nils Hoehing; Janak Kapuriya; Arjun Karuvally; Ellen Rushe; Anthony Ventresque; Noel O'Connor; Fergal Reid

arXiv:2603.12133 · cs.AI · March 13, 2026

Open Access

TL;DR

This paper introduces TopoBench, a challenging benchmark for evaluating large language models on complex topological reasoning tasks, revealing significant limitations and exploring potential mitigation strategies.

Contribution

The paper presents TopoBench, a novel benchmark with detailed error analysis and interventions to understand and improve LLMs' topological reasoning capabilities.

Findings

01 Frontier models solve fewer than 25% of hard instances.

02 Certain error patterns like premature commitment hinder problem solving.

03 Mitigation strategies improve constraint extraction but not reasoning.

Abstract

Solving topological grid puzzles requires reasoning over global spatial invariants such as connectivity, loop closure, and region symmetry, and it remains challenging for even the most powerful large language models (LLMs). To study these abilities under controlled settings, we introduce TopoBench, a benchmark of six puzzle families across three difficulty levels. We evaluate strong reasoning LLMs on TopoBench and find that even frontier models solve fewer than one quarter of hard instances, with two families nearly unsolved. To investigate whether these failures stem from reasoning limitations or from difficulty extracting and maintaining spatial constraints, we annotate 750 chain-of-thought traces with an error taxonomy that surfaces four candidate causal failure modes, then test them with targeted interventions simulating each error type. These interventions show that certain error…
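To make the notion of a global spatial invariant concrete, the following is a minimal sketch of a connectivity check on a small grid. It is an illustration only and is not taken from TopoBench: the function name `is_connected`, the string-based grid encoding, and the `#` marker for filled cells are all assumptions made for this example. It shows why such a constraint is global: whether the filled cells form a single region cannot be decided by looking at any one cell or its immediate neighbours.

```python
from collections import deque


def is_connected(grid, filled="#"):
    """Return True if all `filled` cells form one 4-connected region.

    `grid` is a list of equal-length strings (a hypothetical encoding,
    not the benchmark's format).
    """
    cells = {
        (r, c)
        for r, row in enumerate(grid)
        for c, ch in enumerate(row)
        if ch == filled
    }
    if not cells:
        return True  # no filled cells: treated as trivially connected

    # Breadth-first search from an arbitrary filled cell.
    start = next(iter(cells))
    seen = {start}
    queue = deque([start])
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (r + dr, c + dc)
            if nxt in cells and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)

    # Connected exactly when the search reached every filled cell.
    return seen == cells
```

For example, `is_connected(["##.", ".#.", ".##"])` treats the five filled cells as one region, while `is_connected(["#.#"])` does not, because the two filled cells are separated by an empty one.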

Taxonomy

Topics: Constraint Satisfaction and Optimization · Topic Modeling · Natural Language Processing Techniques