CurveBench: A Benchmark for Exact Topological Reasoning over Nested Jordan Curves
Amirreza Mohseni, Mona Mohammadi, Morteza Saghafian, Naser Talebizadeh Sardari

TL;DR
CurveBench is a new benchmark dataset for hierarchical topological reasoning over nested Jordan curves, highlighting the difficulty of exact topological understanding in visual models.
Contribution
The paper introduces CurveBench, a dataset of images of nested Jordan curves paired with a structured-prediction task: recovering the rooted containment tree induced by the curves. It also evaluates how well current models perform on this task.
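To make the target structure concrete, here is a minimal sketch that builds a containment tree from curves that are already given as polygons. The benchmark itself takes images, so extracting curves from pixels is a separate step that is not shown. The function names and the parent-array encoding are hypothetical illustrations, not the paper's format. Node `i` stands for the region directly inside curve `i`, and `-1` is the root (the unbounded outer region). Because the curves are pairwise non-intersecting, any two are either nested or disjoint. The curves enclosing a given curve therefore form a nested chain, and the innermost one (smallest area) is its parent.

```python
from typing import List, Tuple

# Hypothetical input format: each Jordan curve approximated by a simple polygon.
Point = Tuple[float, float]
Polygon = List[Point]


def point_in_polygon(pt: Point, poly: Polygon) -> bool:
    """Even-odd ray casting: count edges crossed by a ray cast from pt in the +x direction."""
    x, y = pt
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        # Edge straddles the horizontal line through pt (half-open test avoids double-counting vertices).
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def polygon_area(poly: Polygon) -> float:
    """Absolute polygon area via the shoelace formula."""
    n = len(poly)
    s = sum(poly[i][0] * poly[(i + 1) % n][1] - poly[(i + 1) % n][0] * poly[i][1] for i in range(n))
    return abs(s) / 2.0


def containment_tree(curves: List[Polygon]) -> List[int]:
    """parent[i] = index of the innermost curve enclosing curve i, or -1 for the unbounded region."""
    areas = [polygon_area(c) for c in curves]
    parents = []
    for i, curve in enumerate(curves):
        probe = curve[0]  # any vertex works: disjoint curves are either nested or separate
        containers = [j for j, other in enumerate(curves) if j != i and point_in_polygon(probe, other)]
        # Enclosing curves are nested, so the innermost one has the smallest area.
        parents.append(min(containers, key=lambda j: areas[j]) if containers else -1)
    return parents


if __name__ == "__main__":
    big = [(0, 0), (10, 0), (10, 10), (0, 10)]
    mid = [(1, 1), (5, 1), (5, 5), (1, 5)]
    small = [(2, 2), (3, 2), (3, 3), (2, 3)]
    side = [(6, 6), (9, 6), (9, 9), (6, 9)]
    print(containment_tree([big, mid, small, side]))  # [-1, 0, 1, 0]
```

With `n` curves this produces `n + 1` regions: one node per curve plus the root. The quadratic all-pairs test is chosen for clarity. A sweep-line approach would scale better for the dense configurations.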
Findings
The strongest evaluated model, Gemini 3.1 Pro, achieves only 71.1% tree-generation accuracy on CurveBench-Easy and 19.1% on CurveBench-Hard.
RLVR-style fine-tuning of open-weight vision-language models (e.g., Qwen3-VL-8B) significantly improves accuracy, surpassing some large language models.
A large gap remains in the exact topological reasoning capabilities of current models.
Abstract
We introduce CurveBench, a benchmark for hierarchical topological reasoning from visual input. CurveBench consists of 756 images of pairwise non-intersecting Jordan curves across easy, polygonal, topographic-inspired, maze-like, and dense counting configurations. Each image is annotated with a rooted tree encoding the containment relations between planar regions. We formulate the task as structured prediction: given an image, a model must recover the full rooted containment tree induced by the curves. Despite the visual simplicity of the task, the strongest evaluated model, Gemini 3.1 Pro, achieves only 71.1% tree-generation accuracy on CurveBench-Easy and 19.1% on CurveBench-Hard. We further demonstrate benchmark utility through RLVR-style fine-tuning of open-weight vision-language models. Our trained Qwen3-VL-8B model improves over…
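This excerpt does not specify how a predicted tree is scored or what reward the RLVR fine-tuning uses. The sketch below is therefore an assumption, not the paper's metric. It treats a prediction as correct only if it exactly matches the gold tree up to isomorphism of rooted unordered trees, since region labels and child order carry no meaning. It computes an AHU-style canonical string (Aho, Hopcroft, Ullman) in which each subtree's encoding is built from its children's sorted encodings. A binary exact-match score of this kind is also a natural "verifiable reward" for RLVR. The names `canonical_form`, `tree_reward`, and `tree_generation_accuracy` and the parent-array format (`-1` = root) are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def canonical_form(parents: Sequence[int]) -> str:
    """Canonical string of a rooted unordered tree given as a parent array.

    parents[i] is the parent of node i; -1 denotes the root (unbounded region).
    Isomorphic trees give identical strings regardless of labeling or child order.
    """
    children: Dict[int, List[int]] = defaultdict(list)
    for node, parent in enumerate(parents):
        children[parent].append(node)

    def encode(node: int) -> str:
        # Sorting child encodings removes any dependence on child order.
        return "(" + "".join(sorted(encode(c) for c in children[node])) + ")"

    return encode(-1)


def tree_reward(pred: Sequence[int], gold: Sequence[int]) -> float:
    """Binary reward (assumed): 1.0 iff pred is well-formed and isomorphic to gold."""
    n = len(gold)
    if len(pred) != n or any(not (-1 <= p < n) for p in pred):
        return 0.0
    # Nodes on a cycle are unreachable from the root, so a cyclic prediction
    # yields a shorter string and can never match a valid gold tree.
    return 1.0 if canonical_form(pred) == canonical_form(gold) else 0.0


def tree_generation_accuracy(preds: List[Sequence[int]], golds: List[Sequence[int]]) -> float:
    """Fraction of exact (isomorphism-level) matches over a dataset."""
    return sum(tree_reward(p, g) for p, g in zip(preds, golds)) / len(golds)


if __name__ == "__main__":
    gold = [-1, 0, 1, 0]
    print(tree_reward([-1, 0, 0, 2], gold))   # 1.0 (same shape, relabeled nodes)
    print(tree_reward([-1, -1, 1, 0], gold))  # 0.0 (different shape)
```

An all-or-nothing score is harsh but matches the "full rooted containment tree" framing of the task. Partial-credit variants, such as per-edge parent accuracy, would be a different design choice.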
