Spatial Competence Benchmark

Jash Vira; Ashley Harris

arXiv:2604.09594·cs.AI·April 14, 2026

TL;DR

The paper introduces SCBench, a spatial competence benchmark spanning three hierarchical capability buckets. Three frontier models lose accuracy at each step up the hierarchy, and their failures are dominated by locally plausible geometry that breaks global constraints.

Contribution

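A hierarchical spatial competence benchmark, released with task generators, verifiers, and visualisation tooling. Its tasks require executable outputs checked by deterministic checkers or simulator-based evaluators, whereas prevailing spatial evaluations probe only isolated primitives through 3D transformations or visual question answering.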

Findings

01

Three frontier models show monotonically decreasing accuracy up the capability ladder.

02

Raising the output-token cap improves accuracy mainly at low budgets; gains saturate quickly.

03

Failures are dominated by locally plausible geometry that breaks global constraints.

Abstract

Spatial competence is the quality of maintaining a consistent internal representation of an environment and using it to infer discrete structure and plan actions under constraints. Prevailing spatial evaluations for large models are limited to probing isolated primitives through 3D transformations or visual question answering. We introduce the Spatial Competence Benchmark (SCBench), spanning three hierarchical capability buckets whose tasks require executable outputs verified by deterministic checkers or simulator-based evaluators. On SCBench, three frontier models exhibit monotonically decreasing accuracy up the capability ladder. Sweeping output-token caps shows that accuracy gains concentrate at low budgets and saturate quickly, and failures are dominated by locally plausible geometry that breaks global constraints. We release the task generators, verifiers, and visualisation tooling.
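
To make the idea of a deterministic checker concrete, here is a minimal Python sketch of a verifier for a toy grid-path task. It is not taken from the SCBench repository: the task, the function names (`locally_plausible`, `verify_path`), and the constraint labels are all hypothetical, chosen only to show how an output can pass every step-by-step check and still violate a global constraint.

```python
from typing import List, Set, Tuple

Cell = Tuple[int, int]  # (x, y) grid coordinate; hypothetical task encoding


def locally_plausible(path: List[Cell]) -> bool:
    """True if every consecutive pair of cells is one unit step apart."""
    return all(
        abs(ax - bx) + abs(ay - by) == 1
        for (ax, ay), (bx, by) in zip(path, path[1:])
    )


def verify_path(
    path: List[Cell],
    start: Cell,
    goal: Cell,
    obstacles: Set[Cell],
    width: int,
    height: int,
) -> List[str]:
    """Return the list of violated constraints; an empty list means pass."""
    if not path:
        return ["empty path"]
    errors = []
    if path[0] != start:
        errors.append("wrong start")
    if path[-1] != goal:
        errors.append("wrong goal")
    if not locally_plausible(path):
        errors.append("non-unit step")
    if any(not (0 <= x < width and 0 <= y < height) for x, y in path):
        errors.append("out of bounds")
    if any(cell in obstacles for cell in path):
        errors.append("hits obstacle")
    if len(set(path)) != len(path):
        errors.append("revisits cell")
    return errors


# Assumed toy instance: 3x3 grid with one blocked cell between start and goal.
obstacles = {(1, 0)}
shortcut = [(0, 0), (1, 0), (2, 0)]                # each step looks fine locally
detour = [(0, 0), (0, 1), (1, 1), (2, 1), (2, 0)]  # walks around the obstacle

print(locally_plausible(shortcut))                            # True
print(verify_path(shortcut, (0, 0), (2, 0), obstacles, 3, 3))  # ['hits obstacle']
print(verify_path(detour, (0, 0), (2, 0), obstacles, 3, 3))    # []
```

The checker returns every violated constraint rather than a single pass/fail flag, so failures can be sorted by type. That is a design choice of this sketch, not a claim about how the released verifiers report results.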

Code & Models

Repositories

ashleyharris-maptek-com-au/SpatialCompetenceBenchmark (GitHub)
