Evaluating the Ability of Large Language Models to Reason about Cardinal Directions, Revisited

Anthony G Cohn; Robert E Blackwell

arXiv:2507.12059·cs.CL·November 11, 2025

Open Access

TL;DR

This paper evaluates 28 large language models' ability to reason about cardinal directions using a comprehensive benchmark, revealing that even advanced models struggle with reliable reasoning in this domain.

Contribution

It introduces a new benchmark for testing LLMs on reasoning about cardinal directions, extending previous work and highlighting current limitations.

Findings

01. Most LLMs fail to reliably determine correct cardinal directions.

02. Even recent large reasoning models show significant reasoning errors.

03. The benchmark reveals persistent challenges in spatial reasoning for LLMs.

Abstract

We investigate the abilities of 28 Large Language Models (LLMs) to reason about cardinal directions (CDs) using a benchmark generated from a set of templates, extensively testing an LLM's ability to determine the correct CD given a particular scenario. The templates allow for a number of degrees of variation, such as the means of locomotion of the agent involved and whether the scenario is set in the first, second or third person. Even the newer Large Reasoning Models are unable to reliably determine the correct CD for all questions. This paper summarises and extends earlier work presented at COSIT-24.
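
To make the idea of a template-generated benchmark concrete, here is a minimal Python sketch. It is not the authors' code or their templates: the template wording, the lake scenario, and the names `TEMPLATE`, `SUBJECTS`, `LOCOMOTION`, `generate_questions`, and `score` are all assumptions made for illustration. It expands one template over grammatical person, means of locomotion, and direction, and pairs each question with its expected CD.

```python
from itertools import product

# All names and wordings here are illustrative assumptions, not the paper's templates.
OPPOSITE = {"north": "south", "south": "north", "east": "west", "west": "east"}

# Headings that run parallel to each shore of the lake.
PARALLEL_HEADINGS = {
    "east": ("north", "south"),
    "west": ("north", "south"),
    "north": ("east", "west"),
    "south": ("east", "west"),
}

SUBJECTS = {"first": "I am", "second": "You are", "third": "She is"}
LOCOMOTION = ("walking", "cycling", "driving")

TEMPLATE = (
    "{subject} {locomotion} {heading} along the {shore} shore of a lake. "
    "In which direction is the lake?"
)


def generate_questions():
    """Expand the template over person, locomotion, shore, and heading."""
    questions = []
    for (person, subject), locomotion, shore in product(
        SUBJECTS.items(), LOCOMOTION, PARALLEL_HEADINGS
    ):
        for heading in PARALLEL_HEADINGS[shore]:
            questions.append({
                "person": person,
                "locomotion": locomotion,
                "shore": shore,
                "question": TEMPLATE.format(
                    subject=subject,
                    locomotion=locomotion,
                    heading=heading,
                    shore=shore,
                ),
                # On a given shore the water lies on the opposite side,
                # whatever the direction of travel.
                "answer": OPPOSITE[shore],
            })
    return questions


def score(questions, predict):
    """Fraction of questions for which predict(question) matches the expected CD."""
    correct = sum(
        predict(q["question"]).strip().lower() == q["answer"] for q in questions
    )
    return correct / len(questions)
```

In this sketch `predict` stands in for a call to whichever model is under test; any callable that maps a question string to a direction string can be passed to `score`. Grouping the returned records by `person` or `locomotion` would give per-variation accuracy.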

Taxonomy

Topics: Natural Language Processing Techniques