Evaluating the Ability of Large Language Models to Reason about Cardinal Directions

Anthony G Cohn; Robert E Blackwell

arXiv:2406.16528 · cs.CL · September 11, 2024 · 3 cites

Open Access

TL;DR

This study evaluates large language models' ability to reason about cardinal directions, revealing they struggle with complex scenarios despite performing well on simpler recall tasks.

Contribution

The paper introduces two datasets to test LLMs' reasoning about cardinal directions, highlighting their limitations in complex reasoning tasks.

Findings

01

LLMs perform well on recall-based tasks.

02

LLMs struggle with complex reasoning scenarios.

03

Even with a temperature setting of zero, no LLM reliably determines the correct cardinal direction in the more complex dataset.

Abstract

We investigate the abilities of a representative set of Large Language Models (LLMs) to reason about cardinal directions (CDs). To do so, we create two datasets: the first, co-created with ChatGPT, focuses largely on recall of world knowledge about CDs; the second is generated from a set of templates, comprehensively testing an LLM's ability to determine the correct CD given a particular scenario. The templates allow for a number of degrees of variation, such as the means of locomotion of the agent involved and whether the scenario is set in the first, second or third person. Our experiments show that although LLMs are able to perform well on the simpler dataset, in the second, more complex dataset no LLM is able to reliably determine the correct CD, even with a temperature setting of zero.
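As a rough illustration of how a template-generated dataset of this kind could be built, here is a minimal sketch in Python. It is not the authors' code: the template wording, the lake scenario, and the names `TEMPLATE`, `SUBJECTS`, `LOCOMOTION`, `OPPOSITE`, and `generate_questions` are all assumptions made for this example. It only shows the idea of expanding one template across grammatical person and means of locomotion while computing the expected CD for each prompt.

```python
from itertools import product

# Hypothetical scenario: an agent on a given shore of a lake.
# The lake then lies in the opposite cardinal direction.
OPPOSITE = {"north": "south", "south": "north", "east": "west", "west": "east"}

# Assumed variation axes, loosely following the abstract:
# grammatical person and means of locomotion.
SUBJECTS = {"first": "I am", "second": "You are", "third": "They are"}
LOCOMOTION = ["walking", "cycling", "driving"]

# Illustrative template only; the paper's actual templates are not shown here.
TEMPLATE = (
    "{subject} {locomotion} along the {shore} shore of a lake. "
    "In which cardinal direction is the lake?"
)


def generate_questions():
    """Expand the template over every combination of the variation axes."""
    questions = []
    for (person, subject), locomotion, shore in product(
        SUBJECTS.items(), LOCOMOTION, OPPOSITE
    ):
        questions.append(
            {
                "person": person,
                "locomotion": locomotion,
                "prompt": TEMPLATE.format(
                    subject=subject, locomotion=locomotion, shore=shore
                ),
                "answer": OPPOSITE[shore],
            }
        )
    return questions


if __name__ == "__main__":
    for question in generate_questions()[:3]:
        print(question["prompt"], "->", question["answer"])
```

In this sketch the means of locomotion and the grammatical person never change the expected answer, so a model's accuracy can be compared across those variations on otherwise identical scenarios.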

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling

Methods: Sparse Evolutionary Training