Assessing Consciousness-Related Behaviors in Large Language Models Using the Maze Test

Rui A. Pimenta; Tim Schlippe; Kristina Schaaff

arXiv:2508.16705 · cs.CL · August 26, 2025

TL;DR

This study evaluates consciousness-like behaviors in large language models using a maze navigation test, revealing progress in reasoning but persistent gaps in self-awareness and coherence.

Contribution

Introduces the Maze Test to assess consciousness-related behaviors in LLMs and benchmarks 12 models, highlighting their reasoning strengths and self-awareness limitations.

Findings

01 Reasoning-capable LLMs outperform standard models.

02 Gemini 2.0 Pro achieves 52.9% complete path accuracy.

03 Models struggle with maintaining coherent self-models.

Abstract

We investigate consciousness-like behaviors in Large Language Models (LLMs) using the Maze Test, which challenges models to navigate mazes from a first-person perspective. This test simultaneously probes spatial awareness, perspective-taking, goal-directed behavior, and temporal sequencing, all key characteristics associated with consciousness. After synthesizing consciousness theories into 13 essential characteristics, we evaluated 12 leading LLMs across zero-shot, one-shot, and few-shot learning scenarios. Results showed reasoning-capable LLMs consistently outperforming standard versions, with Gemini 2.0 Pro achieving 52.9% Complete Path Accuracy and DeepSeek-R1 reaching 80.5% Partial Path Accuracy. The gap between these metrics indicates that LLMs struggle to maintain coherent self-models throughout their solutions, a fundamental aspect of consciousness. While LLMs show progress in consciousness-related…
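The abstract reports two metrics, Complete Path Accuracy and Partial Path Accuracy, without defining them in this excerpt. The sketch below shows one plausible way such a pair of metrics could differ, which may help explain why a gap between them points to errors late in a solution. Everything in it is an assumption made for illustration: the move vocabulary, the function names, and the definitions themselves (exact match for the complete score, fraction of the reference path matched before the first wrong move for the partial score) are hypothetical and may not match the paper's.

```python
from typing import Dict, List, Sequence, Tuple

Path = Sequence[str]  # hypothetical move encoding, e.g. "forward", "left", "right"


def complete_path_match(predicted: Path, reference: Path) -> bool:
    """True only if the predicted move sequence equals the reference exactly."""
    return list(predicted) == list(reference)


def partial_path_score(predicted: Path, reference: Path) -> float:
    """Fraction of the reference path matched before the first wrong move.

    Assumed definition for illustration; the paper may score partial
    paths differently.
    """
    if not reference:
        return 1.0 if not predicted else 0.0
    correct = 0
    for pred_move, ref_move in zip(predicted, reference):
        if pred_move != ref_move:
            break
        correct += 1
    return correct / len(reference)


def evaluate(pairs: List[Tuple[Path, Path]]) -> Dict[str, float]:
    """Average both scores over (predicted, reference) pairs."""
    if not pairs:
        raise ValueError("need at least one (predicted, reference) pair")
    n = len(pairs)
    complete = sum(complete_path_match(p, r) for p, r in pairs) / n
    partial = sum(partial_path_score(p, r) for p, r in pairs) / n
    return {
        "complete_path_accuracy": complete,
        "partial_path_accuracy": partial,
    }


if __name__ == "__main__":
    reference = ["forward", "left", "forward", "right"]
    perfect = ["forward", "left", "forward", "right"]
    derailed = ["forward", "left", "right", "right"]  # wrong at step 3
    print(evaluate([(perfect, reference), (derailed, reference)]))
```

Under these assumed definitions, a model that starts each maze correctly but derails partway through scores well on the partial metric and poorly on the complete one, which is the pattern the abstract interprets as difficulty maintaining a coherent self-model across a whole solution.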
