Assessing Consciousness-Related Behaviors in Large Language Models Using the Maze Test
Rui A. Pimenta, Tim Schlippe, Kristina Schaaff

TL;DR
This study evaluates consciousness-like behaviors in large language models using a maze navigation test, revealing progress in reasoning but persistent gaps in self-awareness and coherence.
Contribution
Introduces the Maze Test to assess consciousness-related behaviors in LLMs and benchmarks 12 models, highlighting their reasoning strengths and self-awareness limitations.
Findings
Reasoning-capable LLMs outperform standard models.
Gemini 2.0 Pro achieves 52.9% Complete Path Accuracy.
Models struggle with maintaining coherent self-models.
Abstract
We investigate consciousness-like behaviors in Large Language Models (LLMs) using the Maze Test, challenging models to navigate mazes from a first-person perspective. This test simultaneously probes spatial awareness, perspective-taking, goal-directed behavior, and temporal sequencing, which are key consciousness-associated characteristics. After synthesizing consciousness theories into 13 essential characteristics, we evaluated 12 leading LLMs across zero-shot, one-shot, and few-shot learning scenarios. Results showed reasoning-capable LLMs consistently outperforming standard versions, with Gemini 2.0 Pro achieving 52.9% Complete Path Accuracy and DeepSeek-R1 reaching 80.5% Partial Path Accuracy. The gap between these metrics indicates LLMs struggle to maintain coherent self-models throughout solutions, a fundamental aspect of consciousness. While LLMs show progress in consciousness-related…
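The abstract does not define its two metrics, so the sketch below only illustrates why the two scores can diverge. It assumes that Complete Path Accuracy counts a maze as solved only when the whole move sequence matches a reference path, and that Partial Path Accuracy credits the share of reference steps matched before the first wrong move. Both definitions, the function names, and the move labels are hypothetical and may differ from the paper's.

```python
from typing import Sequence

Path = Sequence[str]  # hypothetical move labels, e.g. "forward", "left", "right"


def complete_path_accuracy(predicted: Sequence[Path], reference: Sequence[Path]) -> float:
    """Assumed definition: fraction of mazes whose full move sequence is exactly right."""
    exact = sum(1 for p, r in zip(predicted, reference) if list(p) == list(r))
    return exact / len(reference)


def partial_path_accuracy(predicted: Sequence[Path], reference: Sequence[Path]) -> float:
    """Assumed definition: mean share of reference steps matched before the first error."""
    total = 0.0
    for p, r in zip(predicted, reference):
        correct = 0
        for got, want in zip(p, r):
            if got != want:
                break  # stop counting at the first wrong move
            correct += 1
        total += correct / len(r)
    return total / len(reference)


# Toy data (not from the paper): the first maze goes wrong at step 3, the second is solved.
reference = [["forward", "left", "forward", "right"], ["forward", "forward"]]
predicted = [["forward", "left", "right", "right"], ["forward", "forward"]]

cpa = complete_path_accuracy(predicted, reference)  # 1 of 2 mazes fully correct
ppa = partial_path_accuracy(predicted, reference)   # (2/4 + 2/2) / 2
print(cpa, ppa)
```

Under these assumed definitions, a model that starts most paths correctly but loses track partway through scores well on the partial metric and poorly on the complete one, which is the pattern the abstract reports.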
