Evaluating Cognitive Maps and Planning in Large Language Models with CogEval
Ida Momennejad, Hosein Hasanbeig, Felipe Vieira, Hiteshi Sharma, Robert Osazuwa Ness, Nebojsa Jojic, Hamid Palangi, Jonathan Larson

TL;DR
This paper introduces CogEval, a systematic protocol inspired by cognitive science to evaluate cognitive abilities in large language models, revealing significant limitations in their planning and understanding of cognitive maps.
Contribution
The paper presents CogEval, a new evaluation protocol, and applies it to systematically assess planning and cognitive map abilities across eight major LLMs, uncovering notable failures.
Findings
LLMs struggle with complex planning tasks and hallucinate invalid trajectories.
Systematic evaluation shows LLMs lack robust planning abilities.
Failures suggest LLMs do not understand underlying relational structures.
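To make "invalid trajectories" concrete, here is a minimal sketch of how a proposed path could be checked against a known graph. It is not the paper's evaluation code: the toy graph, the node names, and the helper names `invalid_steps` and `is_valid_trajectory` are all assumptions made for illustration.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy environment: node names and edges are illustrative
# assumptions, not taken from the paper's task graphs.
GRAPH: Dict[str, Set[str]] = {
    "A": {"B", "C"},
    "B": {"D"},
    "C": {"D"},
    "D": set(),
}


def invalid_steps(graph: Dict[str, Set[str]], path: List[str]) -> List[Tuple[str, str]]:
    """Return every (src, dst) step in `path` that is not an edge in `graph`."""
    return [
        (src, dst)
        for src, dst in zip(path, path[1:])
        if dst not in graph.get(src, set())
    ]


def is_valid_trajectory(
    graph: Dict[str, Set[str]], path: List[str], start: str, goal: str
) -> bool:
    """A path is valid if it begins at `start`, ends at `goal`,
    and every consecutive step follows an existing edge."""
    if not path or path[0] != start or path[-1] != goal:
        return False
    return not invalid_steps(graph, path)
```

Under these assumptions, a model answer such as `["A", "D"]` would be flagged because the step from A to D does not exist in the graph, whereas `["A", "B", "D"]` would pass.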
Abstract
Recently an influx of studies claims emergent cognitive abilities in large language models (LLMs). Yet, most rely on anecdotes, overlook contamination of training sets, or lack systematic evaluation involving multiple tasks, control conditions, multiple iterations, and statistical robustness tests. Here we make two major contributions. First, we propose CogEval, a cognitive science-inspired protocol for the systematic evaluation of cognitive capacities in Large Language Models. The CogEval protocol can be followed for the evaluation of various abilities. Second, here we follow CogEval to systematically evaluate cognitive maps and planning ability across eight LLMs (OpenAI GPT-4, GPT-3.5-turbo-175B, davinci-003-175B, Google Bard, Cohere-xlarge-52.4B, Anthropic Claude-1-52B, LLaMA-13B, and Alpaca-7B). We base our task prompts on human experiments, which offer both established construct…
Taxonomy
Topics: Topic Modeling · Explainable Artificial Intelligence (XAI)
Methods: Multi-Head Attention · Attention Is All You Need · Position-Wise Feed-Forward Layer · Linear Layer · Absolute Position Encodings · Label Smoothing · Adam · Cosine Annealing
