Judgment of Learning: A Human Ability Beyond Generative Artificial Intelligence
Markus Huff, Elanur Ulakçı

TL;DR
This study compares human and LLM judgments of learning. Humans accurately predict their own memory performance, whereas the tested ChatGPT-based LLMs do not show comparable metacognitive accuracy, which points to a key limitation of current models.
Contribution
The paper introduces a cross-agent prediction model to evaluate LLMs' metacognitive predictions of memory, demonstrating their current inability to match human judgment accuracy.
Findings
Humans reliably predict their memory performance.
LLMs fail to accurately predict their own memory performance.
Metacognitive abilities in LLMs are limited compared to humans.
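The findings above turn on how well item-by-item predictions track later recall. As a minimal sketch, and not the authors' analysis, one measure commonly used in the JOL literature for this kind of relative accuracy is the Goodman-Kruskal gamma correlation. The function name and the ratings below are hypothetical, invented purely for illustration.

```python
from itertools import combinations


def goodman_kruskal_gamma(jol, recalled):
    """Gamma = (C - D) / (C + D) over all item pairs; tied pairs are ignored."""
    concordant = discordant = 0
    for (j1, r1), (j2, r2) in combinations(zip(jol, recalled), 2):
        product = (j1 - j2) * (r1 - r2)
        if product > 0:
            concordant += 1   # higher JOL goes with better recall
        elif product < 0:
            discordant += 1   # higher JOL goes with worse recall
    if concordant + discordant == 0:
        return None  # undefined when every pair is tied
    return (concordant - discordant) / (concordant + discordant)


# Hypothetical data, NOT taken from the paper.
jol_ratings = [80, 20, 60, 40, 90, 10]  # predicted likelihood of recall (0-100)
recall_hits = [1, 0, 1, 0, 1, 0]        # 1 = remembered, 0 = forgotten
gamma = goodman_kruskal_gamma(jol_ratings, recall_hits)
```

A gamma near 1 means higher predictions go with better recall, a value near 0 means the predictions carry little information about recall, and a negative value means the predictions point the wrong way.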
Abstract
Large language models (LLMs) increasingly mimic human cognition in various language-based tasks. However, their capacity for metacognition - particularly in predicting memory performance - remains unexplored. Here, we introduce a cross-agent prediction model to assess whether ChatGPT-based LLMs align with human judgments of learning (JOL), a metacognitive measure where individuals predict their own future memory performance. We tested humans and LLMs on pairs of sentences, one of which was a garden-path sentence - a sentence that initially misleads the reader toward an incorrect interpretation before requiring reanalysis. By manipulating contextual fit (fitting vs. unfitting sentences), we probed how intrinsic cues (i.e., relatedness) affect both LLM and human JOL. Our results revealed that while human JOL reliably predicted actual memory performance, none of the tested LLMs…
Taxonomy
Topics: Cognitive Science and Mapping
Methods: ALIGN · Cosine Annealing · Linear Layer · Multi-Head Attention · Dropout · Layer Normalization · Linear Warmup With Cosine Annealing · Adam · Attention Dropout
