Large Language Models Are Human-Like Internally
Tatsuki Kuribayashi, Yohei Oseki, Souhaib Ben Taieb, Kentaro Inui, Timothy Baldwin

TL;DR
This study shows that next-word probabilities from the internal layers of larger language models align with human reading and neurophysiological data as well as, or better than, those from smaller models, challenging earlier claims that larger models are cognitively implausible.
Contribution
The paper demonstrates that analyzing the internal layers of large LMs, rather than only the final layers as in prior work, reveals an alignment with human sentence processing data that matches or exceeds that of smaller LMs.
Findings
Next-word probabilities from internal layers of larger LMs fit human sentence processing data as well as, or better than, those from smaller LMs.
Earlier layers correlate with fast gaze durations.
Later layers align with N400 brain potentials.
Abstract
Recent cognitive modeling studies have reported that larger language models (LMs) exhibit a poorer fit to human reading behavior (Oh and Schuler, 2023b; Shain et al., 2024; Kuribayashi et al., 2024), leading to claims of their cognitive implausibility. In this paper, we revisit this argument through the lens of mechanistic interpretability and argue that prior conclusions were skewed by an exclusive focus on the final layers of LMs. Our analysis reveals that next-word probabilities derived from internal layers of larger LMs align with human sentence processing data as well as, or better than, those from smaller LMs. This alignment holds consistently across behavioral (self-paced reading times, gaze durations, MAZE task processing times) and neurophysiological (N400 brain potentials) measures, challenging earlier mixed results and suggesting that the cognitive plausibility of larger LMs…
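One common way to obtain next-word probabilities from an internal layer is to project that layer's hidden state through the model's output (unembedding) matrix and apply a softmax, an approach often called the logit lens. The sketch below illustrates that idea on toy data in plain Python; it is an assumption about the general technique, not the authors' implementation, and the names `project`, `layerwise_surprisal`, and the toy dimensions are hypothetical. Real models usually also apply a final layer normalization before the projection, which is omitted here.

```python
import math
import random


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def project(hidden, unembed):
    """Map a hidden state (d floats) to vocabulary logits via a d x V matrix."""
    vocab_size = len(unembed[0])
    return [
        sum(h * row[v] for h, row in zip(hidden, unembed))
        for v in range(vocab_size)
    ]


def layerwise_surprisal(hidden_states, unembed, next_token_ids):
    """Surprisal (in bits) of each observed next token, per layer.

    hidden_states: [num_layers][seq_len][d] nested lists (toy stand-in for
                   a model's per-layer activations; an assumption here).
    unembed:       [d][vocab_size] output projection matrix.
    next_token_ids: the token that actually follows each position.
    """
    result = []
    for layer in hidden_states:
        row = []
        for hidden, token_id in zip(layer, next_token_ids):
            log_probs = log_softmax(project(hidden, unembed))
            row.append(-log_probs[token_id] / math.log(2))
        result.append(row)
    return result


# Toy data: 3 layers, 4 positions, hidden size 6, vocabulary of 8 tokens.
rng = random.Random(0)
NUM_LAYERS, SEQ_LEN, D_MODEL, VOCAB = 3, 4, 6, 8
toy_hidden = [
    [[rng.gauss(0, 1) for _ in range(D_MODEL)] for _ in range(SEQ_LEN)]
    for _ in range(NUM_LAYERS)
]
toy_unembed = [[rng.gauss(0, 1) for _ in range(VOCAB)] for _ in range(D_MODEL)]
toy_next_tokens = [rng.randrange(VOCAB) for _ in range(SEQ_LEN)]

toy_surprisal = layerwise_surprisal(toy_hidden, toy_unembed, toy_next_tokens)
for layer_index, values in enumerate(toy_surprisal):
    print(layer_index, [round(v, 2) for v in values])
```

Each row of the result is a per-word surprisal sequence for one layer. In a study of this kind, such per-layer sequences would then be compared against human measures (for example, reading times or N400 amplitudes) to see which layer fits best.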
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques
