Do Language Models Understand the Cognitive Tasks Given to Them? Investigations with the N-Back Paradigm
Xiaoyang Hu, Richard L. Lewis

TL;DR
This paper investigates whether language models truly understand cognitive tasks like the N-back paradigm, revealing that poor performance often stems from task comprehension issues rather than cognitive limitations.
Contribution
It provides a detailed analysis of language models' performance on N-back tasks, highlighting the importance of task understanding and proposing refined evaluation methodologies.
Findings
Performance declines are partly due to task comprehension issues.
Alternative prompting strategies can improve model performance.
Analysis of model attention patterns offers insight into the limits of the models' task understanding.
Abstract
Cognitive tasks originally developed for humans are now increasingly used to study language models. While applying these tasks is often straightforward, interpreting their results can be challenging. In particular, when a model underperforms, it is often unclear whether this results from a limitation in the cognitive ability being tested or a failure to understand the task itself. A recent study argues that GPT 3.5's declining performance on 2-back and 3-back tasks reflects a working memory capacity limit similar to humans (Gong et al., 2024). By analyzing a range of open-source language models of varying performance levels on these tasks, we show that the poor performance is due at least in part to a limitation in task comprehension and task set maintenance. We challenge the best-performing model with progressively harder versions of the task (up to 10-back) and experiment with…
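For readers unfamiliar with the paradigm, the following is a minimal sketch of how an N-back trial sequence can be scored. It is not the authors' code. The function names `n_back_targets` and `accuracy`, and the response labels `"m"` (match) and `"-"` (non-match), are assumptions made for illustration only.

```python
from typing import List, Sequence


def n_back_targets(stimuli: Sequence[str], n: int) -> List[str]:
    """Return the expected response for each stimulus in an N-back task.

    A position is a match ("m") when the current stimulus equals the one
    presented n steps earlier; otherwise it is a non-match ("-").
    The first n positions have no n-back referent, so they are non-matches.
    The "m" / "-" labels are illustrative assumptions.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    return [
        "m" if i >= n and stimuli[i] == stimuli[i - n] else "-"
        for i in range(len(stimuli))
    ]


def accuracy(responses: Sequence[str], targets: Sequence[str]) -> float:
    """Fraction of positions where the response equals the expected target."""
    if len(responses) != len(targets):
        raise ValueError("responses and targets must have the same length")
    if not targets:
        return 0.0
    correct = sum(r == t for r, t in zip(responses, targets))
    return correct / len(targets)


# Hypothetical example sequence, scored as a 2-back task.
letters = list("ababcb")
targets_2back = n_back_targets(letters, 2)
```

Scoring the same letter sequence with a different `n` yields different targets, which is one way to see why a model that loses track of which N it was asked to follow can score poorly without any memory limit being involved.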
Taxonomy
Topics: Neurobiology of Language and Bilingualism · Topic Modeling · Language and cultural evolution
Methods: Attention Is All You Need · Linear Layer · Cosine Annealing · Residual Connection · Adam · Weight Decay · Linear Warmup With Cosine Annealing · Multi-Head Attention · Layer Normalization
