Do Language Models Understand the Cognitive Tasks Given to Them? Investigations with the N-Back Paradigm

Xiaoyang Hu; Richard L. Lewis

arXiv:2412.18120·cs.CL·June 3, 2025

TL;DR

This paper investigates whether language models understand cognitive tasks such as the N-back paradigm, and shows that poor performance stems at least in part from limits in task comprehension and task set maintenance, not only from the cognitive capacity being tested.

Contribution

The paper analyzes how open-source language models of varying performance levels handle N-back tasks, highlighting the importance of task understanding and proposing refined evaluation methodologies.

Findings

01. Performance declines on N-back tasks are due at least in part to task comprehension issues.

02. Alternative prompting strategies can improve model performance.

03. Analysis of model attention offers insight into the limits of task understanding.

Abstract

Cognitive tasks originally developed for humans are now increasingly used to study language models. While applying these tasks is often straightforward, interpreting their results can be challenging. In particular, when a model underperforms, it is often unclear whether this results from a limitation in the cognitive ability being tested or a failure to understand the task itself. A recent study argues that GPT 3.5's declining performance on 2-back and 3-back tasks reflects a working memory capacity limit similar to humans (Gong et al., 2024). By analyzing a range of open-source language models of varying performance levels on these tasks, we show that the poor performance is due at least in part to a limitation in task comprehension and task set maintenance. We challenge the best-performing model with progressively harder versions of the task (up to 10-back) and experiment with…
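For readers unfamiliar with the paradigm: in an N-back task, each item in a sequence is compared with the item presented n steps earlier. The sketch below is only an illustration of that definition, not the authors' evaluation code. The letter stimuli, the response labels `"m"` (match) and `"-"` (non-match), and the function names `n_back_targets` and `accuracy` are all assumptions made for this example.

```python
def n_back_targets(letters, n):
    """Return the expected response at each position of an n-back task.

    Labels are hypothetical: "m" for a match, "-" for a non-match.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    targets = []
    for i, letter in enumerate(letters):
        # The first n items have nothing n steps back, so they count as non-matches.
        is_match = i >= n and letter == letters[i - n]
        targets.append("m" if is_match else "-")
    return targets


def accuracy(responses, targets):
    """Fraction of positions where the response equals the expected label."""
    if not targets or len(responses) != len(targets):
        raise ValueError("responses and targets must be non-empty and equal in length")
    return sum(r == t for r, t in zip(responses, targets)) / len(targets)


letters = list("abacbcb")
two_back = n_back_targets(letters, 2)  # correct answers for the 2-back task
one_back = n_back_targets(letters, 1)  # answers of a responder doing 1-back instead

print(two_back)
print(accuracy(one_back, two_back))
```

Scoring the 1-back answers against the 2-back targets shows one way a wrong task set, as opposed to a memory failure, could lower accuracy: the responder remembers the sequence but applies the wrong comparison.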

Taxonomy

Topics: Neurobiology of Language and Bilingualism · Topic Modeling · Language and cultural evolution

Methods: Attention Is All You Need · Linear Layer · Cosine Annealing · Residual Connection · Adam · Weight Decay · Linear Warmup With Cosine Annealing · Multi-Head Attention · Layer Normalization