Decoding the Enigma: Benchmarking Humans and AIs on the Many Facets of Working Memory
Ankur Sikarwar, Mengmi Zhang

TL;DR
This paper introduces a comprehensive benchmark dataset for working memory (WM) and uses it to compare AI models with humans across multiple tasks. The models reproduce some characteristics of human WM but not others, which marks out where they need to improve.
Contribution
The paper presents WorM, a large-scale, multifaceted benchmark dataset for working memory, and evaluates AI models against human benchmarks across diverse WM functionalities.
Findings
AI models replicate primacy and recency effects
Models show neural specialization for WM domains
AI models fall short of fully emulating human WM
Abstract
Working memory (WM), a fundamental cognitive process facilitating the temporary storage, integration, manipulation, and retrieval of information, plays a vital role in reasoning and decision-making tasks. Robust benchmark datasets that capture the multifaceted nature of WM are crucial for the effective development and evaluation of AI WM models. Here, we introduce a comprehensive Working Memory (WorM) benchmark dataset for this purpose. WorM comprises 10 tasks and a total of 1 million trials, assessing 4 functionalities, 3 domains, and 11 behavioral and neural characteristics of WM. We jointly trained and tested state-of-the-art recurrent neural networks and transformers on all these tasks. We also include human behavioral benchmarks as an upper bound for comparison. Our results suggest that AI models replicate some characteristics of WM in the brain, most notably primacy and recency…
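Primacy and recency effects are usually read off a serial-position curve: recall accuracy plotted against an item's position in the presented list, with higher accuracy at the first and last positions than in the middle. The following is a minimal sketch of that calculation, not the WorM evaluation code. The function names, the `(position, correct)` trial format, and the toy data are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def serial_position_curve(trials):
    """Return {position: accuracy} from (position, correct) pairs.

    `position` is the 1-indexed place of the item in the presented list;
    `correct` is True if the item was recalled correctly on that trial.
    (Hypothetical trial format, assumed for this sketch.)
    """
    hits = defaultdict(int)
    counts = defaultdict(int)
    for position, correct in trials:
        counts[position] += 1
        hits[position] += int(correct)
    return {p: hits[p] / counts[p] for p in sorted(counts)}


def has_primacy_and_recency(curve):
    """True if the first and last positions both beat the mean of the middle ones."""
    positions = sorted(curve)
    middle = positions[1:-1]
    if not middle:
        return False  # need at least three positions to compare ends with middle
    middle_accuracy = sum(curve[p] for p in middle) / len(middle)
    return (curve[positions[0]] > middle_accuracy
            and curve[positions[-1]] > middle_accuracy)


# Toy data (made up): two trials per position for a five-item list.
toy_trials = [
    (1, True), (1, True),
    (2, True), (2, False),
    (3, False), (3, False),
    (4, True), (4, False),
    (5, True), (5, True),
]

curve = serial_position_curve(toy_trials)
print(curve)
print(has_primacy_and_recency(curve))
```

The same two functions can be applied to human responses and to model outputs, so the shapes of the two curves can be compared position by position.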
Taxonomy
Topics: Cognitive Functions and Memory · Ferroelectric and Negative Capacitance Devices · Neural and Behavioral Psychology Studies
