Working Memory Capacity of ChatGPT: An Empirical Study
Dongyu Gong, Xingchen Wan, Dingmin Wang

TL;DR
This study empirically evaluates ChatGPT's working memory capacity using n-back tasks, revealing human-like limits and the influence of instruction strategies, and suggests benchmarking approaches for AI memory assessment.
Contribution
It provides the first systematic empirical assessment of ChatGPT's working memory capacity and explores how instruction strategies affect its performance.
Findings
ChatGPT's working memory capacity limit is similar to that of humans.
Instruction strategies influence performance but do not alter capacity limits.
n-back tasks can serve as benchmarks for AI working memory.
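To make the benchmarking idea concrete, here is a minimal sketch of a verbal n-back block: a letter sequence in which each trial is a "match" if the letter equals the one presented n steps earlier, plus a scorer for yes/no responses. The function names, the 30% match rate, the lowercase alphabet, and the hit-rate/false-alarm-rate scoring are illustrative assumptions, not the protocol used in the paper.

```python
import random
import string


def make_nback_sequence(n, length, match_rate=0.3, seed=0):
    """Build one verbal n-back block (hypothetical helper, not from the paper).

    Returns (letters, is_match), where is_match[i] is True when
    letters[i] equals letters[i - n].
    """
    rng = random.Random(seed)
    alphabet = string.ascii_lowercase
    letters, is_match = [], []
    for i in range(length):
        if i >= n and rng.random() < match_rate:
            # Match trial: repeat the letter shown n steps back.
            letter = letters[i - n]
        else:
            # Non-match trial: exclude the letter n steps back so the
            # trial cannot become a match by accident.
            choices = [c for c in alphabet if i < n or c != letters[i - n]]
            letter = rng.choice(choices)
        letters.append(letter)
        is_match.append(i >= n and letter == letters[i - n])
    return letters, is_match


def score_nback(is_match, responses):
    """Score yes/no responses (True = 'match') against the ground truth."""
    hits = sum(1 for m, r in zip(is_match, responses) if m and r)
    false_alarms = sum(1 for m, r in zip(is_match, responses) if not m and r)
    n_match = sum(is_match)
    n_nonmatch = len(is_match) - n_match
    return {
        "hit_rate": hits / n_match if n_match else 0.0,
        "false_alarm_rate": false_alarms / n_nonmatch if n_nonmatch else 0.0,
    }


if __name__ == "__main__":
    letters, truth = make_nback_sequence(n=2, length=30)
    # A model under test would supply the responses; here the ground
    # truth is replayed to show the scoring interface.
    print(score_nback(truth, truth))
```

Running several blocks at increasing n and comparing the scores is one way to look for the kind of capacity limit the findings describe. How the sequence is presented to the model and how its replies are parsed are left open here.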
Abstract
Working memory is a critical aspect of both human intelligence and artificial intelligence, serving as a workspace for the temporary storage and manipulation of information. In this paper, we systematically assess the working memory capacity of ChatGPT, a large language model developed by OpenAI, by examining its performance in verbal and spatial n-back tasks under various conditions. Our experiments reveal that ChatGPT has a working memory capacity limit strikingly similar to that of humans. Furthermore, we investigate the impact of different instruction strategies on ChatGPT's performance and observe that the fundamental patterns of a capacity limit persist. From our empirical findings, we propose that n-back tasks may serve as tools for benchmarking the working memory capacity of large language models and hold potential for informing future efforts aimed at enhancing AI working…
Taxonomy
Topics: Ferroelectric and Negative Capacitance Devices · Topic Modeling · Explainable Artificial Intelligence (XAI)
