WMF-AM: Probing LLM Working Memory via Depth-Parameterized Cumulative State Tracking

Dengzhe Hou; Lingyu Jiang; Deng Li; Zirui Li; Fangzhou Lin; Kazunori D Yamada

arXiv:2603.27343·cs.AI·May 5, 2026


TL;DR

This paper introduces WMF-AM, a diagnostic tool for evaluating large language models' ability to maintain and update intermediate states across multiple steps, helping to understand their working memory capabilities.

Contribution

The authors present WMF-AM, a novel, adaptable benchmark that isolates cumulative load in LLMs, enabling detailed analysis of their working memory performance.

Findings

01

Arithmetic accumulation probes on 28 models from 12 families (0.5B to frontier) reveal varying working memory capabilities.

02

A matched non-arithmetic extension (permissions, schedules, inventories) confirms that the cumulative load challenge generalizes beyond arithmetic.

03

Three construct-isolation ablations show that cumulative load, not arithmetic skill or entity tracking, drives difficulty.
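
To make the non-arithmetic finding concrete, here is a minimal sketch of what a permission-tracking probe could look like. It is an illustration only, not the authors' task: the function name `make_permission_probe`, the permission vocabulary, the grant/revoke rule, and the prompt wording are all assumptions. The only idea taken from the paper is that the state must be updated across a chosen number of sequential operations (the depth K described in the abstract) with no arithmetic involved.

```python
import random


def make_permission_probe(k: int, seed: int = 0):
    """Return (prompt, answer) for a hypothetical depth-k grant/revoke sequence."""
    rng = random.Random(seed)
    # Hypothetical permission vocabulary, chosen only for illustration.
    permissions = ["read", "write", "execute", "share", "delete"]
    held = set()
    lines = ["A user starts with no permissions."]
    for _ in range(k):
        perm = rng.choice(permissions)
        # Toggle the permission so every step changes the tracked state.
        if perm in held:
            held.remove(perm)
            lines.append(f"Revoke {perm}.")
        else:
            held.add(perm)
            lines.append(f"Grant {perm}.")
    lines.append("List the permissions the user holds now, in alphabetical order.")
    return "\n".join(lines), sorted(held)


if __name__ == "__main__":
    prompt, answer = make_permission_probe(k=4, seed=7)
    print(prompt)
    print("expected:", answer)
```

Because every step toggles membership, the correct answer depends on the whole sequence rather than on the last operation, which is the property a cumulative-load probe needs.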

Abstract

Existing large language model (LLM) evaluations use fixed-difficulty benchmarks that cannot adapt as models improve, and they rarely isolate specific cognitive processes. We introduce Working Memory Fidelity-Active Manipulation (WMF-AM), a probe of cumulative state tracking: the ability to maintain and update intermediate results across K sequential operations within a single query, without a scratchpad. Unlike multi-step agent benchmarks that stress task orchestration, WMF-AM isolates within-pass cumulative load by parameterizing depth K. The core probe applies arithmetic accumulation to 28 models from 12 families (0.5B to frontier); a matched non-arithmetic extension (permissions, schedules, inventories) confirms that the design generalizes beyond arithmetic. Three construct-isolation ablations confirm that cumulative load, not arithmetic skill or entity tracking, drives difficulty. We release…
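
The depth-parameterized design described in the abstract can be sketched roughly as follows. This is a hedged illustration, not the released WMF-AM code: `Probe`, `make_accumulation_probe`, `accuracy_by_depth`, the operand ranges, the prompt wording, and exact-match scoring are all assumptions made for the example. The `model` argument stands for any callable that maps a prompt string to a reply string.

```python
import random
from dataclasses import dataclass


@dataclass
class Probe:
    prompt: str
    answer: int


def make_accumulation_probe(k: int, seed: int = 0) -> Probe:
    """Build one hypothetical depth-k probe: a start value plus k add/subtract steps."""
    rng = random.Random(seed)
    value = rng.randint(10, 99)  # assumed operand range
    lines = [f"Start with {value}."]
    for _ in range(k):
        delta = rng.randint(1, 9)  # small operands keep each single step easy
        if rng.random() < 0.5:
            value += delta
            lines.append(f"Add {delta}.")
        else:
            value -= delta
            lines.append(f"Subtract {delta}.")
    lines.append("Reply with the final value only, without showing intermediate steps.")
    return Probe(prompt="\n".join(lines), answer=value)


def accuracy_by_depth(model, depths, trials=20):
    """Exact-match accuracy per depth; `model` is any callable str -> str."""
    results = {}
    for k in depths:
        correct = 0
        for t in range(trials):
            probe = make_accumulation_probe(k, seed=1000 * k + t)
            reply = model(probe.prompt).strip()
            correct += reply == str(probe.answer)
        results[k] = correct / trials
    return results


if __name__ == "__main__":
    print(make_accumulation_probe(k=3, seed=1).prompt)
```

Keeping each individual operation trivial while varying only `k` is one way to make difficulty track the number of updates rather than arithmetic skill, and raising `k` gives harder probes without designing a new benchmark.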


Code & Models

Repositories

dengzhe-hou/WMF-AM (GitHub)
