$A^3$-Bench: Benchmarking Memory-Driven Scientific Reasoning via Anchor and Attractor Activation

Jian Zhang; Yu He; Zhiyuan Wang; Zhangqi Wang; Kai He; Fangzhi Xu; Qika Lin; Jun Liu

arXiv:2601.09274·cs.AI·January 15, 2026

TL;DR

This paper introduces $A^3$-Bench, a new benchmark for evaluating scientific reasoning that emphasizes memory-driven mechanisms like anchor and attractor activation, filling a gap in existing evaluation methods.

Contribution

It proposes a novel benchmark and evaluation framework focusing on memory activation in scientific reasoning, including annotations, a dual-scale memory evaluation, and a new metric.

Findings

01. Memory activation correlates with reasoning performance.

02. The benchmark enables analysis of memory-driven reasoning mechanisms.

03. Different models show varied memory utilization patterns.

Abstract

Scientific reasoning relies not only on logical inference but also on activating prior knowledge and experiential structures. Memory can efficiently reuse knowledge and enhance reasoning consistency and stability. However, existing benchmarks mainly evaluate final answers or step-by-step coherence, overlooking the memory-driven mechanisms that underlie human reasoning, which involves activating anchors and attractors, then integrating them into multi-step inference. To address this gap, we propose $A^3$-Bench (https://a3-bench.github.io), a benchmark designed to evaluate scientific reasoning through dual-scale memory-driven activation, grounded in Anchor and Attractor Activation. First, we annotate 2,198 science reasoning problems across domains using the SAPM process (subject, anchor & attractor, problem, and memory developing). Second, we introduce a dual-scale memory…

Code & Models

Datasets

Pekku/A3-Bench
dataset · 146 downloads

Taxonomy

Topics: Multimodal Machine Learning Applications · Topic Modeling · Intelligent Tutoring Systems and Adaptive Learning