Training Memory in Deep Neural Networks: Mechanisms, Evidence, and Measurement Gaps

Vasileios Sevetlidis; George Pavlidis

arXiv:2601.21624·cs.LG·January 30, 2026

Open Access

TL;DR

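This survey organizes the mechanisms that make deep neural network training depend on its own history (optimizer state, data order, the nonconvex path, and auxiliary state), introduces seed-paired, function-space causal estimands with portable perturbation primitives, and proposes a protocol for measuring how much training history matters.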

Contribution

The paper organizes training memory mechanisms by source, lifetime, and visibility; introduces seed-paired causal estimands and perturbation primitives; and proposes a reporting checklist and protocol for measuring training history effects.

Findings

01

Training memory effects arise from optimizer state, data order, the nonconvex path, and auxiliary state.

02

The survey introduces seed-paired, function-space causal estimands and portable perturbation primitives for analysis.

03

The survey proposes a reporting checklist and a protocol for measuring the influence of training history.
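To make the seed-paired idea in the findings above concrete, here is a minimal toy sketch, not the paper's protocol. It trains a one-dimensional linear model with momentum SGD twice per seed: once carrying the momentum buffer throughout, and once resetting it at a chosen step. Data, initialization, and sample order are identical in both arms, so any gap in predictions on a fixed probe set is attributable to the reset. All names (`make_data`, `train`, `paired_effect`, `reset_at`), the model, and the hyperparameters are illustrative assumptions.

```python
import random
import statistics


def make_data(seed, n=64):
    # Hypothetical toy dataset: y = 2x + 0.5 plus Gaussian noise.
    rng = random.Random(seed)
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    ys = [2.0 * x + 0.5 + rng.gauss(0.0, 0.1) for x in xs]
    return xs, ys


def train(seed, reset_at=None, epochs=4, lr=0.05, beta=0.9):
    # Momentum SGD on squared error. The seed fixes both the data and the
    # per-epoch shuffle, so two calls with the same seed differ only in
    # whether the momentum buffer is zeroed at step `reset_at`.
    xs, ys = make_data(seed)
    order_rng = random.Random(seed + 1)
    w, b, vw, vb = 0.0, 0.0, 0.0, 0.0
    step = 0
    for _ in range(epochs):
        idx = list(range(len(xs)))
        order_rng.shuffle(idx)
        for i in idx:
            if step == reset_at:
                vw, vb = 0.0, 0.0  # perturbation primitive: reset momentum
            err = w * xs[i] + b - ys[i]
            vw = beta * vw + err * xs[i]
            vb = beta * vb + err
            w -= lr * vw
            b -= lr * vb
            step += 1
    return w, b


def paired_effect(seed, probes, reset_at):
    # Function-space gap: mean absolute prediction difference on the probes
    # between the carry arm and the reset arm of the same seed.
    w_c, b_c = train(seed, reset_at=None)
    w_r, b_r = train(seed, reset_at=reset_at)
    return statistics.fmean(abs((w_c - w_r) * x + (b_c - b_r)) for x in probes)


probes = [-1.0, -0.5, 0.0, 0.5, 1.0]
effects = [paired_effect(s, probes, reset_at=240) for s in range(8)]
mean_effect = statistics.fmean(effects)
stderr = statistics.stdev(effects) / len(effects) ** 0.5
print(f"mean function-space gap: {mean_effect:.6f} +/- {stderr:.6f}")
```

Pairing by seed removes between-seed variance from the comparison, and the standard error across seeds gives a rough uncertainty estimate. Passing `reset_at=None` to `paired_effect` is a useful control: both arms are then identical and the gap is exactly zero.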

Abstract

Modern deep-learning training is not memoryless. Updates depend on optimizer moments and averaging, data-order policies (random reshuffling vs with-replacement, staged augmentations and replay), the nonconvex path, and auxiliary state (teacher EMA/SWA, contrastive queues, BatchNorm statistics). This survey organizes mechanisms by source, lifetime, and visibility. It introduces seed-paired, function-space causal estimands; portable perturbation primitives (carry/reset of momentum/Adam/EMA/BN, order-window swaps, queue/teacher tweaks); and a reporting checklist with audit artifacts (order hashes, buffer/BN checksums, RNG contracts). The conclusion is a protocol for portable, causal, uncertainty-aware measurement that attributes how much training history matters across models, data, and regimes.
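The audit artifacts mentioned in the abstract (order hashes and buffer checksums) and the order-window swap primitive can be sketched with the Python standard library. This is one plausible reading, not the paper's specification: the function names, the SHA-256 choice, the byte encoding, and the way the per-epoch seed is derived are all assumptions made for illustration.

```python
import hashlib
import random
import struct


def order_hash(indices):
    # Digest of the exact sample order: equal orders give equal hashes.
    h = hashlib.sha256()
    for i in indices:
        h.update(struct.pack("<Q", i))  # 8-byte little-endian unsigned int
    return h.hexdigest()


def buffer_checksum(values):
    # Digest of a flat list of floats, e.g. a momentum buffer or BN statistics.
    h = hashlib.sha256()
    for v in values:
        h.update(struct.pack("<d", v))  # 8-byte little-endian double
    return h.hexdigest()


def epoch_order(seed, epoch, n):
    # Hypothetical RNG contract: the order is a pure function of (seed, epoch).
    rng = random.Random(seed * 1_000_003 + epoch)
    idx = list(range(n))
    rng.shuffle(idx)
    return idx


def swap_window(indices, start, width):
    # Order-window swap: exchange two adjacent windows of `width` samples,
    # leaving the set of samples seen in the epoch unchanged.
    out = list(indices)
    first = out[start:start + width]
    second = out[start + width:start + 2 * width]
    out[start:start + 2 * width] = second + first
    return out


base = epoch_order(seed=0, epoch=0, n=32)
swapped = swap_window(base, start=4, width=4)
print("base order hash:   ", order_hash(base)[:16])
print("swapped order hash:", order_hash(swapped)[:16])
print("momentum checksum: ", buffer_checksum([0.0, 0.25, -1.5])[:16])
```

Logging such digests alongside results lets a reader confirm that two runs saw the same data order, or that a perturbation changed only the intended window.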

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning