Not All Forgetting Is Equal: Architecture-Dependent Retention Dynamics in Fine-Tuned Image Classifiers

Miit Daga; Swarna Priya Ramu

arXiv:2604.11508·cs.LG·April 17, 2026

TL;DR

This study examines which training samples different neural network architectures forget during fine-tuning. The forgetting patterns turn out to be architecture-dependent, stochastic, and class-related, with implications for ensemble construction and data pruning.

Contribution

The paper analyzes per-sample forgetting dynamics across architectures, showing that sample difficulty is not an intrinsic property of the sample and highlighting the limitations of static curriculum methods.

Findings

01

ResNet-18 and DeiT-Small forget different samples: Jaccard overlap of the top 10 percent most-forgotten is 0.34 on OCTDL and 0.15 on CUB-200.

02

ViT forgetting is more structured than CNN forgetting (mean $R^{2} = 0.74$ vs. $R^{2} = 0.52$).

03

Sample forgetting is highly stochastic across different training runs.
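
The overlap measurement behind the first finding can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes that "most-forgotten" is ranked by the number of forgetting events (a sample that was correct at one epoch and incorrect at the next), which this page does not spell out. The function names and the toy traces are hypothetical.

```python
def forgetting_counts(correct):
    """correct[i][e] is truthy if sample i was classified correctly at epoch e.

    Assumption: a forgetting event is a correct -> incorrect transition
    between consecutive epochs.
    """
    counts = []
    for trace in correct:
        counts.append(sum(1 for a, b in zip(trace, trace[1:]) if a and not b))
    return counts


def top_fraction(counts, fraction=0.10):
    """Indices of the most-forgotten samples (ties broken by sample index)."""
    k = max(1, int(len(counts) * fraction))
    ranked = sorted(range(len(counts)), key=lambda i: (-counts[i], i))
    return set(ranked[:k])


def jaccard(a, b):
    """Jaccard overlap |a & b| / |a | b| of two index sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Toy per-epoch correctness traces for the same four samples (made-up data).
resnet_traces = [[1, 0, 1, 0], [1, 1, 1, 1], [0, 1, 0, 0], [1, 1, 0, 1]]
deit_traces = [[1, 1, 1, 1], [1, 0, 1, 0], [0, 1, 0, 0], [1, 1, 1, 1]]

resnet_top = top_fraction(forgetting_counts(resnet_traces), fraction=0.5)
deit_top = top_fraction(forgetting_counts(deit_traces), fraction=0.5)
overlap = jaccard(resnet_top, deit_top)
print(f"Jaccard overlap of most-forgotten sets: {overlap:.2f}")
```

A value near 1 would mean both architectures struggle with the same samples; the reported 0.34 and 0.15 indicate mostly disjoint sets.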

Abstract

Fine-tuning pretrained image classifiers is standard practice, yet which individual samples are forgotten during this process, and whether forgetting patterns are stable or architecture-dependent, remains unclear. Understanding these dynamics has direct implications for curriculum design, data pruning, and ensemble construction. We track per-sample correctness at every epoch during fine-tuning of ResNet-18 and DeiT-Small on a retinal OCT dataset (7 classes, 56:1 imbalance) and CUB-200-2011 (200 bird species), fitting Ebbinghaus-style exponential decay curves to each sample's retention trace. Five findings emerge. First, the two architectures forget fundamentally different samples: Jaccard overlap of the top 10 percent most-forgotten is 0.34 on OCTDL and 0.15 on CUB-200. Second, ViT forgetting is more structured (mean $R^{2} = 0.74$) than CNN forgetting ($R^{2} = 0.52$). Third, per-sample…
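
To make the curve-fitting step concrete, here is a minimal sketch of fitting an exponential decay to one retention trace and scoring the fit with $R^{2}$. It rests on assumptions not confirmed by this excerpt: the curve is taken to be $R(t) = e^{-t/S}$ with a single stability parameter $S$, the retention trace is taken to be a sequence of values in $[0, 1]$ (one per epoch), and the fit uses log-linear least squares through the origin rather than whatever optimizer the authors used. The function name is hypothetical.

```python
import math


def fit_exponential_decay(retention, eps=1e-6):
    """Fit R(t) = exp(-t / S) to a retention trace; return (S, r_squared).

    Assumptions: retention[t] lies in [0, 1] for epochs t = 0, 1, 2, ...;
    the fit is a least-squares line through the origin on ln R(t) = -t / S.
    """
    if len(retention) < 2:
        raise ValueError("need at least two epochs to fit a decay curve")
    ts = list(range(len(retention)))
    # Clip to [eps, 1] so the logarithm is defined and non-positive.
    logs = [math.log(min(max(r, eps), 1.0)) for r in retention]
    slope = sum(t * y for t, y in zip(ts, logs)) / sum(t * t for t in ts)
    slope = min(slope, 0.0)  # a decay curve cannot grow
    stability = -1.0 / slope if slope < 0 else math.inf

    # R^2 is computed in the original (non-log) space.
    predicted = [math.exp(slope * t) for t in ts]
    mean_r = sum(retention) / len(retention)
    ss_res = sum((r - p) ** 2 for r, p in zip(retention, predicted))
    ss_tot = sum((r - mean_r) ** 2 for r in retention)
    r_squared = 1.0 - ss_res / ss_tot if ss_tot > 0 else math.nan
    return stability, r_squared


# Synthetic trace that decays exactly as exp(-t / 5) (made-up data).
clean_trace = [math.exp(-t / 5) for t in range(10)]
stability, r_squared = fit_exponential_decay(clean_trace)
print(f"S = {stability:.2f}, R^2 = {r_squared:.3f}")
```

Under this reading, a high mean $R^{2}$ across samples means the traces follow a smooth decay closely, while a low value means retention fluctuates in ways a single exponential cannot capture.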
