Unforgettable Lessons from Forgettable Images: Intra-Class Memorability Matters in Computer Vision
Jie Jing, Yongjian Huang, Serena J.-W. Wang, Shuangpeng Han, Lucia Schiatti, Yen-Ling Kuo, Qing Lin, Mengmi Zhang

TL;DR
This paper introduces intra-class memorability, a new concept and metric for understanding why some images within the same category are more memorable than others, and demonstrates its applications in AI models and image editing.
Contribution
It proposes the ICMscore metric, curates the ICMD dataset, and explores how intra-class memorability affects AI performance and image manipulation.
Findings
High-ICMscore images impair AI recognition and learning.
Low-ICMscore images enhance AI performance.
Diffusion models can manipulate image memorability.
Abstract
We introduce intra-class memorability, where certain images within the same class are more memorable than others despite shared category characteristics. To investigate what features make one object instance more memorable than others, we design and conduct human behavior experiments, where participants are shown a series of images, and they must identify when the current image matches the image presented a few steps back in the sequence. To quantify memorability, we propose the Intra-Class Memorability score (ICMscore), a novel metric that incorporates the temporal intervals between repeated image presentations into its calculation. Furthermore, we curate the Intra-Class Memorability Dataset (ICMD), comprising over 5,000 images across ten object classes with their ICMscores derived from 2,000 participants' responses. Subsequently, we demonstrate the usefulness of ICMD by training AI…
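The abstract says the ICMscore incorporates the temporal interval between repeated presentations, but the formula itself is not given on this page. The following is only a minimal sketch of one way such an interval-aware score could be computed, assuming a hypothetical weighting in which a correct recognition after a longer gap counts more. The function name, the input layout, and the linear weight are illustrative assumptions, not the authors' definition.

```python
from collections import defaultdict


def interval_weighted_scores(responses, max_interval):
    """Hypothetical interval-weighted hit rate per image.

    responses: iterable of (image_id, interval, correct) tuples, where
        interval is the number of steps between the two presentations
        and correct is True if the participant flagged the repeat.
    max_interval: largest interval used in the experiment (assumed known).

    NOTE: this is NOT the paper's ICMscore; the linear weight
    interval / max_interval is an assumption made for illustration.
    False alarms on non-repeated images are ignored here.
    """
    weighted_hits = defaultdict(float)
    total_weight = defaultdict(float)
    for image_id, interval, correct in responses:
        weight = interval / max_interval  # longer gaps count more
        weighted_hits[image_id] += weight * (1.0 if correct else 0.0)
        total_weight[image_id] += weight
    return {
        image_id: weighted_hits[image_id] / total_weight[image_id]
        for image_id in weighted_hits
        if total_weight[image_id] > 0
    }


# Toy usage with made-up responses (not data from ICMD).
toy_responses = [
    ("img_a", 2, True),   # recognized after a short gap
    ("img_a", 8, False),  # missed after a long gap
    ("img_b", 8, True),   # recognized after a long gap
]
scores = interval_weighted_scores(toy_responses, max_interval=8)
```

Under this assumed weighting, an image that is only recognized at short intervals scores lower than one that is still recognized after a long gap, which is the qualitative behavior an interval-aware metric is meant to capture.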
Peer Reviews
Decision: ICLR 2026 Conference Withdrawn Submission
- The introduction of intra-class memorability as a measurable and distinct property from traditional inter-class memorability is conceptually innovative and well-motivated.
- The work successfully isolates intrinsic visual factors by controlling for class-level confounds.
- The ICMD dataset is significantly larger (10 classes, 5,000 images, 2,000 participants) than in prior iterations, providing sufficient diversity and statistical reliability.
- Clear justification is provided for interval sel
- Despite broad task coverage, the experiments rely mainly on ResNet and ViT architectures. Including comparisons with recent high-capacity models (e.g., CLIP, ConvNeXt, or vision-language transformers) could strengthen generality claims.
- The finding that high-memorability images impair model learning is compelling but still somewhat descriptive. A deeper causal or representational analysis (e.g., feature redundancy, overfitting to saliency) would improve theoretical grounding.
- The paper bri
The move from category-level (as in MemCat) to intra-class memorability is a significant conceptual advance, isolating perceptual rather than semantic variability. The one-category-per-session design eliminates inter-class bias, allowing finer-grained analysis of what makes specific instances memorable. The findings link human memorability to machine learning performance (e.g., continual learning, image recognition), demonstrating cross-domain relevance. Using generative models to control memora
Since the recognition task is restricted to single-category sequences, participants likely adapt to the semantic scope over time. This adaptation can lead to reduced discriminative load: participants may shift from visual to semantic strategies, artificially inflating performance for later trials. Moreover, as shown in [1] (despite the problem statement being different), temporal sensitivity to stimulus change decays when subjects engage with visually or semantically diverse stimuli. In the prese
1. The paper conducts a thorough human evaluation and constructs an image memorability dataset, which may have positive implications for fields such as human-computer interaction.
2. The writing is clear and well-structured, and the experimental evaluation is comprehensive.
1. The paper’s central claim that highly memorable (HM) images impair model training is already a well-established consensus in the machine learning community.
2. The experiments rely on an insufficient amount of data, which undermines their persuasiveness. Moreover, for some experimental results, it remains unclear what practical relevance or real-world applicability they possess.
- What makes an image memorable to a human person is a complex question. The steps taken by the authors over the previous work - (i) studying that property separately for each class, and (ii) adding a temporal dimension to it - are reasonable choices to make memorability more grounded.
- The scores are obtained over 2000 human subjects, which seems like a big enough sample size.
- The authors have empirically demonstrated that their proposed score is better suited to filter out the dataset for
- The biggest limitation of this work is its ambiguous use case. There are two parts to the paper. The first concerns what kinds of images are more memorable for humans, whether there are any patterns in them, and why they are more memorable (this last question is not studied in this work). This part has more to do with cognitive science and is (relatively) less directly applicable to the standard computer vision field. Nevertheless, I do not think we get any solid understanding about what
Taxonomy
Topics: Visual Attention and Saliency Detection
Methods: Diffusion
