Unsupervised Memorability Modeling from Tip-of-the-Tongue Retrieval Queries
Sree Bhattacharyya, Yaman Kumar Singla, Sudhir Yarram, Somesh Kumar Singh, Harini S I, James Z. Wang

TL;DR
This paper introduces a large-scale unsupervised dataset with over 82,000 videos and recall data, enabling improved modeling of visual memorability and retrieval tasks without relying on expensive human annotations.
Contribution
It presents the first large-scale unsupervised dataset for visual memorability modeling, along with models that outperform existing methods in recall generation and tip-of-the-tongue retrieval.
Findings
Unsupervised dataset effectively models memorability signals.
Fine-tuned vision-language models outperform GPT-4o in description generation.
Contrastive training enables multimodal tip-of-the-tongue retrieval.
Abstract
Visual content memorability has intrigued the scientific community for decades, with applications ranging widely, from understanding nuanced aspects of human memory to enhancing content design. A significant challenge in progressing the field lies in the expensive process of collecting memorability annotations from humans. This limits the diversity and scalability of datasets for modeling visual content memorability. Most existing datasets are limited to collecting aggregate memorability scores for visual content, not capturing the nuanced memorability signals present in natural, open-ended recall descriptions. In this work, we introduce the first large-scale unsupervised dataset designed explicitly for modeling visual memorability signals, containing over 82,000 videos, accompanied by descriptive recall data. We leverage tip-of-the-tongue (ToT) retrieval queries from online platforms…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsMultimodal Machine Learning Applications · Visual Attention and Saliency Detection · Advanced Image and Video Retrieval Techniques
