How to Take a Memorable Picture? Empowering Users with Actionable Feedback
Francesco Laiti, Davide Talon, Jacopo Staiano, Elisa Ricci

TL;DR
This paper introduces MemFeed, the task of giving users actionable, natural-language feedback to improve photo memorability at capture time, and MemCoach, the first method for that task, supported by a new benchmark (MemBench) and experimental validation.
Contribution
It presents the first approach for human-interpretable memorability feedback using multimodal large language models, enabling actionable guidance for enhancing image memorability.
Findings
MemCoach improves memorability scores over zero-shot models.
The approach is training-free and employs a teacher-student strategy.
MemBench provides a systematic evaluation platform for memorability feedback.
Abstract
Image memorability, i.e., how likely an image is to be remembered, has traditionally been studied in computer vision either as a passive prediction task, with models regressing a scalar score, or with generative methods that alter the visual input to boost the image's likelihood of being remembered. Yet, none of these paradigms supports users at capture time, when the crucial question is how to improve a photo's memorability. We introduce the task of Memorability Feedback (MemFeed), where an automated model should provide actionable, human-interpretable guidance to users with the goal of enhancing an image's future recall. We also present MemCoach, the first approach designed to provide concrete suggestions in natural language for memorability improvement (e.g., "emphasize facial expression," "bring the subject forward"). Our method, based on Multimodal Large Language Models (MLLMs), is…
Taxonomy
Topics: Visual Attention and Saliency Detection · Multimodal Machine Learning Applications · Gaze Tracking and Assistive Technology
