Interactive Episodic Memory with User Feedback
Nikesh Subedi, Loris Bazzani, Ziad Al-Halah

TL;DR
This paper introduces an interactive episodic memory system that incorporates user feedback to refine search results in long egocentric videos, improving accuracy and robustness in real-world scenarios.
Contribution
It proposes a new feedback-based interaction framework, datasets, and a lightweight plug-and-play module to enhance episodic memory models with user input.
Findings
The method significantly improves over the state of the art on three benchmarks.
Incorporating user feedback improves model accuracy.
The approach is competitive with commercial vision-language models while remaining efficient.
Abstract
In episodic memory with natural language queries (EM-NLQ), a user may ask a question (e.g., "Where did I place the mug?") that requires searching a long egocentric video, captured from the user's perspective, to find the moment that answers it. However, queries can be ambiguous or incomplete, leading to incorrect responses. Current methods ignore this key aspect and address EM-NLQ in a one-shot setup, limiting their applicability in real-world scenarios. In this work, we address this gap and introduce the Episodic Memory with Questions and Feedback task (EM-QnF). Here, the user can provide feedback on the model's initial prediction or add more information (e.g., "Before this. I'm looking for the big blue mug not the white one"), helping the model refine its predictions interactively. To this end, we collect datasets for feedback-based interaction and propose a lightweight training…
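The interaction pattern described in the abstract (initial prediction, then user feedback, then a refined prediction) can be illustrated with a toy sketch. This is not the paper's module: the names `Window`, `Candidate`, `initial_prediction`, and `refine` are hypothetical, the candidate scores are made up, and the feedback is assumed to have already been reduced by hand to a temporal hint ("before" or "after") rather than parsed from natural language.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Window:
    """A temporal span in the video, in seconds (hypothetical type)."""
    start: float
    end: float


@dataclass(frozen=True)
class Candidate:
    """A candidate moment with a made-up relevance score for the query."""
    window: Window
    score: float


def initial_prediction(candidates: List[Candidate]) -> Candidate:
    # One-shot setup: return the highest-scoring moment.
    return max(candidates, key=lambda c: c.score)


def refine(candidates: List[Candidate], previous: Candidate,
           temporal_hint: Optional[str]) -> Candidate:
    # Assumption: feedback such as "Before this." has been mapped to
    # temporal_hint == "before"; real feedback is free-form text.
    if temporal_hint == "before":
        pool = [c for c in candidates if c.window.end <= previous.window.start]
    elif temporal_hint == "after":
        pool = [c for c in candidates if c.window.start >= previous.window.end]
    else:
        # No temporal hint: just exclude the rejected prediction.
        pool = [c for c in candidates if c != previous]
    # Fall back to the previous prediction if nothing satisfies the hint.
    return max(pool, key=lambda c: c.score) if pool else previous


# Toy usage with invented scores.
candidates = [
    Candidate(Window(10.0, 20.0), 0.4),
    Candidate(Window(50.0, 60.0), 0.9),
    Candidate(Window(80.0, 90.0), 0.6),
]
first = initial_prediction(candidates)
refined = refine(candidates, first, "before")
```

In this toy run, the first prediction is the 50 to 60 second window, and the "before" hint restricts the second prediction to moments that end earlier, leaving the 10 to 20 second window. A learned model would instead condition on the feedback text itself.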
