Interactive Episodic Memory with User Feedback

Nikesh Subedi; Loris Bazzani; Ziad Al-Halah

arXiv:2604.24893·cs.CV·April 29, 2026

TL;DR

This paper introduces an interactive episodic memory system that incorporates user feedback to refine search results in long egocentric videos, improving accuracy and robustness in real-world scenarios.

Contribution

It proposes a new feedback-based interaction framework, datasets, and a lightweight plug-and-play module to enhance episodic memory models with user input.

Findings

01. Significant improvement over state-of-the-art on three benchmarks.

02. Effective incorporation of user feedback enhances model accuracy.

03. Competitive with commercial vision-language models while maintaining efficiency.

Abstract

In episodic memory with natural language queries (EM-NLQ), a user may ask a question (e.g., "Where did I place the mug?") that requires searching a long egocentric video, captured from the user's perspective, to find the moment that answers it. However, queries can be ambiguous or incomplete, leading to incorrect responses. Current methods ignore this key aspect and address EM-NLQ in a one-shot setup, limiting their applicability in real-world scenarios. In this work, we address this gap and introduce the Episodic Memory with Questions and Feedback task (EM-QnF). Here, the user can provide feedback on the model's initial prediction or add more information (e.g., "Before this. I'm looking for the big blue mug not the white one"), helping the model refine its predictions interactively. To this end, we collect datasets for feedback-based interaction and propose a lightweight training…
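
To make the interaction described in the abstract concrete, here is a minimal sketch of one query-then-feedback round. It is an illustration only, not the paper's method: the `Moment` type, the `predict` and `refine` helpers, the candidate scores, and the keyword matching on "before"/"after" are all hypothetical stand-ins for whatever learned module actually interprets the feedback.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Moment:
    """A candidate answer window in the video (hypothetical type)."""
    start: float  # seconds from the start of the video
    end: float
    score: float  # assumed relevance score from some EM-NLQ model


def predict(candidates: List[Moment]) -> Moment:
    """One-shot setup: return the highest-scoring candidate."""
    return max(candidates, key=lambda m: m.score)


def refine(candidates: List[Moment], previous: Moment, feedback: str) -> Moment:
    """Toy feedback step: restrict candidates using a temporal hint.

    Keyword matching is an assumption made for this sketch; a real system
    would interpret free-form feedback with a learned model.
    """
    text = feedback.lower()
    if "before" in text:
        pool = [m for m in candidates if m.end <= previous.start]
    elif "after" in text:
        pool = [m for m in candidates if m.start >= previous.end]
    else:
        # No temporal hint: just rule out the rejected prediction.
        pool = [m for m in candidates if m != previous]
    # Fall back to the previous answer if the hint leaves nothing to choose.
    return predict(pool) if pool else previous


# Made-up candidates for the query "Where did I place the mug?"
candidates = [
    Moment(10, 20, 0.6),
    Moment(50, 60, 0.9),
    Moment(90, 100, 0.7),
]

first = predict(candidates)
second = refine(
    candidates,
    first,
    "Before this. I'm looking for the big blue mug not the white one",
)
print(first, second)
```

In this toy run the initial prediction is the highest-scoring window, and the "Before this" feedback moves the answer to the only candidate that ends before it. The sketch uses only the temporal part of the feedback; the descriptive part ("the big blue mug not the white one") is ignored here and would need a model that can ground it in the video.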
