Episodic Memory Question Answering
Samyak Datta, Sameer Dharur, Vincent Cartillier, Ruta Desai, Mukul Khanna, Dhruv Batra, Devi Parikh

TL;DR
This paper introduces a new task called Episodic Memory Question Answering for egocentric AI assistants, along with a dataset and a model that encodes scene memories to localize answers within a tour video.
Contribution
It proposes the novel EMQA task, creates a dataset of grounded questions, and develops a semantic map-based model that outperforms baselines in spatio-temporal scene understanding.
Findings
Semantic scene memory improves answer localization accuracy.
The model is robust to noisy depth and pose and to camera jitter.
The model outperforms naive solutions and competitive baselines.
Abstract
Egocentric augmented reality devices such as wearable glasses passively capture visual data as a human wearer tours a home environment. We envision a scenario wherein the human communicates with an AI agent powering such a device by asking questions (e.g., where did you last see my keys?). In order to succeed at this task, the egocentric AI assistant must (1) construct semantically rich and efficient scene memories that encode spatio-temporal information about objects seen during the tour and (2) possess the ability to understand the question and ground its answer into the semantic memory representation. Towards that end, we introduce (1) a new task - Episodic Memory Question Answering (EMQA) wherein an egocentric AI assistant is provided with a video sequence (the tour) and a question as an input and is asked to localize its answer to the question within the tour, (2) a dataset of…
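The abstract's two requirements, a scene memory that stores where and when objects were seen and the ability to ground an answer in it, can be illustrated with a toy example. This is a hand-written sketch under stated assumptions, not the authors' learned semantic-map model. `Observation`, `TopDownMemory`, `cell_size`, and the single depth-plus-bearing reading per detected object are all hypothetical simplifications. Each observation is projected into a top-down grid using the agent's pose. A question such as "where did you last see my keys?" then reduces to finding the grid cell with the most recent timestamp for that label.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass
class Observation:
    """Hypothetical per-object observation from one frame of the tour."""
    t: int                              # timestep within the tour
    label: str                          # detected object category
    depth: float                        # distance along the viewing ray (metres)
    bearing: float                      # ray angle relative to camera heading (radians)
    pose: tuple[float, float, float]    # agent (x, y, heading) in the world frame


class TopDownMemory:
    """Toy spatio-temporal scene memory: a sparse top-down grid of last-seen times."""

    def __init__(self, cell_size: float = 0.25):
        self.cell_size = cell_size
        # (row, col) -> {label: most recent timestep the label was seen in that cell}
        self.cells: dict[tuple[int, int], dict[str, int]] = {}

    def _to_cell(self, x: float, y: float) -> tuple[int, int]:
        # Rows index the world y axis, columns index the world x axis.
        return (math.floor(y / self.cell_size), math.floor(x / self.cell_size))

    def write(self, obs: Observation) -> None:
        # Project the egocentric reading into world coordinates using the pose.
        px, py, heading = obs.pose
        theta = heading + obs.bearing
        wx = px + obs.depth * math.cos(theta)
        wy = py + obs.depth * math.sin(theta)
        cell = self._to_cell(wx, wy)
        # Overwrite with the newer timestep (tours are processed in time order).
        self.cells.setdefault(cell, {})[obs.label] = obs.t

    def last_seen(self, label: str) -> tuple[int, list[tuple[int, int]]] | None:
        """Return (latest timestep, cells seen at that timestep), or None if never seen."""
        best_t, best_cells = -1, []
        for cell, labels in self.cells.items():
            t = labels.get(label)
            if t is None:
                continue
            if t > best_t:
                best_t, best_cells = t, [cell]
            elif t == best_t:
                best_cells.append(cell)
        return (best_t, sorted(best_cells)) if best_cells else None
```

For example, with `cell_size=0.5`, keys seen at t=0 one metre ahead of an agent at the origin land in cell (0, 2). Keys seen again at t=5 two metres ahead of an agent at (1, 1) land in cell (2, 6), so `last_seen("keys")` returns `(5, [(2, 6)])`. A real system would instead write dense per-pixel semantic features and learn the question grounding, which this sketch does not attempt.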
Taxonomy
Topics: Multimodal Machine Learning Applications · Human Pose and Action Recognition · Augmented Reality Applications
