SpotEM: Efficient Video Search for Episodic Memory

Santhosh Kumar Ramakrishnan; Ziad Al-Halah; Kristen Grauman

arXiv:2306.15850 · cs.CV · June 29, 2023 · 2 cites

TL;DR

SpotEM introduces an efficient video search method for episodic memory. It selects promising video regions conditioned on the language query and uses low-cost semantic indexing features, which reduces computational cost while largely preserving accuracy.

Contribution

SpotEM contributes a novel clip selector, low-cost semantic indexing features, and distillation losses that together improve the efficiency of searching long videos for episodic memory.
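The exact form of SpotEM's distillation losses is not given here. As a rough illustration only, the sketch below shows a generic temperature-scaled KL distillation term of the kind commonly used to let a cheap student mimic an expensive teacher. It assumes, hypothetically, that the teacher is an EM model scoring every clip with full features and the student is the clip selector scoring the same clips from cheap inputs. The names `softmax` and `distillation_kl` are illustrative and do not come from the paper.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax over per-clip scores, softened by a temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max so exp() cannot overflow
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_kl(
    teacher_logits: Sequence[float],
    student_logits: Sequence[float],
    temperature: float = 2.0,
) -> float:
    """KL(teacher || student) between softened per-clip distributions, scaled by T^2.

    Hypothetical setup: teacher_logits come from an EM model that saw every clip,
    student_logits from a cheap clip selector scoring the same clips.
    """
    if len(teacher_logits) != len(student_logits):
        raise ValueError("teacher and student must score the same clips")
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    eps = 1e-12  # guards against log(p / 0) if a student probability underflows
    kl = sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)
    # Scaling by T^2 keeps gradient magnitudes comparable across temperatures.
    return temperature ** 2 * kl
```

The term is zero when the student reproduces the teacher's distribution over clips and grows as the two diverge. In a joint training setup it could be added to the EM task loss, with its weight left as a design choice.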

Findings

01. Reduces clip feature computation to 10-25% of what exhaustive search requires.

02. Maintains 84-97% of the original accuracy.

03. Remains effective across multiple EM models and on long videos.
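To make the compute figures concrete, the hypothetical helper below counts how many fixed-length clips an exhaustive EM method would featurize for a video, and how many fall within a percentage budget such as the 10-25% reported above. The 2-second clip length in the example is an illustrative assumption, not a value taken from the paper.

```python
from typing import Tuple


def clip_budget(video_seconds: int, clip_seconds: int, budget_pct: int) -> Tuple[int, int]:
    """Return (exhaustive_clips, budgeted_clips) for a video.

    Hypothetical helper: exhaustive_clips is what a look-everywhere method would
    featurize; budgeted_clips is the share allowed by budget_pct percent.
    Integer arithmetic avoids floating-point rounding surprises.
    """
    if video_seconds <= 0 or clip_seconds <= 0:
        raise ValueError("durations must be positive")
    if not 0 < budget_pct <= 100:
        raise ValueError("budget_pct must be in (0, 100]")
    exhaustive = -(-video_seconds // clip_seconds)          # ceil(video / clip)
    budgeted = max(1, -(-exhaustive * budget_pct // 100))   # ceil(pct% of clips), at least 1
    return exhaustive, budgeted


if __name__ == "__main__":
    one_hour = 60 * 60
    for pct in (10, 25):
        total, kept = clip_budget(one_hour, clip_seconds=2, budget_pct=pct)
        print(f"{pct}% budget: featurize {kept} of {total} clips")
```

For a one-hour video this prints 180 of 1800 clips at a 10% budget and 450 of 1800 at 25%. The absolute saving grows linearly with video length, which is what matters for recordings spanning hours or days.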

Abstract

The goal in episodic memory (EM) is to search a long egocentric video to answer a natural language query (e.g., "where did I leave my purse?"). Existing EM methods exhaustively extract expensive fixed-length clip features to look everywhere in the video for the answer, which is infeasible for long wearable-camera videos that span hours or even days. We propose SpotEM, an approach to achieve efficiency for a given EM method while maintaining good accuracy. SpotEM consists of three key ideas: 1) a novel clip selector that learns to identify promising video regions to search conditioned on the language query; 2) a set of low-cost semantic indexing features that capture the context of rooms, objects, and interactions that suggest where to look; and 3) distillation losses that address the optimization issues arising from end-to-end joint training of the clip selector and EM model. Our…
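The first two ideas can be pictured with a deliberately simplified sketch: score every clip using cheap semantic indexing features, then run the expensive clip encoder only on the top-scoring clips. SpotEM's selector is learned and conditioned on the query; here it is replaced by a fixed cosine similarity between a query embedding and each clip's index vector. The three-dimensional room/object/interaction vectors are toy assumptions, and the names `cosine` and `select_clips` are hypothetical.

```python
import heapq
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_clips(
    query_emb: Sequence[float],
    clip_index_feats: Sequence[Sequence[float]],
    budget_k: int,
) -> List[int]:
    """Pick the budget_k clips whose cheap index features best match the query.

    Stand-in for a learned, query-conditioned clip selector: the score here is a
    fixed cosine similarity. Indices are returned in temporal order so the
    expensive EM model can process the selected clips sequentially.
    """
    if budget_k <= 0:
        return []
    scores = [cosine(query_emb, feats) for feats in clip_index_feats]
    top = heapq.nlargest(budget_k, range(len(scores)), key=scores.__getitem__)
    return sorted(top)


if __name__ == "__main__":
    # Toy index dims: [room=kitchen, object=purse, interaction=pick-up]; values are made up.
    query = [0.2, 1.0, 0.0]  # e.g. "where did I leave my purse?"
    clips = [
        [1.0, 0.0, 0.0],
        [0.1, 0.9, 0.0],
        [0.0, 0.0, 1.0],
        [0.3, 0.7, 0.1],
    ]
    print(select_clips(query, clips, budget_k=2))
```

With these made-up vectors the selector keeps clips 1 and 3, so only two of the four clips would go through the expensive feature extractor.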

Videos

SpotEM: Efficient Video Search for Episodic Memory · SlidesLive

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Video Analysis and Summarization

Methods: Contrastive Language-Image Pre-training