Where were my keys? -- Aggregating Spatial-Temporal Instances of Objects for Efficient Retrieval over Long Periods of Time
Ifrah Idrees, Zahid Hasan, Steven P. Reiss, and Stefanie Tellex

TL;DR
This paper introduces D3A, a hierarchical association method that efficiently creates and queries spatial-temporal representations of objects in environments with moving cameras, significantly improving retrieval speed and accuracy.
Contribution
D3A is a novel online hierarchical learning approach that organizes object instances into a spatial-temporal database for efficient retrieval in dynamic environments.
Findings
D3A achieves 81.98% accuracy in 11.7 ms for 150 queries.
D3A is 47 times faster than naive methods.
D3A improves accuracy by 33% over baseline.
Abstract
Robots equipped with situational awareness can help humans efficiently find their lost objects by leveraging spatial and temporal structure. Existing approaches to video and image retrieval do not take into account the unique constraints imposed by a moving camera with a partial view of the environment. We present a Detection-based 3-level hierarchical Association approach, D3A, to create an efficient queryable spatial-temporal representation of unique object instances in an environment. D3A performs online incremental and hierarchical learning to identify keyframes that best represent the unique objects in the environment. These keyframes are learned based on both spatial and temporal features and, once identified, their corresponding spatial-temporal information is organized in a key-value database. D3A allows for a variety of query patterns such as querying for objects with/without…
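To make the key-value idea in the abstract concrete, here is a minimal Python sketch of a store that maps an object label to its keyframes and answers "where was it last seen?" queries. It is an illustration only, not the authors' implementation: the names `Keyframe`, `ObjectIndex`, `query`, and `last_seen`, the record fields, and the optional time-window filter are all assumptions made for this example and are not taken from the paper.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Keyframe:
    """One representative observation of an object (hypothetical schema)."""
    frame_id: int
    timestamp: float                      # seconds since the start of recording
    position: Tuple[float, float, float]  # estimated location in a map frame


class ObjectIndex:
    """Toy key-value store: object label -> keyframes sorted by time."""

    def __init__(self) -> None:
        self._store: Dict[str, List[Keyframe]] = defaultdict(list)

    def add(self, label: str, keyframe: Keyframe) -> None:
        # Keep each list sorted by timestamp so the last entry is the newest.
        frames = self._store[label]
        frames.append(keyframe)
        frames.sort(key=lambda k: k.timestamp)

    def query(
        self,
        label: str,
        start: Optional[float] = None,
        end: Optional[float] = None,
    ) -> List[Keyframe]:
        # Return keyframes for the label, optionally limited to [start, end].
        frames = self._store.get(label, [])
        return [
            k for k in frames
            if (start is None or k.timestamp >= start)
            and (end is None or k.timestamp <= end)
        ]

    def last_seen(self, label: str) -> Optional[Keyframe]:
        # Most recent keyframe for the label, or None if never observed.
        frames = self._store.get(label)
        return frames[-1] if frames else None


if __name__ == "__main__":
    index = ObjectIndex()
    index.add("keys", Keyframe(frame_id=40, timestamp=12.0, position=(1.0, 0.2, 0.8)))
    index.add("keys", Keyframe(frame_id=310, timestamp=95.5, position=(3.4, 1.1, 0.7)))
    print(index.last_seen("keys"))
    print(index.query("keys", end=60.0))
```

Because only keyframes are stored rather than every detection, a lookup touches a short per-object list instead of scanning the whole video, which is the general reason a keyframe index can answer queries quickly.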
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Image Retrieval and Classification Techniques · Multimodal Machine Learning Applications
