Contextual Media Retrieval Using Natural Language Queries
Sreyasi Nag Chowdhury, Mateusz Malinowski, Andreas Bulling, Mario Fritz

TL;DR
This paper introduces Xplore-M-Ego, a media retrieval system that uses natural language queries to search a dynamic, context-aware database of images and videos, addressing variability in user queries.
Contribution
The work presents a novel spatio-temporal media retrieval system that incorporates personalization and online learning to handle diverse natural language queries.
Findings
System effectively handles inter-user variability.
Personalization improves retrieval accuracy.
System performs well on real user queries.
Abstract
The widespread integration of cameras in hand-held and head-worn devices as well as the ability to share content online enables a large and diverse visual capture of the world that millions of users build up collectively every day. We envision these images as well as associated meta information, such as GPS coordinates and timestamps, to form a collective visual memory that can be queried while automatically taking the ever-changing context of mobile users into account. As a first step towards this vision, in this work we present Xplore-M-Ego: a novel media retrieval system that allows users to query a dynamic database of images and videos using spatio-temporal natural language queries. We evaluate our system using a new dataset of real user queries as well as through a usability study. One key finding is that there is a considerable amount of inter-user variability, for example in the…
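The abstract describes queries that are resolved against media metadata (GPS coordinates and timestamps) relative to the user's current context. As a rough illustration only, the sketch below shows the kind of spatio-temporal filter such a query might reduce to once a phrase like "photos near here from the last week" has already been mapped to a search radius and a time window. The `MediaItem` record, the `spatio_temporal_filter` function, the use of haversine distance, and the radius and window values are all hypothetical assumptions. This is not the paper's retrieval method, and it leaves out the natural language understanding, personalization, and online learning that Xplore-M-Ego is described as providing.

```python
import math
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, List

EARTH_RADIUS_KM = 6371.0


@dataclass
class MediaItem:
    """Hypothetical media record: a file plus the capture metadata the abstract mentions."""
    path: str
    lat: float
    lon: float
    taken_at: datetime


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # Clamp to guard against tiny floating-point overshoot above 1.0.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def spatio_temporal_filter(
    items: Iterable[MediaItem],
    user_lat: float,
    user_lon: float,
    now: datetime,
    radius_km: float,
    window: timedelta,
) -> List[MediaItem]:
    """Keep items captured within radius_km of the user and within `window` before `now`.

    Assumption: the user's context (location, current time) and the resolved
    constraints (radius, window) are supplied by some upstream query interpreter.
    """
    start = now - window
    return [
        m for m in items
        if haversine_km(user_lat, user_lon, m.lat, m.lon) <= radius_km
        and start <= m.taken_at <= now
    ]


if __name__ == "__main__":
    now = datetime(2024, 5, 1, 12, 0)
    items = [
        MediaItem("a.jpg", 49.2578, 7.0453, now - timedelta(days=2)),   # nearby, recent
        MediaItem("b.jpg", 48.1374, 11.5755, now - timedelta(days=1)),  # recent, but far away
        MediaItem("c.jpg", 49.2580, 7.0460, now - timedelta(days=30)),  # nearby, but too old
    ]
    hits = spatio_temporal_filter(items, 49.2578, 7.0453, now,
                                  radius_km=1.0, window=timedelta(days=7))
    print([m.path for m in hits])  # prints ['a.jpg']
```

Keeping the context (user position and current time) as explicit arguments means the same stored collection can answer the same phrase differently as the user moves, which is the "ever-changing context" the abstract emphasizes; how a real system derives the radius and window from language is left open here.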
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Video Analysis and Summarization
