Navigating Connected Memories with a Task-oriented Dialog System
Seungwhan Moon, Satwik Kottur, Alborz Geramifard, Babak Damavandi

TL;DR
This paper introduces a new dataset and task for multi-turn, interactive dialog systems to improve personal media retrieval, enabling users to have more natural and connected conversations about their memories.
Contribution
It presents COMET, a large, grounded dialog dataset for connected memories, and proposes benchmark tasks to advance multi-turn, multimodal dialog systems for personal media retrieval.
Findings
State-of-the-art models struggle with multimodal understanding.
The dataset enables benchmarking of connected memory dialog tasks.
Analysis highlights key difficulties in multi-turn, multimodal conversations.
Abstract
Recent years have seen an increasing trend in the volume of personal media captured by users, thanks to the advent of smartphones and smart glasses, resulting in large media collections. Despite conversation being an intuitive human-computer interface, current efforts focus mostly on single-shot natural-language-based media retrieval to help users query their media and relive their memories. This severely limits the search functionality as users can neither ask follow-up queries nor obtain information without first formulating a single-turn query. In this work, we propose dialogs for connected memories as a powerful tool to empower users to search their media collection through a multi-turn, interactive conversation. Towards this, we collect a new task-oriented dialog dataset COMET, which contains user<->assistant dialogs (totaling utterances), grounded in simulated…
Taxonomy
Topics: Topic Modeling · Multimodal Machine Learning Applications · Natural Language Processing Techniques
