Spot the Difference: A Cooperative Object-Referring Game in Non-Perfectly Co-Observable Scene
Duo Zheng, Fandong Meng, Qingyi Si, Hairun Fan, Zipeng Xu, Jie Zhou, Fangxiang Feng, Xiaojie Wang

TL;DR
This paper introduces a new dialog task in non-perfectly co-observable visual scenes, along with a large-scale dataset and benchmark models, to advance research in realistic visual dialog scenarios involving scene differences.
Contribution
It proposes a novel object-referring game in non-perfectly co-observable scenes, creates the SpotDiff dataset with VR images and dialogs, and provides benchmark models and analysis.
Findings
Benchmark models achieve baseline performance.
Identified key challenges in dialog strategy and object categorization.
Dataset enables future research in realistic visual dialog scenarios.
Abstract
Visual dialog has made great progress since various vision-oriented goals were introduced into the conversation, notably in GuessWhich and GuessWhat, where a single image is visible to only one of, or to both of, the questioner and the answerer, respectively. Research has concentrated on visual dialog tasks in such single- or perfectly co-observable visual scenes, while somewhat neglecting tasks in non-perfectly co-observable visual scenes, where the images accessed by the two agents may not be exactly the same, as often occurs in practice. Although building common ground in a non-perfectly co-observable visual scene through conversation is important for advanced dialog agents, the lack of such a dialog task and of a corresponding large-scale dataset makes in-depth research impossible. To break this limitation, we propose an object-referring game in…
Taxonomy
Topics: Multimodal Machine Learning Applications · Speech and dialogue systems · Social Robot Interaction and HRI
Methods: Network On Network
