Spot the Difference: A Cooperative Object-Referring Game in Non-Perfectly Co-Observable Scene

Duo Zheng; Fandong Meng; Qingyi Si; Hairun Fan; Zipeng Xu; Jie Zhou; Fangxiang Feng; Xiaojie Wang

arXiv:2203.08362·cs.CV·March 17, 2022·1 cites

TL;DR

This paper introduces a new dialog task in non-perfectly co-observable visual scenes, along with a large-scale dataset and benchmark models, to advance research in realistic visual dialog scenarios involving scene differences.

Contribution

It proposes a novel object-referring game in non-perfectly co-observable scenes, creates the SpotDiff dataset of VR images and dialogs, and provides benchmark models and analysis.

Findings

01. Benchmark models achieve baseline performance.

02. Identified key challenges in dialog strategy and object categorization.

03. Dataset enables future research in realistic visual dialog scenarios.

Abstract

Visual dialog has made great progress since various vision-oriented goals were introduced into the conversation, notably in GuessWhich and GuessWhat, where a single image is visible to only one of, or to both of, the questioner and the answerer, respectively. Researchers have mostly explored visual dialog tasks in such single-observable or perfectly co-observable visual scenes, while somewhat neglecting tasks in non-perfectly co-observable visual scenes, where the images accessed by the two agents may not be exactly the same, as often occurs in practice. Although building common ground in a non-perfectly co-observable visual scene through conversation is important for advanced dialog agents, the lack of such a dialog task and a corresponding large-scale dataset makes in-depth research impossible. To overcome this limitation, we propose an object-referring game in…

Code & Models

Repositories

zd11024/spot_difference (PyTorch, official)

Taxonomy

Topics: Multimodal Machine Learning Applications · Speech and dialogue systems · Social Robot Interaction and HRI

Methods: Network On Network