Answer-Driven Visual State Estimator for Goal-Oriented Visual Dialogue

Zipeng Xu; Fangxiang Feng; Xiaojie Wang; Yushu Yang; Huixing Jiang; Zhongyuan Wang

arXiv:2010.00361·cs.CV·March 25, 2022

TL;DR

This paper introduces ADVSE, a novel approach that enhances goal-oriented visual dialogue by effectively incorporating answer effects into visual state estimation, leading to improved question generation and guessing accuracy.

Contribution

The paper proposes the Answer-Driven Visual State Estimator (ADVSE), which uses answer-driven attention and conditional information fusion to better utilize answers in visual dialogue systems.

Findings

01 Achieves state-of-the-art results on the GuessWhat?! dataset.

02 Improves question efficiency and visual attention reliability.

03 Enhances visual dialogue agent performance.

Abstract

A goal-oriented visual dialogue involves multi-turn interactions between two agents, Questioner and Oracle. During these interactions, the answer given by Oracle is of great significance, as it provides a golden response to what Questioner is concerned with. Based on the answer, Questioner updates its belief about the target visual content and raises another question. Notably, different answers lead to different visual beliefs and future questions. However, existing methods always indiscriminately encode answers after much longer questions, resulting in weak utilization of answers. In this paper, we propose an Answer-Driven Visual State Estimator (ADVSE) to impose the effects of different answers on visual states. First, we propose an Answer-Driven Focusing Attention (ADFA) to capture the answer-driven effect on visual attention by sharpening question-related attention and adjusting it by answer-based…
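
The idea of sharpening question-related attention and then adjusting it by the answer can be illustrated with a toy sketch. This is not the paper's ADFA: the abstract above is cut off before the adjustment is specified, so the rule used here (keep the sharpened attention for "yes", shift attention to the complement for "no", fall back to the unsharpened attention otherwise) is an assumption made purely for illustration. The function names `softmax` and `answer_driven_attention` and the `temperature` parameter are hypothetical.

```python
import math


def softmax(scores, temperature=1.0):
    """Numerically stable softmax over a list of floats."""
    scaled = [s / temperature for s in scores]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def answer_driven_attention(question_scores, answer, temperature=0.5):
    """Toy illustration only; the adjustment rule is an assumption.

    question_scores: one relevance score per image region for the question.
    answer: "yes", "no", or anything else (treated as not applicable).
    temperature: values below 1 sharpen the attention distribution.
    """
    sharpened = softmax(question_scores, temperature)
    if answer == "yes":
        # Assumed rule: a "yes" confirms the regions the question points at.
        return sharpened
    if answer == "no":
        # Assumed rule: a "no" moves attention to the complementary regions.
        inverted = [1.0 - a for a in sharpened]
        total = sum(inverted)
        if total <= 0.0:
            # Single-region case: there is no complement to attend to.
            return sharpened
        return [a / total for a in inverted]
    # Any other answer: keep the plain, unsharpened question attention.
    return softmax(question_scores)


if __name__ == "__main__":
    scores = [2.0, 1.0, 0.0]  # hypothetical scores for three regions
    for ans in ("yes", "no", "n/a"):
        print(ans, [round(a, 3) for a in answer_driven_attention(scores, ans)])
```

In this sketch the same question scores yield different attention maps depending on the answer, which is the behavior the abstract describes as different answers leading to different visual beliefs.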

Code & Models

Repositories

zipengxuc/ADVSE-GuessWhat · tf · Official
