Multi-Modal Dialogue State Tracking for Playing GuessWhich Game

Wei Pang; Ruixue Duan; Jinfu Yang; Ning Li

arXiv:2408.08431·cs.AI·August 19, 2024

TL;DR

This paper introduces a multi-modal dialogue state tracking model for the GuessWhich game, enabling a Questioner Bot to perform visual reasoning through mental imagery, leading to state-of-the-art results on VisDial datasets.

Contribution

The paper proposes a dialogue state tracking approach based on mental imagery for visual reasoning in GuessWhich, improving over existing methods that either lack visual information or rely on a single real image sampled at each round.

Findings

1. Achieves new state-of-the-art performance on VisDial datasets.
2. Effectively models visually related reasoning through mental imagery.
3. Demonstrates robustness across multiple dataset versions.

Abstract

GuessWhich is an engaging visual dialogue game that involves interaction between a Questioner Bot (QBot) and an Answer Bot (ABot) in the context of image-guessing. In this game, QBot's objective is to locate a concealed image solely through a series of visually related questions posed to ABot. However, effectively modeling visually related reasoning in QBot's decision-making process poses a significant challenge. Current approaches either lack visual information or rely on a single real image sampled at each round as decoding context, both of which are inadequate for visual reasoning. To address this limitation, we propose a novel approach that focuses on visually related reasoning through the use of a mental model of the undisclosed image. Within this framework, QBot learns to represent mental imagery, enabling robust visual reasoning by tracking the dialogue state. The dialogue state…
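To make the game protocol in the abstract concrete, here is a minimal toy sketch of a GuessWhich episode. It is not the paper's model: images are reduced to boolean attribute dictionaries, the "dialogue state" is a plain dictionary of answers collected so far, and every name (`ATTRIBUTES`, `ToyABot`, `ToyQBot`, `play`) is hypothetical and chosen only for illustration. It shows only the interaction pattern: QBot never sees the hidden image, updates its state after each answer, and finally ranks a candidate pool against that state.

```python
# Toy illustration of the GuessWhich protocol; NOT the paper's model.
# All names here (ATTRIBUTES, ToyABot, ToyQBot, play) are hypothetical.

ATTRIBUTES = ["has_dog", "is_outdoor", "has_person"]


class ToyABot:
    """Sees the hidden image (a dict of boolean attributes) and answers truthfully."""

    def __init__(self, hidden_image):
        self.hidden_image = hidden_image

    def answer(self, question):
        return self.hidden_image[question]


class ToyQBot:
    """Never sees the image; tracks what it has learned as its dialogue state."""

    def __init__(self):
        self.state = {}  # attribute name -> answer received so far

    def ask(self, round_idx):
        # Assumption: one fixed question per round, in a fixed order.
        return ATTRIBUTES[round_idx]

    def update(self, question, answer):
        self.state[question] = answer

    def rank(self, pool):
        # Score each candidate by how many tracked answers it agrees with.
        def score(name):
            return sum(pool[name][q] == a for q, a in self.state.items())

        return sorted(pool, key=score, reverse=True)


def play(pool, target, rounds=len(ATTRIBUTES)):
    """Run one episode and return QBot's ranking of the candidate pool."""
    abot, qbot = ToyABot(pool[target]), ToyQBot()
    for t in range(rounds):
        question = qbot.ask(t)
        qbot.update(question, abot.answer(question))
    return qbot.rank(pool)
```

Calling `play(pool, target)` with a small pool of attribute dictionaries returns the candidate names ordered from best to worst match. A learned QBot would replace the fixed question order and the exact-match score with generated questions and a learned representation of the undisclosed image.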

Code & Models

Repositories

xubuvd/guesswhich (PyTorch, official)
