Divide-and-Conquer: Tree-structured Strategy with Answer Distribution Estimator for Goal-Oriented Visual Dialogue

Shuo Cai; Xinzhe Han; Shuhui Wang

arXiv:2502.05806 · cs.CV · February 11, 2025

TL;DR

This paper introduces TSADE, a tree-structured strategy for goal-oriented visual dialogue that guides question generation by systematically narrowing down candidate objects, improving accuracy and efficiency.

Contribution

The paper proposes a novel divide-and-conquer approach with an answer distribution estimator to enhance question generation in visual dialogue tasks.

Findings

01

Achieves higher task accuracy with fewer questions and rounds.

02

Reduces randomness in question generation compared to traditional methods.

03

Facilitates higher-quality question generation.

Abstract

Goal-oriented visual dialogue involves multi-round interaction between artificial agents and has attracted considerable attention due to its wide range of applications. Given a visual scene, the task arises when a Questioner asks an action-oriented question and an Answerer responds with the intent of letting the Questioner know the correct action to take. The quality of the questions affects the accuracy and efficiency of the target search process. However, existing methods lack a clear strategy to guide question generation, resulting in randomness in the search process and non-convergent results. We propose a Tree-Structured Strategy with Answer Distribution Estimator (TSADE), which guides question generation by excluding half of the current candidate objects in each round. This process is implemented by maximizing a binary reward inspired by the "divide-and-conquer" paradigm.…
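
The halving idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not specify the estimator or the exact reward, so everything here is an assumption made for illustration. The names `binary_reward`, `pick_question`, and `narrow` are hypothetical, the estimated answer distribution is a hand-written table of "yes" probabilities per question and candidate, and the 0.5 decision threshold is an arbitrary choice.

```python
from typing import Dict, List

# Hypothetical stand-in for an answer distribution estimator:
# yes_probs[question][candidate] = estimated probability the answer is "yes".
YesProbs = Dict[str, Dict[str, float]]


def count_yes(question: str, candidates: List[str], yes_probs: YesProbs) -> int:
    """Number of candidates for which the predicted answer is 'yes' (assumed 0.5 threshold)."""
    return sum(1 for c in candidates if yes_probs[question][c] >= 0.5)


def binary_reward(question: str, candidates: List[str], yes_probs: YesProbs) -> int:
    """Return 1 if the question splits the candidates as evenly as possible, else 0."""
    n = len(candidates)
    return int(count_yes(question, candidates, yes_probs) in (n // 2, (n + 1) // 2))


def pick_question(candidates: List[str], yes_probs: YesProbs) -> str:
    """Pick the question whose predicted yes/no split is closest to half."""
    return min(
        yes_probs,
        key=lambda q: abs(count_yes(q, candidates, yes_probs) - len(candidates) / 2),
    )


def narrow(candidates: List[str], question: str, answer_is_yes: bool,
           yes_probs: YesProbs) -> List[str]:
    """Keep only the candidates whose predicted answer matches the observed answer."""
    return [c for c in candidates if (yes_probs[question][c] >= 0.5) == answer_is_yes]


# Toy scene with four candidate objects and three possible questions.
candidates = ["cat", "dog", "car", "bus"]
yes_probs = {
    "is it an animal?": {"cat": 0.9, "dog": 0.8, "car": 0.1, "bus": 0.05},
    "is it red?": {"cat": 0.1, "dog": 0.1, "car": 0.9, "bus": 0.2},
    "is it a cat?": {"cat": 0.95, "dog": 0.1, "car": 0.0, "bus": 0.0},
}

first = pick_question(candidates, yes_probs)          # splits 4 candidates into 2 and 2
remaining = narrow(candidates, first, True, yes_probs)
second = pick_question(remaining, yes_probs)          # splits the remaining 2 into 1 and 1
```

With an even split at every round, the number of candidates shrinks by about half each time, so roughly log2(N) questions are enough to isolate one target among N objects in this idealized setting.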

Taxonomy

Topics: Multimodal Machine Learning Applications · Speech and dialogue systems · Geographic Information Systems Studies