Divide-and-Conquer: Tree-structured Strategy with Answer Distribution Estimator for Goal-Oriented Visual Dialogue
Shuo Cai, Xinzhe Han, Shuhui Wang

TL;DR
This paper introduces TSADE, a tree-structured strategy for goal-oriented visual dialogue that guides question generation by excluding roughly half of the remaining candidate objects in each round, improving both task accuracy and dialogue efficiency.
Contribution
The paper proposes a novel divide-and-conquer approach with an answer distribution estimator to enhance question generation in visual dialogue tasks.
Findings
Achieves higher task accuracy with fewer questions and rounds.
Reduces randomness in question generation compared with existing methods that lack an explicit questioning strategy.
Facilitates higher-quality question generation.
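As an idealized illustration of why a halving strategy needs fewer rounds, the sketch counts worst-case rounds to isolate one target when every question excludes half of the remaining candidates, versus when each question rules out only a single candidate. It assumes perfect yes/no answers and exact splits; the function names are hypothetical and this is not the paper's evaluation.

```python
def rounds_halving(num_candidates: int) -> int:
    """Worst-case rounds when every question excludes half of the remaining
    candidates (the larger half survives when the count is odd)."""
    rounds = 0
    remaining = num_candidates
    while remaining > 1:
        remaining = (remaining + 1) // 2  # keep the larger half (ceiling division)
        rounds += 1
    return rounds


def rounds_one_at_a_time(num_candidates: int) -> int:
    """Worst-case rounds when each question rules out only one candidate."""
    return max(num_candidates - 1, 0)


for n in (4, 16, 20, 64):
    print(n, rounds_halving(n), rounds_one_at_a_time(n))
# 4 2 3
# 16 4 15
# 20 5 19
# 64 6 63
```

Halving gives ceil(log2 N) rounds, so the gap over one-at-a-time elimination grows quickly with the number of candidate objects.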
Abstract
Goal-oriented visual dialogue involves multi-round interaction between artificial agents and has attracted considerable attention due to its wide range of applications. Given a visual scene, a Questioner asks action-oriented questions and an Answerer responds with the intent of letting the Questioner know the correct action to take. The quality of the questions affects both the accuracy and the efficiency of the target search. However, existing methods lack a clear strategy to guide question generation, which introduces randomness into the search process and leads to non-convergent results. We propose a Tree-Structured Strategy with Answer Distribution Estimator (TSADE), which guides question generation by excluding half of the current candidate objects in each round. This process is implemented by maximizing a binary reward inspired by the "divide-and-conquer" paradigm.…
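The abstract does not spell out the reward, so the following is only a minimal sketch of what a halving-based binary reward could look like, assuming yes/no answers. The dictionary `p_yes` is a hand-written, hypothetical stand-in for an answer distribution estimator: for each candidate it gives the estimated probability that the Answerer replies "yes" if that object is the target. The names `split_by_estimated_answer` and `binary_halving_reward`, the 0.5 threshold, and the one-object tolerance are all illustrative assumptions, not TSADE's actual formulation.

```python
from typing import Dict, List, Tuple


def split_by_estimated_answer(
    candidates: List[str], p_yes: Dict[str, float], threshold: float = 0.5
) -> Tuple[List[str], List[str]]:
    """Partition candidates by the answer the Answerer is expected to give.

    p_yes[c] is a hypothetical estimate of P(answer = "yes" | target = c).
    """
    yes = [c for c in candidates if p_yes[c] >= threshold]
    no = [c for c in candidates if p_yes[c] < threshold]
    return yes, no


def binary_halving_reward(
    candidates: List[str], p_yes: Dict[str, float], tolerance: int = 1
) -> int:
    """Hypothetical binary reward: 1 if the question is expected to split the
    candidates into two near-equal halves, 0 otherwise."""
    yes, no = split_by_estimated_answer(candidates, p_yes)
    return 1 if abs(len(yes) - len(no)) <= tolerance else 0


candidates = ["cup", "plate", "fork", "knife"]
# A question expected to separate {cup, plate} from {fork, knife}.
balanced = {"cup": 0.9, "plate": 0.8, "fork": 0.1, "knife": 0.2}
# A question expected to single out only the cup.
lopsided = {"cup": 0.9, "plate": 0.1, "fork": 0.1, "knife": 0.2}
print(binary_halving_reward(candidates, balanced))  # → 1
print(binary_halving_reward(candidates, lopsided))  # → 0
```

The tolerance of one object lets odd-sized candidate sets still earn the reward, since they cannot be split exactly in half.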
Taxonomy
Topics: Multimodal Machine Learning Applications · Speech and dialogue systems · Geographic Information Systems Studies
