Are Current Decoding Strategies Capable of Facing the Challenges of Visual Dialogue?

Amit Kumar Chaudhary; Alex J. Lucassen; Ioanna Tsani; Alberto Testoni

arXiv:2210.12997 · cs.CL · October 25, 2022



TL;DR

This paper evaluates various decoding strategies in visual dialogue systems, revealing their limitations in balancing lexical richness, task accuracy, and visual grounding, and offers insights for developing improved algorithms.

Contribution

It provides a comprehensive comparison of decoding strategies in visual dialogue, highlighting their strengths and weaknesses to guide future improvements.

Findings

1. None of the strategies balances all key aspects (lexical richness, task accuracy, and visual grounding) effectively.
2. Decoding strategies vary significantly in how well they handle visual grounding.
3. The analysis suggests directions for designing more effective decoding algorithms.

Abstract

Decoding strategies play a crucial role in natural language generation systems. They are usually designed and evaluated in open-ended text-only tasks, and it is not clear how different strategies handle the numerous challenges that goal-oriented multimodal systems face (such as grounding and informativeness). To answer this question, we compare a wide variety of different decoding strategies and hyper-parameter configurations in a Visual Dialogue referential game. Although none of them successfully balance lexical richness, accuracy in the task, and visual grounding, our in-depth analysis allows us to highlight the strengths and weaknesses of each decoding strategy. We believe our findings and suggestions may serve as a starting point for designing more effective decoding algorithms that handle the challenges of Visual Dialogue tasks.
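The abstract refers to "decoding strategies and hyper-parameter configurations" without naming them. As a generic illustration only (not the authors' implementation, and not necessarily the exact set they compared), the Python sketch below applies several widely used decoding rules to a single next-token distribution: greedy selection, temperature scaling, top-k filtering, and nucleus (top-p) filtering followed by sampling. The function names and the toy tokens and scores are hypothetical.

```python
import math
import random
from typing import Dict, List, Tuple

# A score table or probability distribution over vocabulary tokens.
Dist = Dict[str, float]


def softmax(logits: Dist, temperature: float = 1.0) -> Dist:
    """Turn raw scores into probabilities; temperature < 1 sharpens, > 1 flattens."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = {tok: s / temperature for tok, s in logits.items()}
    m = max(scaled.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(s - m) for tok, s in scaled.items()}
    z = sum(exps.values())
    return {tok: e / z for tok, e in exps.items()}


def greedy(probs: Dist) -> str:
    """Greedy decoding: always pick the single most probable token."""
    return max(probs, key=probs.get)


def _renormalize(items: List[Tuple[str, float]]) -> Dist:
    """Rescale the kept probabilities so they sum to 1 again."""
    z = sum(p for _, p in items)
    return {tok: p / z for tok, p in items}


def top_k_filter(probs: Dist, k: int) -> Dist:
    """Top-k: keep only the k most probable tokens, then renormalize."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    return _renormalize(ranked[:k])


def top_p_filter(probs: Dist, p: float) -> Dist:
    """Nucleus (top-p): keep the smallest set of most probable tokens whose mass reaches p."""
    kept, mass = [], 0.0
    for tok, prob in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
        kept.append((tok, prob))
        mass += prob
        if mass >= p:
            break
    return _renormalize(kept)


def sample(probs: Dist, rng: random.Random) -> str:
    """Draw one token in proportion to its (filtered) probability."""
    tokens = list(probs)
    return rng.choices(tokens, weights=[probs[t] for t in tokens], k=1)[0]


if __name__ == "__main__":
    # Hypothetical scores for the next word of a generated question (toy data).
    logits = {"red": 2.0, "blue": 1.0, "car": 0.5, "dog": 0.0}
    probs = softmax(logits)
    rng = random.Random(0)
    print("greedy:", greedy(probs))
    print("top-k (k=2) candidates:", sorted(top_k_filter(probs, 2)))
    print("top-p (p=0.9) candidates:", sorted(top_p_filter(probs, 0.9)))
    print("nucleus sample:", sample(top_p_filter(probs, 0.9), rng))
```

In this sketch, the hyper-parameters (temperature, k, p) control how much probability mass is concentrated on the few most likely tokens. Smaller values make the output more deterministic and closer to greedy decoding, while larger values admit more candidate tokens and therefore more lexical variety.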


Taxonomy

Topics: Multimodal Machine Learning Applications · Speech and dialogue systems · Natural Language Processing Techniques