Can Visual Dialogue Models Do Scorekeeping? Exploring How Dialogue Representations Incrementally Encode Shared Knowledge
Brielen Madureira, David Schlangen

TL;DR
This paper investigates whether visual dialogue models incrementally encode shared knowledge, as a mental scoreboard would. It finds moderate but inconsistent scorekeeping ability, which may partly stem from the limited need for grounding interactions in the original task.
Contribution
It introduces a theory-based evaluation method to assess incremental shared knowledge encoding in pretrained visual dialogue models.
Findings
Models show moderate ability to distinguish shared from private statements.
Scorekeeping ability is not always incrementally consistent.
Limited grounding interactions in the original task may hinder scorekeeping development.
Abstract
Cognitively plausible visual dialogue models should keep a mental scoreboard of shared established facts in the dialogue context. We propose a theory-based evaluation method for investigating to what degree models pretrained on the VisDial dataset incrementally build representations that appropriately do scorekeeping. Our conclusion is that the ability to make the distinction between shared and privately known statements along the dialogue is moderately present in the analysed models, but not always incrementally consistent, which may partially be due to the limited need for grounding interactions in the original task.
Taxonomy
Topics: Multimodal Machine Learning Applications · Human Pose and Action Recognition · Speech and dialogue systems
