Can Visual Dialogue Models Do Scorekeeping? Exploring How Dialogue Representations Incrementally Encode Shared Knowledge

Brielen Madureira; David Schlangen

arXiv:2204.06970·cs.CL·February 26, 2025

TL;DR

This paper investigates whether visual dialogue models incrementally encode shared knowledge, as if keeping a mental scoreboard. It finds that this scorekeeping ability is moderately present but not always incrementally consistent, which may partly be due to the limited need for grounding interactions in the original task.

Contribution

It introduces a theory-based evaluation method to assess incremental shared knowledge encoding in pretrained visual dialogue models.

Findings

1. Models show moderate ability to distinguish shared from private statements.
2. Scorekeeping ability is not always incrementally consistent.
3. Limited grounding interactions in the original task may hinder scorekeeping development.

Abstract

Cognitively plausible visual dialogue models should keep a mental scoreboard of shared established facts in the dialogue context. We propose a theory-based evaluation method for investigating to what degree models pretrained on the VisDial dataset incrementally build representations that appropriately do scorekeeping. Our conclusion is that the ability to make the distinction between shared and privately known statements along the dialogue is moderately present in the analysed models, but not always incrementally consistent, which may partially be due to the limited need for grounding interactions in the original task.

Code & Models

Repositories

briemadu/scorekeeping (PyTorch, official)

Taxonomy

Topics: Multimodal Machine Learning Applications · Human Pose and Action Recognition · Speech and dialogue systems