Context informs pragmatic interpretation in vision-language models

Alvin Wei Ming Tan; Ben Prystawski; Veronica Boyce; Michael C. Frank

arXiv:2511.03908 · cs.CL · November 7, 2025

Open Access

TL;DR

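This paper tests humans and vision-language models on trials from iterated reference games, varying the amount, order, and relevance of the context provided. Without relevant context, models score above chance but substantially worse than humans; with relevant context, model performance increases dramatically over trials, though few-shot reference games with abstract referents remain difficult.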

Contribution

The study shows that relevant context substantially improves vision-language models' pragmatic reasoning in multi-turn reference tasks, while few-shot reference games with abstract referents remain difficult for them.

Findings

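01 Models score above chance even without relevant context, but substantially worse than humans

02 With relevant context, model performance increases dramatically over trials

03 Few-shot reference games with abstract referents remain difficult for models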

Abstract

Iterated reference games - in which players repeatedly pick out novel referents using language - present a test case for agents' ability to perform context-sensitive pragmatic reasoning in multi-turn linguistic environments. We tested humans and vision-language models on trials from iterated reference games, varying the given context in terms of amount, order, and relevance. Without relevant context, models were above chance but substantially worse than humans. However, with relevant context, model performance increased dramatically over trials. Few-shot reference games with abstract referents remain a difficult task for machine learning models.
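
The three context manipulations named in the abstract (amount, order, and relevance) can be pictured with a minimal sketch. This is not the authors' procedure: the `Trial` record, the `build_context` helper, the order labels, and the use of a different game as "irrelevant" context are all hypothetical, and the number of candidate referents is an assumption since the abstract does not state it.

```python
import random
from dataclasses import dataclass


@dataclass
class Trial:
    # Hypothetical record of one reference-game trial.
    game_id: str       # which game the trial came from
    round_index: int   # position of the trial within its game
    utterance: str     # the speaker's description
    target: str        # identifier of the intended referent


def build_context(same_game, other_game, amount, order="original",
                  relevant=True, seed=0):
    """Select earlier trials to show before a test trial (illustrative only).

    amount   -- how many earlier trials to include
    order    -- "original", "reversed", or "shuffled" (assumed labels)
    relevant -- True: draw from the same game; False: from a different game
    """
    pool = list(same_game if relevant else other_game)
    context = pool[:amount]  # the first `amount` trials of the chosen game
    if order == "reversed":
        context.reverse()
    elif order == "shuffled":
        random.Random(seed).shuffle(context)
    return context


def chance_accuracy(n_candidates):
    """Accuracy of uniform random guessing among n candidate referents."""
    return 1.0 / n_candidates
```

Under these assumptions, comparing a model's accuracy against `chance_accuracy(n)` for each setting of `amount`, `order`, and `relevant` would mirror the kind of comparison the abstract describes.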

Taxonomy

Topics: Multimodal Machine Learning Applications · Explainable Artificial Intelligence (XAI) · Speech and dialogue systems