Know What You Don't Know: Modeling a Pragmatic Speaker that Refers to Objects of Unknown Categories
Sina Zarrieß, David Schlangen

TL;DR
This paper introduces a neural pragmatic speaker model for zero-shot reference games. By reasoning about its uncertainty over object categories, the speaker refers to objects of unknown categories in a way that often improves communicative success.
Contribution
It combines zero-shot learning with pragmatic language modeling: inspired by rational speech act models, it extends a neural generator into a pragmatic speaker that handles unknown object categories in visual reference games.
Findings
Pragmatic reasoning reduces noun usage compared to literal models.
The pragmatic strategy often improves communicative success, measured as the resolution accuracy of an automatic listener.
Fewer distractor categories are named by the pragmatic speaker.
Abstract
Zero-shot learning in Language & Vision is the task of correctly labelling (or naming) objects of novel categories. Another strand of work in L&V aims at pragmatically informative rather than "correct" object descriptions, e.g. in reference games. We combine these lines of research and model zero-shot reference games, where a speaker needs to successfully refer to a novel object in an image. Inspired by models of "rational speech acts", we extend a neural generator to become a pragmatic speaker reasoning about uncertain object categories. As a result of this reasoning, the generator produces fewer nouns and names of distractor categories as compared to a literal speaker. We show that this conversational strategy for dealing with novel objects often improves communicative success, in terms of resolution accuracy of an automatic listener.
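The abstract does not spell out how the pragmatic speaker is computed. As a rough illustration only, the sketch below shows the generic rational-speech-acts reweighting on a toy vocabulary: a literal speaker's word probabilities are rescored by how well a literal listener would recover the target from each word. This is not the paper's neural model; the function names, the `alpha` rationality parameter, and all numbers are hypothetical.

```python
import numpy as np

def literal_listener(speaker_probs):
    """L0(object | word): normalise each word column over the candidate objects."""
    col_sums = speaker_probs.sum(axis=0, keepdims=True)
    return speaker_probs / col_sums

def pragmatic_speaker(speaker_probs, alpha=1.0):
    """S1(word | object), proportional to S0(word | object) * L0(object | word)**alpha."""
    listener = literal_listener(speaker_probs)
    scores = speaker_probs * listener ** alpha
    return scores / scores.sum(axis=1, keepdims=True)

# Hypothetical toy scene: rows are objects, columns are candidate words.
words = ["dog", "animal", "brown"]
s0 = np.array([
    [0.5, 0.3, 0.2],  # target: category is uncertain, literal speaker still prefers "dog"
    [0.7, 0.2, 0.1],  # distractor: confidently a dog
])

s1 = pragmatic_speaker(s0, alpha=2.0)
literal_choice = words[int(s0[0].argmax())]    # word the literal speaker picks for the target
pragmatic_choice = words[int(s1[0].argmax())]  # word the pragmatic speaker picks for the target
```

In this toy scene the literal speaker names the target with the word that also fits the distractor best, while the reweighted speaker moves to a word that separates the two objects, which mirrors the abstract's observation that the pragmatic speaker produces fewer names of distractor categories.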
Taxonomy
Topics: Multimodal Machine Learning Applications · Topic Modeling · Natural Language Processing Techniques
