Pragmatically Informative Image Captioning with Character-Level Inference

Reuben Cohn-Gordon; Noah Goodman; Christopher Potts

arXiv:1804.05417·cs.CL·May 11, 2018

TL;DR

This paper introduces an image captioning system that applies Rational Speech Acts (RSA) inference at the level of characters. Its captions distinguish the target image from similar images, and it avoids the cost of normalizing over the full set of possible utterances.

Contribution

It presents a character-level RSA inference method for image captioning that makes pragmatic decisions while the caption is unrolled, so it does not need to sample a small subset of candidate utterances as earlier approaches did.

Findings

01 Outperforms non-pragmatic baseline

02 Outperforms word-level RSA captioner

03 Efficient character-level inference

Abstract

We combine a neural image captioner with a Rational Speech Acts (RSA) model to make a system that is pragmatically informative: its objective is to produce captions that are not merely true but also distinguish their inputs from similar images. Previous attempts to combine RSA with neural image captioning require an inference which normalizes over the entire set of possible utterances. This poses a serious problem of efficiency, previously solved by sampling a small subset of possible utterances. We instead solve this problem by implementing a version of RSA which operates at the level of characters ("a","b","c"...) during the unrolling of the caption. We find that the utterance-level effect of referential captions can be obtained with only character-level decisions. Finally, we introduce an automatic method for testing the performance of pragmatic speaker models, and show that our…
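To make the idea of character-level RSA concrete, here is a minimal sketch of one decoding step in Python. It is an illustration under stated assumptions, not the authors' implementation: the function name `pragmatic_char_step`, the rationality parameter `alpha`, the uniform image prior, and the toy probabilities are all hypothetical, and the recursion used (literal speaker, literal listener, pragmatic speaker) is the standard RSA form rather than anything spelled out in the abstract. In a real system the literal speaker probabilities would come from a neural captioner conditioned on the image and the caption prefix.

```python
def normalize(weights):
    """Scale non-negative weights to sum to 1 (uniform if all are zero)."""
    total = sum(weights)
    if total == 0:
        return [1.0 / len(weights)] * len(weights)
    return [w / total for w in weights]


def pragmatic_char_step(s0_probs, target, alpha=1.0, prior=None):
    """One character-level RSA step (illustrative sketch).

    s0_probs[i][c] is an assumed literal-speaker probability of emitting
    character c next for image i, given the caption prefix so far.
    Returns the pragmatic speaker's distribution over next characters for
    the target image, plus the literal listener's belief for each character.
    """
    n_images = len(s0_probs)
    n_chars = len(s0_probs[0])
    if prior is None:
        # Assumption: the listener starts with a uniform prior over images.
        prior = [1.0 / n_images] * n_images

    listeners = []
    s1_weights = []
    for c in range(n_chars):
        # Literal listener: L0(image | char) is proportional to
        # S0(char | image) * prior(image). The normalization runs over
        # images, not over the space of whole utterances.
        l0 = normalize([s0_probs[i][c] * prior[i] for i in range(n_images)])
        listeners.append(l0)
        # Pragmatic speaker: S1(char | target) is proportional to
        # S0(char | target) * L0(target | char) ** alpha.
        s1_weights.append(s0_probs[target][c] * l0[target] ** alpha)

    # This normalization runs over the character vocabulary only.
    return normalize(s1_weights), listeners


# Toy example with a two-character vocabulary ["r", "b"] and two images.
# Image 0 (the target) could be captioned "red bus" or "bus";
# image 1 (the distractor) is most likely captioned "blue bus".
s0 = [
    [0.6, 0.4],  # target image: P("r"), P("b")
    [0.1, 0.9],  # distractor image: P("r"), P("b")
]
s1, listeners = pragmatic_char_step(s0, target=0)
print(s1)
```

In this toy case the pragmatic speaker puts more probability on "r" than the literal speaker's 0.6, because "r" is the character that better separates the target from the distractor. Both normalizations are over small sets (the images and the character vocabulary), which is the efficiency argument the abstract makes for working at the character level.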
