Pragmatically Informative Image Captioning with Character-Level Inference
Reuben Cohn-Gordon, Noah Goodman, Christopher Potts

TL;DR
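This paper introduces an image captioning system that applies RSA inference at the character level. Its captions are more informative because they distinguish the target image from similar images, and the character-level formulation avoids the utterance sampling that earlier RSA captioners relied on for efficiency.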
Contribution
The paper presents a character-level RSA inference method for image captioning that makes pragmatic decisions one character at a time while the caption is unrolled, so it does not need to sample a subset of full utterances.
Findings
Outperforms non-pragmatic baseline
Outperforms word-level RSA captioner
Efficient character-level inference
Abstract
We combine a neural image captioner with a Rational Speech Acts (RSA) model to make a system that is pragmatically informative: its objective is to produce captions that are not merely true but also distinguish their inputs from similar images. Previous attempts to combine RSA with neural image captioning require an inference which normalizes over the entire set of possible utterances. This poses a serious problem of efficiency, previously solved by sampling a small subset of possible utterances. We instead solve this problem by implementing a version of RSA which operates at the level of characters ("a","b","c"...) during the unrolling of the caption. We find that the utterance-level effect of referential captions can be obtained with only character-level decisions. Finally, we introduce an automatic method for testing the performance of pragmatic speaker models, and show that our…
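The character-level idea in the abstract can be illustrated with a minimal toy sketch. It is not the authors' implementation: the neural captioner is replaced by a hypothetical lookup table of weighted captions per image, the literal listener uses a uniform image prior at every step, and the names `s0`, `s1`, `greedy_caption`, `END`, and the rationality parameter `alpha` are assumptions made for illustration. At each step the pragmatic speaker reweights the literal next-character distribution by how strongly that character points a listener to the target image, so normalization runs over characters rather than over whole utterances.

```python
END = "$"  # hypothetical end-of-caption marker (assumption, not from the paper)


def s0(image, prefix):
    """Toy literal speaker: next-character distribution for one image.

    `image` is a dict mapping candidate captions to weights; it stands in
    for a neural character-level captioner.
    """
    scores = {}
    for caption, weight in image.items():
        full = caption + END
        if full.startswith(prefix) and len(full) > len(prefix):
            ch = full[len(prefix)]
            scores[ch] = scores.get(ch, 0.0) + weight
    total = sum(scores.values())
    return {ch: v / total for ch, v in scores.items()} if total else {}


def s1(images, target, prefix, alpha=1.0):
    """Pragmatic speaker: S1(ch) is proportional to S0(ch) * L0(target | ch)^alpha."""
    literal = s0(images[target], prefix)
    scores = {}
    for ch, p in literal.items():
        # Literal listener with a uniform prior over images (assumption).
        likelihoods = [s0(img, prefix).get(ch, 0.0) for img in images]
        l0 = likelihoods[target] / sum(likelihoods)
        scores[ch] = p * l0 ** alpha
    total = sum(scores.values())
    return {ch: v / total for ch, v in scores.items()} if total else {}


def greedy_caption(images, target, alpha=1.0, max_len=50):
    """Unroll a caption greedily, one pragmatic character decision at a time."""
    prefix = ""
    for _ in range(max_len):
        dist = s1(images, target, prefix, alpha)
        if not dist:
            break
        ch = max(dist, key=dist.get)
        if ch == END:
            break
        prefix += ch
    return prefix


# Made-up example data: image 0 is the target, image 1 is a similar distractor.
images = [
    {"a bus": 0.6, "a red bus": 0.4},
    {"a bus": 0.5, "a blue bus": 0.5},
]

print(greedy_caption(images, target=0, alpha=0.0))  # literal speaker: a bus
print(greedy_caption(images, target=0, alpha=1.0))  # pragmatic speaker: a red bus
```

With `alpha=0.0` the listener term drops out and the sketch behaves like the literal captioner, producing a caption that is true of both images. With `alpha=1.0` the single character decision after "a " favors "r", which only the target supports, and the rest of the distinguishing caption follows.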
