Generating Diverse and Meaningful Captions

Annika Lindh; Robert J. Ross; Abhijit Mahalunkar; Giancarlo Salton; John D. Kelleher

arXiv:1812.08126·cs.CV·December 20, 2018

TL;DR

This paper introduces an unsupervised training approach for image captioning. It makes generated captions more diverse and specific by incorporating a learning signal from an image retrieval model, and it improves on previous models in caption diversity and novelty.

Contribution

The authors propose a novel unsupervised training approach that improves caption diversity and specificity by integrating signals from an image retrieval model.

Findings

01

Achieved state-of-the-art results in caption diversity and novelty.

02

Generated captions are more specific and varied than those of previous models.

03

Source code is publicly available for reproducibility.
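This page does not define how diversity and novelty are measured. The sketch below computes three caption statistics that are commonly used for this purpose. It assumes whitespace tokenization and lowercase exact-match comparison, and these definitions may differ from the metrics reported in the paper. The function names are hypothetical.

```python
from typing import Iterable, List, Sequence


def _normalize(caption: str) -> str:
    # Assumption: lowercase and collapse whitespace before comparing captions.
    return " ".join(caption.lower().split())


def distinct_caption_ratio(generated: Sequence[str]) -> float:
    """Fraction of generated captions that are unique (1.0 means no caption repeats)."""
    if not generated:
        return 0.0
    normalized: List[str] = [_normalize(c) for c in generated]
    return len(set(normalized)) / len(normalized)


def novelty_ratio(generated: Sequence[str], training_captions: Iterable[str]) -> float:
    """Fraction of generated captions that never appear verbatim in the training captions."""
    if not generated:
        return 0.0
    seen = {_normalize(c) for c in training_captions}
    normalized = [_normalize(c) for c in generated]
    return sum(c not in seen for c in normalized) / len(normalized)


def vocabulary_size(generated: Sequence[str]) -> int:
    """Number of distinct word types used across all generated captions."""
    return len({word for c in generated for word in _normalize(c).split()})


if __name__ == "__main__":
    generated = ["a man riding a horse", "a man riding a horse", "a brown horse in a field"]
    training = ["a man riding a horse"]
    print(distinct_caption_ratio(generated))   # 2 unique captions out of 3
    print(novelty_ratio(generated, training))  # 1 of 3 captions is unseen in training
    print(vocabulary_size(generated))          # 7 distinct words
```

A model that emits the same generic caption for similar images scores low on all three statistics, even if its n-gram metric scores are high.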

Abstract

Image Captioning is a task that requires models to acquire a multi-modal understanding of the world and to express this understanding in natural language text. While the state-of-the-art for this task has rapidly improved in terms of n-gram metrics, these models tend to output the same generic captions for similar images. In this work, we address this limitation and train a model that generates more diverse and specific captions through an unsupervised training approach that incorporates a learning signal from an Image Retrieval model. We summarize previous results and improve the state-of-the-art on caption diversity and novelty. We make our source code publicly available online.
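The abstract does not say how the image retrieval model's learning signal is computed. One common way to turn a retrieval model into a specificity signal is a hinge ranking loss with hardest negatives, in the style of VSE++. In this loss, a generated caption is penalized whenever some other image in the batch matches it nearly as well as its own image does. The sketch below is an assumption and not the authors' confirmed method. It assumes a frozen retrieval model has already embedded the generated captions and their images. It only computes the loss value; how the paper passes gradients through discrete word choices is not described here.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two embedding vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieval_hinge_loss(
    caption_embs: List[Vector], image_embs: List[Vector], margin: float = 0.2
) -> float:
    """Mean hardest-negative hinge loss for caption-to-image retrieval.

    Assumption: caption_embs[i] was generated for image_embs[i], and every other
    image in the batch serves as a negative. A generic caption that matches many
    images equally well produces a large loss. A specific caption that clearly
    picks out its own image produces a loss near zero.
    """
    n = len(caption_embs)
    if n != len(image_embs) or n < 2:
        raise ValueError("need matching caption/image batches of size >= 2")
    total = 0.0
    for i in range(n):
        positive = cosine(caption_embs[i], image_embs[i])
        hardest_negative = max(
            cosine(caption_embs[i], image_embs[j]) for j in range(n) if j != i
        )
        total += max(0.0, margin - positive + hardest_negative)
    return total / n


if __name__ == "__main__":
    images = [[1.0, 0.0], [0.0, 1.0]]
    specific = [[1.0, 0.0], [0.0, 1.0]]  # each caption points at its own image
    generic = [[1.0, 1.0], [1.0, 1.0]]   # both captions sit between the two images
    print(retrieval_hinge_loss(specific, images))  # 0.0
    print(retrieval_hinge_loss(generic, images))   # about 0.2 (the margin)
```

The hardest-negative variant was chosen because it directly penalizes the closest competing image, which is the failure mode described in the abstract: the same generic caption being produced for similar images.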

Code & Models

Repositories

AnnikaLindh/Diverse_and_Specific_Image_Captioning
PyTorch · Official