# Generating Diverse and Meaningful Captions

**Authors:** Annika Lindh, Robert J. Ross, Abhijit Mahalunkar, Giancarlo Salton, John D. Kelleher

arXiv: 1812.08126 · 2018-12-20

## TL;DR

This paper introduces an unsupervised training approach for image captioning that uses a learning signal from an Image Retrieval model to make generated captions more diverse and specific, improving on previously reported results for caption diversity and novelty.

## Contribution

The authors propose a novel unsupervised training approach that improves caption diversity and specificity by integrating signals from an image retrieval model.

## Key findings

- Achieved state-of-the-art results in caption diversity and novelty.
- Generated captions are more specific and varied compared to previous models.
- Source code is publicly available for reproducibility.
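
To make "diversity" and "novelty" concrete, here is a minimal sketch of how such corpus-level statistics are often computed. The definitions are assumptions for illustration and are not taken from the paper: diversity is approximated as the share of distinct generated captions plus the vocabulary size, and novelty as the share of generated captions that never appear in the training set. The function names and sample captions are hypothetical.

```python
def _normalize(caption):
    """Lowercase and collapse whitespace so trivial variants compare equal."""
    return " ".join(caption.lower().split())


def distinct_caption_ratio(generated):
    """Share of generated captions that are unique (assumed diversity proxy)."""
    if not generated:
        return 0.0
    normalized = [_normalize(c) for c in generated]
    return len(set(normalized)) / len(normalized)


def novel_caption_ratio(generated, training_captions):
    """Share of generated captions absent from the training set (assumed novelty proxy)."""
    if not generated:
        return 0.0
    seen = {_normalize(c) for c in training_captions}
    novel = sum(1 for c in generated if _normalize(c) not in seen)
    return novel / len(generated)


def vocabulary_size(generated):
    """Number of distinct whitespace-separated tokens across all generated captions."""
    return len({token for c in generated for token in _normalize(c).split()})


# Hypothetical sample data, for illustration only.
generated = [
    "a man riding a wave",
    "a man riding a wave",
    "a dog on a red surfboard",
    "a cat sitting on a couch",
]
training = ["a cat sitting on a couch", "a man riding a wave"]

print(distinct_caption_ratio(generated))
print(novel_caption_ratio(generated, training))
print(vocabulary_size(generated))
```

A model that repeats the same generic caption for similar images scores low on the first two ratios even when its n-gram metrics are high, which is the limitation the paper targets.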

## Abstract

Image Captioning is a task that requires models to acquire a multi-modal understanding of the world and to express this understanding in natural language text. While the state-of-the-art for this task has rapidly improved in terms of n-gram metrics, these models tend to output the same generic captions for similar images. In this work, we address this limitation and train a model that generates more diverse and specific captions through an unsupervised training approach that incorporates a learning signal from an Image Retrieval model. We summarize previous results and improve the state-of-the-art on caption diversity and novelty. We make our source code publicly available online.

---
Source: https://tomesphere.com/paper/1812.08126