Concadia: Towards Image-Based Text Generation with a Purpose

Elisa Kreiss; Fei Fang; Noah D. Goodman; Christopher Potts

arXiv:2104.08376 · cs.CL · October 31, 2022

TL;DR

This paper introduces Concadia, a new dataset distinguishing image descriptions from captions, and demonstrates that incorporating textual context improves image-to-text model performance for practical applications.

Contribution

The paper contributes a new dataset and an accompanying analysis that differentiate descriptions from captions, and it shows that context-aware models improve image-to-text generation.

Findings

01 Context augmentation improves model accuracy

02 Descriptions and captions serve different communicative roles

03 The dataset enables better practical image-to-text applications

Abstract

Current deep learning models often achieve excellent results on benchmark image-to-text datasets but fail to generate texts that are useful in practice. We argue that to close this gap, it is vital to distinguish descriptions from captions based on their distinct communicative roles. Descriptions focus on visual features and are meant to replace an image (often to increase accessibility), whereas captions appear alongside an image to supply additional information. To motivate this distinction and help people put it into practice, we introduce the publicly available Wikipedia-based dataset Concadia consisting of 96,918 images with corresponding English-language descriptions, captions, and surrounding context. Using insights from Concadia, models trained on it, and a preregistered human-subjects experiment with human- and model-generated texts, we characterize the commonalities and…
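The abstract says each Concadia image comes with a description, a caption, and surrounding context. A minimal sketch of how such a record might be represented, and how context could be switched on or off as a conditioning input, follows. The class name, field names, and helper function are hypothetical illustrations, not the dataset's actual schema or the authors' code.

```python
from dataclasses import dataclass


@dataclass
class ImageTextRecord:
    """Hypothetical record layout; field names are illustrative, not Concadia's schema."""
    image_path: str
    description: str  # meant to replace the image (for example, alt text)
    caption: str      # appears alongside the image and supplies extra information
    context: str      # text surrounding the image in the article


def build_generation_input(record: ImageTextRecord, target: str,
                           use_context: bool = True) -> dict:
    """Assemble conditioning inputs and the reference text for one target type."""
    if target not in ("description", "caption"):
        raise ValueError("target must be 'description' or 'caption'")
    conditioning = {"image_path": record.image_path}
    if use_context:
        # Context-aware setting: the surrounding text is passed in with the image.
        conditioning["context"] = record.context
    return {"conditioning": conditioning, "reference": getattr(record, target)}
```

Keeping `description` and `caption` as separate fields mirrors the distinction the paper draws: the same image and context can be paired with either target, depending on the communicative purpose.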

Code & Models

Repositories

elisakreiss/concadia
PyTorch · Official

Taxonomy

Topics: Multimodal Machine Learning Applications · Natural Language Processing Techniques · Topic Modeling