Image Captioners Sometimes Tell More Than Images They See

Honori Udo; Takafumi Koshinaka

arXiv:2305.02932·cs.CV·May 12, 2023·1 cites

Open Access

TL;DR

This paper investigates how well image captioners preserve original image information by comparing text-based and image-based classifiers, finding that descriptive text can sometimes outperform images in classification accuracy.

Contribution

The study evaluates image captioning models on a disaster classification task, revealing that text-based classifiers can outperform image classifiers and that combining both improves accuracy.

Findings

01. Text classifiers sometimes outperform image classifiers in disaster image classification.

02. Fusing image and text classifiers enhances overall accuracy.

03. Descriptive text retains significant image information for classification.

Abstract

Image captioning, a.k.a. "image-to-text," which generates descriptive text from given images, has been rapidly developing throughout the era of deep learning. To what extent is the information in the original image preserved in the descriptive text generated by an image captioner? To answer that question, we have performed experiments involving the classification of images from descriptive text alone, without referring to the images at all, and compared results with those from standard image-based classifiers. We have evaluated several image captioning models with respect to a disaster image classification task, CrisisNLP, and show that descriptive text classifiers can sometimes achieve higher accuracy than standard image-based classifiers. Further, we show that fusing an image-based classifier with a descriptive text classifier can provide improvement in accuracy.
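The abstract does not say how the two classifiers are fused, so the following is only a minimal sketch of one common option, late fusion by a weighted average of class probabilities. The function names, the `image_weight` parameter, and the example probabilities are all hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def fuse_probabilities(
    image_probs: Sequence[float],
    text_probs: Sequence[float],
    image_weight: float = 0.5,
) -> List[float]:
    """Weighted average of two per-class probability vectors.

    Assumption: both classifiers output probabilities over the same
    classes in the same order. This is an illustrative fusion rule,
    not necessarily the one used in the paper.
    """
    if len(image_probs) != len(text_probs):
        raise ValueError("Both classifiers must score the same classes.")
    if not 0.0 <= image_weight <= 1.0:
        raise ValueError("image_weight must lie in [0, 1].")

    text_weight = 1.0 - image_weight
    fused = [
        image_weight * p_img + text_weight * p_txt
        for p_img, p_txt in zip(image_probs, text_probs)
    ]
    # Renormalize so the result still sums to 1 if the inputs were
    # only approximately normalized.
    total = sum(fused)
    return [value / total for value in fused]


def predict(probs: Sequence[float]) -> int:
    """Return the index of the highest-scoring class."""
    return max(range(len(probs)), key=lambda i: probs[i])


# Hypothetical scores for one image over three classes.
image_probs = [0.5, 0.4, 0.1]  # image-based classifier
text_probs = [0.2, 0.7, 0.1]   # classifier run on the generated caption

fused = fuse_probabilities(image_probs, text_probs)
print(predict(image_probs), predict(text_probs), predict(fused))
```

In this made-up case the image-based classifier alone picks class 0, while the fused scores pick class 1, which shows how a confident caption-based prediction can override a marginal image-based one. In practice `image_weight` would be tuned on held-out data.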

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning