Image Captioners Sometimes Tell More Than Images They See
Honori Udo, Takafumi Koshinaka

TL;DR
This paper investigates how well image captioners preserve the information in the original image by comparing classifiers that see only the generated captions with classifiers that see the images, finding that caption-based classifiers can sometimes achieve higher classification accuracy.
Contribution
The study evaluates image captioning models on a disaster classification task, revealing that text-based classifiers can outperform image classifiers and that combining both improves accuracy.
Findings
Text classifiers sometimes outperform image classifiers in disaster image classification.
Fusing image and text classifiers enhances overall accuracy.
Descriptive text retains significant image information for classification.
Abstract
Image captioning, a.k.a. "image-to-text," which generates descriptive text from given images, has been rapidly developing throughout the era of deep learning. To what extent is the information in the original image preserved in the descriptive text generated by an image captioner? To answer that question, we have performed experiments involving the classification of images from descriptive text alone, without referring to the images at all, and compared results with those from standard image-based classifiers. We have evaluated several image captioning models with respect to a disaster image classification task, CrisisNLP, and show that descriptive text classifiers can sometimes achieve higher accuracy than standard image-based classifiers. Further, we show that fusing an image-based classifier with a descriptive text classifier can provide improvement in accuracy.
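The abstract does not say how the image-based and text-based classifiers are fused. As a minimal sketch of one common option, the example below assumes late fusion: a weighted average of the class probabilities produced by the two classifiers. The function names, the fusion weight, and the toy probabilities are all hypothetical and are not taken from the paper; the numbers are made up to illustrate how two classifiers that err on different samples can complement each other.

```python
from typing import List, Sequence

Probs = Sequence[Sequence[float]]  # one row of class probabilities per sample


def late_fusion(p_image: Probs, p_text: Probs, weight: float = 0.5) -> List[List[float]]:
    """Weighted average of two sets of class probabilities.

    `weight` is the share given to the image classifier (an assumed
    hyperparameter; in practice it would be tuned on held-out data).
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    if len(p_image) != len(p_text):
        raise ValueError("both classifiers must score the same samples")
    fused = []
    for row_image, row_text in zip(p_image, p_text):
        if len(row_image) != len(row_text):
            raise ValueError("both classifiers must use the same classes")
        fused.append([weight * a + (1.0 - weight) * b
                      for a, b in zip(row_image, row_text)])
    return fused


def predict(probs: Probs) -> List[int]:
    """Index of the most probable class for each sample."""
    return [max(range(len(row)), key=row.__getitem__) for row in probs]


def accuracy(probs: Probs, labels: Sequence[int]) -> float:
    """Fraction of samples whose predicted class matches the label."""
    preds = predict(probs)
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


# Toy two-class data (made up for illustration, not results from the paper).
LABELS = [0, 1, 1, 1]
P_IMAGE = [[0.9, 0.1], [0.6, 0.4], [0.2, 0.8], [0.3, 0.7]]   # wrong on sample 2
P_TEXT = [[0.8, 0.2], [0.1, 0.9], [0.55, 0.45], [0.4, 0.6]]  # wrong on sample 3
FUSED = late_fusion(P_IMAGE, P_TEXT, weight=0.5)

if __name__ == "__main__":
    print("image only:", accuracy(P_IMAGE, LABELS))
    print("text only: ", accuracy(P_TEXT, LABELS))
    print("fused:     ", accuracy(FUSED, LABELS))
```

Averaging probabilities is only one way to combine the two modalities; feature-level fusion or a learned combiner are alternatives, and the abstract does not indicate which the authors used.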
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning
