A Comprehensive Analysis of Real-World Image Captioning and Scene Identification
Sai Suprabhanu Nallapaneni, Subrahmanyam Konakanchi

TL;DR
This paper evaluates the performance of various image captioning models on a new real-world dataset with diverse scene classes, highlighting challenges in real-world conditions often overlooked in controlled studies.
Contribution
The paper introduces a new dataset of over 800 images with detailed captions and assesses how effective multiple models are in complex, real-world environments.
Findings
Models struggle with poor-quality images in real-world settings
The IC3 captioning approach provides more descriptive summaries
Performance varies significantly across different scene classes
Abstract
Image captioning is a computer vision task that involves generating natural language descriptions for images. It has applications in many domains, including image retrieval systems, medicine, and industry. However, while there has been significant research in image captioning, most studies have focused on high-quality images or controlled environments, without exploring the challenges of real-world image captioning. Real-world image captioning involves complex and dynamic environments with numerous points of attention and images that are often very poor in quality, making it a challenging task even for humans. This paper evaluates the performance of various models built on different encoding mechanisms, language decoders, and training procedures, using a newly created real-world dataset that consists of over 800 images of over 65…
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Video Analysis and Summarization
