Pixels to Prose: Understanding the art of Image Captioning

Hrishikesh Singh; Aarti Sharma; Millie Pant

arXiv:2408.15714·cs.CV·August 29, 2024

Open Access

TL;DR

This paper reviews the evolution of image captioning techniques, from basic models to advanced solutions, emphasizing their applications, evaluation metrics, and significance in fields like medicine.

Contribution

It provides a comprehensive overview of image captioning methods, including architectures, evaluation metrics, and real-world applications, especially in the medical domain.

Findings

01. Dissects various image captioning architectures and their mechanisms.
02. Highlights the importance of evaluation metrics for system performance.
03. Explores applications of image captioning in medical scenarios.

Abstract

In the era of evolving artificial intelligence, machines are increasingly emulating human-like capabilities, including visual perception and linguistic expression. Image captioning stands at the intersection of these domains, enabling machines to interpret visual content and generate descriptive text. This paper provides a thorough review of image captioning techniques, catering to individuals entering the field of machine learning who seek a comprehensive understanding of available options, from foundational methods to state-of-the-art approaches. Beginning with an exploration of primitive architectures, the review traces the evolution of image captioning models to the latest cutting-edge solutions. By dissecting the components of these architectures, readers gain insights into the underlying mechanisms and can select suitable approaches tailored to specific problem requirements…

Taxonomy

Topics: Multimodal Machine Learning Applications · Subtitles and Audiovisual Media · Video Analysis and Summarization