Pixels to Prose: Understanding the art of Image Captioning
Hrishikesh Singh, Aarti Sharma, Millie Pant

TL;DR
This paper reviews the evolution of image captioning techniques, from early architectures to state-of-the-art models, and covers their evaluation metrics and applications, particularly in medicine.
Contribution
It provides a comprehensive overview of image captioning methods, including architectures, evaluation metrics, and real-world applications, especially in the medical domain.
Findings
Dissects various image captioning architectures and their mechanisms.
Highlights the importance of evaluation metrics for system performance.
Explores applications of image captioning in medical scenarios.
Abstract
In the era of evolving artificial intelligence, machines are increasingly emulating human-like capabilities, including visual perception and linguistic expression. Image captioning stands at the intersection of these domains, enabling machines to interpret visual content and generate descriptive text. This paper provides a thorough review of image captioning techniques, catering to individuals entering the field of machine learning who seek a comprehensive understanding of available options, from foundational methods to state-of-the-art approaches. Beginning with an exploration of primitive architectures, the review traces the evolution of image captioning models to the latest cutting-edge solutions. By dissecting the components of these architectures, readers gain insights into the underlying mechanisms and can select suitable approaches tailored to specific problem requirements…
Taxonomy
Topics: Multimodal Machine Learning Applications · Subtitles and Audiovisual Media · Video Analysis and Summarization
