Diagnostic Captioning: A Survey
John Pavlopoulos, Vasiliki Kougia, Ion Androutsopoulos, Dimitris Papamichail

TL;DR
Diagnostic Captioning automates the generation of diagnostic texts from medical images, aiding physicians and reducing errors, with recent advances driven by deep learning and image captioning techniques.
Contribution
This survey provides a comprehensive overview of Diagnostic Captioning, including datasets, evaluation metrics, recent systems, shortcomings, and future research directions.
Findings
Several datasets available for DC evaluation
Deep learning has significantly advanced DC systems
Identified key challenges and future research directions
Abstract
Diagnostic Captioning (DC) concerns the automatic generation of a diagnostic text from a set of medical images of a patient collected during an examination. DC can assist inexperienced physicians, reducing clinical errors. It can also help experienced physicians produce diagnostic reports faster. Following the advances of deep learning, especially in generic image captioning, DC has recently attracted more attention, leading to several systems and datasets. This article is an extensive overview of DC. It presents relevant datasets, evaluation measures, and up-to-date systems. It also highlights shortcomings that hinder DC's progress and proposes future directions.
Taxonomy
Topics: Multimodal Machine Learning Applications
