TL;DR
Inspired by Curriculum Learning, this paper introduces a progressive, two-step transformer-based framework for radiology report generation: the model first extracts global concepts from the image, then refines them into a detailed report. It improves on the state of the art on two benchmark datasets.
Contribution
The paper presents an image-to-text-to-text generation approach that divides report creation into global concept extraction followed by detailed report refinement, and improves on the previous state of the art on two benchmark datasets.
Findings
Achieved state-of-the-art results on two benchmark datasets.
Demonstrated the effectiveness of a two-step generation process.
Improved coherence and accuracy in radiology report generation.
Abstract
Inspired by Curriculum Learning, we propose a consecutive (i.e., image-to-text-to-text) generation framework where we divide the problem of radiology report generation into two steps. Contrary to generating the full radiology report from the image at once, the model generates global concepts from the image in the first step and then reforms them into finer and coherent texts using a transformer architecture. We follow the transformer-based sequence-to-sequence paradigm at each step. We improve upon the state-of-the-art on two benchmark datasets.
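The two-step (image-to-text-to-text) flow described in the abstract can be sketched as a simple composition of two generators. This is only an illustrative sketch, not the authors' implementation: the class name `TwoStepGenerator`, the `Image` alias, and both toy step functions are hypothetical stand-ins, and in the paper each step would be a trained transformer sequence-to-sequence model rather than a hand-written rule.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical placeholder type: a grayscale image as rows of pixel intensities.
Image = List[List[float]]


@dataclass
class TwoStepGenerator:
    """Chains an image-to-text step with a text-to-text step (assumed structure)."""

    image_to_concepts: Callable[[Image], str]  # step 1: image -> global concepts
    concepts_to_report: Callable[[str], str]   # step 2: concepts -> detailed report

    def generate(self, image: Image) -> Tuple[str, str]:
        # Step 1: produce a short, coarse description of the image.
        concepts = self.image_to_concepts(image)
        # Step 2: rewrite the coarse description into the final report text.
        report = self.concepts_to_report(concepts)
        return concepts, report


# Toy stand-ins so the sketch runs end to end. They are NOT real models.
def toy_image_to_concepts(image: Image) -> str:
    pixel_count = sum(len(row) for row in image)
    mean = sum(sum(row) for row in image) / pixel_count
    return "opacity" if mean > 0.5 else "no acute findings"


def toy_concepts_to_report(concepts: str) -> str:
    return f"Findings: {concepts}. Impression: {concepts}."


pipeline = TwoStepGenerator(toy_image_to_concepts, toy_concepts_to_report)
concepts, report = pipeline.generate([[0.1, 0.2], [0.3, 0.2]])
print(concepts)
print(report)
```

Keeping the intermediate concepts as plain text means the second step only has to solve a text-to-text rewriting problem, which is the easier-then-harder decomposition the abstract attributes to Curriculum Learning.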
Taxonomy
Topics: Topic Modeling · Multimodal Machine Learning Applications · Natural Language Processing Techniques
