UIT-DarkCow team at ImageCLEFmedical Caption 2024: Diagnostic Captioning for Radiology Images Efficiency with Transformer Models
Quan Van Nguyen, Huy Quang Pham, Dan Quang Tran, Thang Kien-Bao Nguyen, Nhat-Hao Nguyen-Dang, Bao-Thien Nguyen-Tat

TL;DR
This paper presents Transformer-based models for automated diagnostic captioning of radiology images, demonstrating high-quality report generation that can improve clinical efficiency and accuracy.
Contribution
Introduction of Transformer encoder-decoder and Query Transformer architectures for radiology captioning, placing third on the ImageCLEFmedical2024 challenge leaderboard.
Findings
VisionDiagnostor-BioBART achieved the highest BERTScore, 0.6267
Team DarkCow ranked third on the leaderboard
Models effectively generate high-quality diagnostic captions
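For readers unfamiliar with the metric reported above, the following is a minimal sketch of the greedy-matching idea behind BERTScore. It is not the challenge's evaluation code: real BERTScore takes contextual token embeddings from a pretrained language model, whereas this sketch assumes the embeddings are already given as plain lists of floats. The function names `cosine` and `greedy_match_f1` are hypothetical and chosen for illustration only.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def greedy_match_f1(candidate_vecs, reference_vecs):
    """BERTScore-style F1 over precomputed token embeddings (an assumption:
    the real metric derives these embeddings from a pretrained model)."""
    # Precision: each candidate token is matched to its most similar reference token.
    precision = sum(
        max(cosine(c, r) for r in reference_vecs) for c in candidate_vecs
    ) / len(candidate_vecs)
    # Recall: each reference token is matched to its most similar candidate token.
    recall = sum(
        max(cosine(r, c) for c in candidate_vecs) for r in reference_vecs
    ) / len(reference_vecs)
    if precision + recall == 0:
        return 0.0
    # Harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


# Toy usage with hypothetical 2-dimensional "token embeddings".
generated_caption = [[1.0, 0.0]]
reference_caption = [[1.0, 0.0], [0.0, 1.0]]
score = greedy_match_f1(generated_caption, reference_caption)
```

In this toy case the generated caption covers only one of the two reference tokens, so precision is perfect while recall is halved, and the F1 value falls between the two.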
Abstract
Purpose: This study focuses on the development of automated text generation from radiology images, termed diagnostic captioning, to assist medical professionals in reducing clinical errors and improving productivity. The aim is to provide tools that enhance report quality and efficiency, which can significantly impact both clinical practice and deep learning research in the biomedical field.
Methods: In our participation in the ImageCLEFmedical2024 Caption evaluation campaign, we explored caption prediction tasks using advanced Transformer-based models. We developed methods incorporating Transformer encoder-decoder and Query Transformer architectures. These models were trained and evaluated to generate diagnostic captions from radiology images.
Results: Experimental evaluations demonstrated the effectiveness of our models, with the VisionDiagnostor-BioBART model achieving the highest…
Taxonomy
Topics: Lung Cancer Diagnosis and Treatment · Multimodal Machine Learning Applications · Radiomics and Machine Learning in Medical Imaging
Methods: Attention Is All You Need · Linear Layer · Byte Pair Encoding · Label Smoothing · Adam · Residual Connection · Position-Wise Feed-Forward Layer · Multi-Head Attention · Dropout · Dense Connections
