DS@BioMed at ImageCLEFmedical Caption 2024: Enhanced Attention Mechanisms in Medical Caption Generation through Concept Detection Integration
Nhi Ngoc-Yen Nguyen, Le-Huy Tu, Dieu-Phuong Nguyen, Nhat-Tan Do, Minh Triet Thai, Bao-Thien Nguyen-Tat

TL;DR
This paper introduces an advanced medical image captioning method that integrates concept detection into attention mechanisms, significantly improving caption quality and medical image interpretation accuracy.
Contribution
It presents a novel approach combining concept detection with attention mechanisms in medical caption generation, achieving competitive results in the ImageCLEFmedical Caption 2024 challenge.
Findings
Concept detection with Swin-V2 achieved F1 scores of 0.58944 and 0.61998.
Enhanced captioning model attained BERTScores of 0.60589 and 0.5794.
Method demonstrated improved medical caption quality and potential healthcare impact.
Abstract
Purpose: Our study presents an enhanced approach to medical image caption generation by integrating concept detection into attention mechanisms. Method: This method utilizes sophisticated models to identify critical concepts within medical images, which are then refined and incorporated into the caption generation process. Results: Our concept detection task, which employed the Swin-V2 model, achieved an F1 score of 0.58944 on the validation set and 0.61998 on the private test set, securing the third position. For the caption prediction task, our BEiT+BioBart model, enhanced with concept integration and post-processing techniques, attained a BERTScore of 0.60589 on the validation set and 0.5794 on the private test set, placing ninth. Conclusion: These results underscore the efficacy of concept-aware algorithms in generating precise and contextually appropriate medical descriptions. The…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsMultimodal Machine Learning Applications · Natural Language Processing Techniques
MethodsSparse Evolutionary Training
