MedFILIP: Medical Fine-grained Language-Image Pre-training
Xinjie Liang, Xiangyu Li, Fanding Li, Jie Jiang, Qing Dong, Wei Wang, Kuanquan Wang, Suyu Dong, Gongning Luo, Shuo Li

TL;DR
MedFILIP is a novel medical vision-language pretraining model that leverages fine-grained disease details, knowledge injection, and semantic similarity to improve diagnostic accuracy across multiple datasets.
Contribution
It introduces a new fine-grained VLP approach with disease-specific knowledge extraction, knowledge injection, and enhanced image-text alignment for medical imaging.
Findings
Achieves state-of-the-art performance on multiple datasets.
Improves classification accuracy by up to 6.69%.
Effectively models disease details and relationships in medical images.
Abstract
Medical vision-language pretraining (VLP) that leverages naturally paired medical image-report data is crucial for medical image analysis. However, existing methods struggle to accurately characterize associations between images and diseases, leading to inaccurate or incomplete diagnostic results. In this work, we propose MedFILIP, a fine-grained VLP model that introduces medical image-specific knowledge through contrastive learning. Specifically: 1) An information extractor based on a large language model is proposed to decouple comprehensive disease details from reports. It excels at extracting disease details through flexible prompt engineering, thereby effectively reducing text complexity while retaining rich information at a tiny cost. 2) A knowledge injector is proposed to construct relationships between categories and visual attributes, which helps the model make judgments based…
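The contrastive image-text alignment mentioned in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the summary above does not specify the loss, so the function name `soft_contrastive_loss`, the temperature value, and the use of a row-normalized similarity matrix as soft labels are all assumptions, chosen to show one common way graded semantic similarity can replace one-hot targets in a symmetric contrastive objective.

```python
import numpy as np

def log_softmax(x, axis):
    """Numerically stable log-softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=axis, keepdims=True))

def soft_contrastive_loss(img_emb, txt_emb, targets, temperature=0.07):
    """Symmetric image-text contrastive loss with soft targets.

    img_emb, txt_emb: arrays of shape (n, d), one row per image / text.
    targets: non-negative (n, n) matrix; targets[i, j] is the assumed
             semantic similarity between image i and text j. An identity
             matrix recovers the usual one-hot contrastive loss.
    temperature: hypothetical default, not taken from the paper.
    """
    # L2-normalize so the dot product is cosine similarity.
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature

    # Turn similarities into probability distributions for each direction.
    t_i2t = targets / targets.sum(axis=1, keepdims=True)
    t_t2i = targets.T / targets.T.sum(axis=1, keepdims=True)

    # Cross-entropy between soft targets and predicted distributions.
    loss_i2t = -(t_i2t * log_softmax(logits, axis=1)).sum(axis=1).mean()
    loss_t2i = -(t_t2i * log_softmax(logits.T, axis=1)).sum(axis=1).mean()
    return 0.5 * (loss_i2t + loss_t2i)

if __name__ == "__main__":
    emb = np.eye(4)  # toy, perfectly separable embeddings
    print(soft_contrastive_loss(emb, emb, np.eye(4)))        # matched pairs
    print(soft_contrastive_loss(emb, emb[::-1], np.eye(4)))  # mismatched pairs
```

With one-hot targets, matched pairs give a loss near zero and mismatched pairs a large one. Off-diagonal entries in `targets` let texts describing related findings count as partial positives instead of pure negatives.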
Taxonomy
Topics: Radiomics and Machine Learning in Medical Imaging
