LDP: Parameter-Efficient Fine-Tuning of Multimodal LLM for Medical Report Generation
Tianyu Zhou, Junyi Tang, Zehui Li, Dahong Qian, Suncheng Xiang

TL;DR
This paper introduces LDP, a multimodal LLM framework for generating colonoscopic polyp diagnosis reports. It fine-tunes Qwen2-VL-7B with LoRA on MMEndo, a curated endoscopic image-text dataset, and aligns the model with DPO, improving report quality while cutting training cost.
Contribution
The paper contributes MMEndo, an expert-annotated colonoscopy image-text dataset, and a fine-tuning recipe for Qwen2-VL-7B that combines LoRA with DPO to produce high-quality polyp diagnosis reports at significantly lower computational cost than full fine-tuning.
Findings
Outperforms existing baselines on automated metrics and clinical evaluations
Achieves a Physician Score of 7.2/10 in expert assessments
Cuts training cost by a factor of 833 relative to full fine-tuning
Abstract
Colonoscopic polyp diagnosis is pivotal for early colorectal cancer detection, yet traditional automated reporting suffers from inconsistencies and hallucinations due to the scarcity of high-quality multimodal medical data. To bridge this gap, we propose LDP, a novel framework leveraging multimodal large language models (MLLMs) for professional polyp diagnosis report generation. Specifically, we curate MMEndo, a multimodal endoscopic dataset comprising expert-annotated colonoscopy image-text pairs. We fine-tune the Qwen2-VL-7B backbone using Parameter-Efficient Fine-Tuning (LoRA) and align it with clinical standards via Direct Preference Optimization (DPO). Extensive experiments show that our LDP outperforms existing baselines on both automated metrics and rigorous clinical expert evaluations (achieving a Physician Score of 7.2/10), significantly reducing training computational costs by…
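The two techniques named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code. The first function shows why LoRA trains far fewer parameters than full fine-tuning of a single linear layer, and the second computes the standard DPO loss for one preference pair. The layer size (4096 x 4096), the rank (8), and beta (0.1) are illustrative assumptions, not values reported in the paper, and the parameter ratio shown here is not the cost reduction the paper reports.

```python
import math


def lora_param_counts(d_in: int, d_out: int, rank: int) -> tuple[int, int]:
    """Return (full, lora) trainable-parameter counts for one linear layer.

    Full fine-tuning updates the whole d_out x d_in weight matrix W.
    LoRA freezes W and trains two low-rank factors, B (d_out x rank)
    and A (rank x d_in), so the effective weight is W + B @ A.
    """
    full = d_in * d_out
    lora = rank * (d_in + d_out)
    return full, lora


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,  # assumed value, not taken from the paper
) -> float:
    """DPO loss for a single (chosen, rejected) report pair.

    Each argument is the summed log-probability of a full response under
    either the policy being trained or the frozen reference model.
    """
    chosen_gain = policy_chosen_logp - ref_chosen_logp
    rejected_gain = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_gain - rejected_gain)
    # -log(sigmoid(margin)) is the same as softplus(-margin).
    return softplus(-margin)


if __name__ == "__main__":
    # Hypothetical layer size and rank, chosen only for illustration.
    full, lora = lora_param_counts(d_in=4096, d_out=4096, rank=8)
    print(f"full: {full}, lora: {lora}, ratio: {full // lora}x")

    # Made-up log-probabilities: the policy favors the chosen report
    # slightly more than the reference model does.
    print(dpo_loss(-9.0, -13.0, -10.0, -12.0))
```

In this sketch the loss falls as the policy raises the probability of the preferred report relative to the reference model, which is the mechanism DPO uses to align outputs with preference data such as expert judgments.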
Taxonomy
Topics: Colorectal Cancer Screening and Detection · COVID-19 diagnosis using AI · Multimodal Machine Learning Applications
