Learning to Adapt Foundation Model DINOv2 for Capsule Endoscopy Diagnosis
Bowen Zhang, Ying Chen, Long Bai, Yan Zhao, Yuxiang Sun, Yixuan Yuan, Jianhua Zhang, Hongliang Ren

TL;DR
This paper presents a low-rank adaptation (LoRA) method for fine-tuning foundation models such as DINOv2 for capsule endoscopy diagnosis, reaching 97.75% accuracy on Kvasir-Capsule and 98.81% on Kvasirv2 with minimal training adjustments.
Contribution
Introduces a LoRA-based adaptation approach for foundation models, enabling effective and efficient customization for medical imaging tasks like capsule endoscopy diagnosis.
Findings
Achieved 97.75% accuracy on the Kvasir-Capsule dataset.
Achieved 98.81% accuracy on the Kvasirv2 dataset.
Demonstrated the effectiveness of LoRA for domain-specific model adaptation.
Abstract
Foundation models have become prominent in computer vision, achieving notable success across a range of tasks. However, their effectiveness depends largely on pre-training with extensive datasets, so training such models from scratch on small capsule endoscopy datasets is challenging. Pre-training on broad, general vision datasets is therefore crucial for successfully fine-tuning a model for specific tasks. In this work, we introduce a simplified approach that adapts foundation models with a low-rank adaptation (LoRA) technique for easier customization. Our method, built on the DINOv2 foundation model, applies low-rank adaptation learning to tailor foundation models for capsule endoscopy diagnosis effectively. Unlike traditional fine-tuning methods, our strategy includes LoRA layers designed to absorb knowledge specific to the surgical domain. During the training process, we keep…
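To make the low-rank adaptation idea concrete, here is a minimal sketch of a single LoRA-augmented linear layer in plain Python. It follows the standard LoRA formulation, y = Wx + (alpha / r) · B(Ax), in which the pre-trained weight W stays frozen and only the small matrices A and B are trained. This is an illustration, not the authors' implementation: the class name `LoRALinear`, the helper names, the rank, the scaling factor `alpha`, and the 768-wide layer used in the parameter count are all assumptions chosen for the example.

```python
import random


def matvec(M, v):
    """Multiply matrix M (a list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


class LoRALinear:
    """Hypothetical linear layer: frozen weight W plus a low-rank update B @ A."""

    def __init__(self, W, rank, alpha=1.0, seed=0):
        rng = random.Random(seed)
        d_out, d_in = len(W), len(W[0])
        self.W = W  # pre-trained weight, kept frozen in standard LoRA
        # A (rank x d_in): small random init, trainable.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        # B (d_out x rank): zero init, trainable, so the update starts at zero.
        self.B = [[0.0] * rank for _ in range(d_out)]
        self.scale = alpha / rank

    def forward(self, x):
        base = matvec(self.W, x)                     # frozen path: W x
        update = matvec(self.B, matvec(self.A, x))   # low-rank path: B (A x)
        return [b + self.scale * u for b, u in zip(base, update)]

    def trainable_params(self):
        """Number of entries in A and B (W is excluded because it is frozen)."""
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])


def lora_param_counts(d_in, d_out, rank):
    """Return (full fine-tuning params, LoRA params) for one weight matrix."""
    return d_in * d_out, rank * (d_in + d_out)


if __name__ == "__main__":
    layer = LoRALinear([[1.0, 2.0], [3.0, 4.0]], rank=1)
    print(layer.forward([1.0, 1.0]))       # matches the frozen layer at init
    print(lora_param_counts(768, 768, 4))  # illustrative layer width and rank
```

Because B starts at zero, the adapted layer reproduces the pre-trained layer exactly before any training, and the model only drifts from its pre-trained behavior as A and B are updated. For the illustrative 768 × 768 weight with rank 4, the low-rank factors hold 6,144 parameters versus 589,824 for the full matrix.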
Taxonomy
Topics: Gastrointestinal Bleeding Diagnosis and Treatment
Methods: Focus
