TL;DR
The paper introduces Telescopic Adapters, a parameter-efficient fine-tuning method with depth-aware scaling for vision-language models in medical imaging, achieving high performance with minimal trainable parameters.
Contribution
It proposes a novel PEFT framework that dynamically scales adapter capacity across transformer layers, improving efficiency and effectiveness in medical image segmentation tasks.
Findings
Achieves superior performance with only 613k trainable parameters, 244x fewer than end-to-end fine-tuning.
Deeper layers require more adaptation capacity, validating the telescopic scaling approach.
Outperforms traditional fine-tuning across five diverse medical datasets.
Abstract
Adapting Vision Language Segmentation Models (VLSMs) to medical imaging domains requires significant computational overhead when using conventional fine-tuning approaches. Existing Parameter-Efficient Fine-Tuning (PEFT) methods apply uniform adapter dimensions across all transformer layers, leading to suboptimal parameter allocation and reduced adaptation efficiency. We introduce Telescopic Adapters, a novel PEFT framework that employs depth-aware scaling to progressively increase adapter capacity from shallow to deep transformer layers. Our method integrates lightweight bottleneck modules within CLIPSeg's vision and text encoders, with adapter dimensions dynamically scaled based on layer depth and semantic relevance. Using only 613k trainable parameters, 244x fewer than end-to-end fine-tuning, Telescopic Adapters achieve superior performance across five diverse medical datasets…
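To make the idea of depth-aware scaling concrete, here is a minimal sketch of how bottleneck widths might grow from shallow to deep layers while the total parameter budget stays fixed. The abstract does not give the actual schedule, so the linear ramp, the 12-layer encoder, the hidden size of 768, and the 4-to-48 width range are all illustrative assumptions, and the function names are hypothetical. The sketch is not intended to reproduce the paper's 613k figure.

```python
# Hypothetical sketch: the linear schedule, layer count, hidden size, and
# width range are illustrative assumptions, not values taken from the paper.

def telescopic_dims(num_layers: int, min_dim: int, max_dim: int) -> list[int]:
    """Bottleneck width per layer, growing linearly from shallow to deep."""
    if num_layers == 1:
        return [max_dim]
    step = (max_dim - min_dim) / (num_layers - 1)
    return [round(min_dim + i * step) for i in range(num_layers)]


def adapter_params(hidden: int, bottleneck: int) -> int:
    """Parameters in one bottleneck adapter (down- and up-projection with biases)."""
    down = hidden * bottleneck + bottleneck  # weights + bias of the down-projection
    up = bottleneck * hidden + hidden        # weights + bias of the up-projection
    return down + up


def total_params(hidden: int, dims: list[int]) -> int:
    """Total adapter parameters across all layers."""
    return sum(adapter_params(hidden, d) for d in dims)


if __name__ == "__main__":
    hidden, layers = 768, 12                  # assumed encoder shape
    scaled = telescopic_dims(layers, min_dim=4, max_dim=48)
    uniform = [26] * layers                   # uniform baseline with the same mean width
    print("telescopic widths:", scaled)
    print("telescopic total:", total_params(hidden, scaled))
    print("uniform total:   ", total_params(hidden, uniform))
```

Under these assumed numbers, the telescopic and uniform configurations have the same total (488,760 parameters). Depth-aware scaling therefore redistributes a fixed budget toward deeper layers rather than enlarging it. Separately, the abstract's own figures imply that end-to-end fine-tuning updates roughly 613k × 244 ≈ 150M parameters.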
