Dual-Kernel Adapter: Expanding Spatial Horizons for Data-Constrained Medical Image Analysis
Ziquan Zhu, Hanruo Zhu, Siyuan Lu, Xiang Li, Yanda Meng, Gaojie Jin, Lu Yin, Lijie Hu, Di Wang, Lu Liu, Tianjin Huang

TL;DR
This paper investigates the limitations of traditional adapters in low-data medical imaging and introduces the Dual-Kernel Adapter, which expands spatial context to improve performance in data-scarce scenarios.
Contribution
The paper presents the Dual-Kernel Adapter, a novel module that enhances adapter performance in low-data medical imaging by expanding spatial receptive fields.
Findings
Conventional adapters degrade under extreme data scarcity, performing worse than simple linear probing when trained on less than 1% of the training data.
Dual-Kernel Adapter significantly outperforms existing adapters in low-data regimes.
Expanding spatial context improves model performance in medical image analysis.
Abstract
Adapters have become a widely adopted strategy for efficient fine-tuning of large pretrained models, particularly in resource-constrained settings. However, their performance under extreme data scarcity, common in medical imaging due to high annotation costs, privacy regulations, and fragmented datasets, remains underexplored. In this work, we present the first comprehensive study of adapter-based fine-tuning for large pretrained models in low-data medical imaging scenarios. We find that, contrary to their promise, conventional adapters can degrade performance under severe data constraints, performing even worse than simple linear probing when trained on less than 1% of the corresponding training data. Through systematic analysis, we identify a sharp reduction in Effective Receptive Field (ERF) as a key factor behind this degradation. Motivated by these findings, we propose the…
Peer Reviews
Decision·ICLR 2026 Poster
- Important Problem: Addresses the critical challenge of limited labeled data in medical imaging
- Comprehensive Evaluation: Thorough experiments across 6 datasets, multiple backbones, and various data scales
- Surprising Finding: The observation that adapters can hurt performance under extreme data scarcity is counter-intuitive and valuable
- Strong Empirical Results: DKA shows consistent improvements, particularly in low-data settings
- Extensive Ablations: Thorough analysis of design choices
* Limited Technical Novelty: The solution essentially adds large-kernel convolutions to adapters - this is a straightforward extension rather than a fundamental innovation
* ERF Analysis Concerns:
  * The causal relationship between ERF and performance is assumed but not proven
  * ERF computation methodology needs clarification
  * Alternative explanations (e.g., optimization difficulties, overfitting) are not thoroughly explored
* Kernel Size Selection:
  * The choice of 51×51 kernels seem
The empirical findings are compelling and well-supported: quantitative metrics (ACC, mIoU, Dice) improve across all data scales, especially ≤ 1.25%. The ERF analysis provides strong evidence linking reduced receptive field to Adapter degradation and DKA’s advantage. The controlled parameter-count experiment convincingly isolates kernel size as the key factor.
The theoretical reasoning behind ERF–generalization linkage could be formalized further. Computational overhead of large-kernel depthwise convolutions is not fully quantified. Limited theoretical depth: lacks analytical characterization of why ERF → generalization scaling behaves linearly with data size. Compute trade-off: large-kernel (51×51) convolutions increase FLOPs; energy/memory costs are not discussed.
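The compute concern raised above can be made concrete with a rough count. The sketch below is a back-of-the-envelope estimate, not a measurement from the paper: it counts parameters and nominal multiply-accumulates (MACs) for a single depthwise convolution with stride 1, zero "same" padding, and no bias. The channel width (64) and the 14×14 token grid are hypothetical values chosen for illustration, and the function name `depthwise_conv_cost` is likewise an assumption.

```python
def depthwise_conv_cost(channels, kernel_size, height, width):
    """Return (parameters, nominal MACs) for one depthwise 2D convolution.

    Assumes stride 1, zero 'same' padding, and no bias. The MAC count is
    nominal: it also counts kernel taps that fall on padded positions, as a
    dense implementation would.
    """
    params = channels * kernel_size * kernel_size
    macs = params * height * width  # one full kernel per channel per output position
    return params, macs


# Hypothetical settings for illustration only (not taken from the paper).
CHANNELS = 64
GRID = 14

small_params, small_macs = depthwise_conv_cost(CHANNELS, 3, GRID, GRID)
large_params, large_macs = depthwise_conv_cost(CHANNELS, 51, GRID, GRID)

print("3x3  :", small_params, "params,", small_macs, "MACs")
print("51x51:", large_params, "params,", large_macs, "MACs")
print("ratio:", large_params // small_params)
```

Under these assumptions the 51×51 depthwise kernel carries 289 times the parameters and nominal MACs of a 3×3 one (2601 / 9), which is the kind of figure the reviewers ask the authors to report alongside memory and energy costs.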
1. The work is well motivated, with the analysis demonstrating the reduction in performance and tying it to a reduction in the effective receptive field in low-data settings. To address this problem, a simple dual-kernel adapter based on large-kernel convolution is proposed.
2. The results use multiple pretrained models and a large number of baseline methods, and clearly demonstrate that the proposed method works well.
3. The proposed method itself is extremely simple but the novelty lies in it uniq
1. The method design appears to reorder the tokens back into the spatial domain. However, the authors make no mention of this in the methods section.
2. The work bears similarities to another large-kernel adapter method [1] and seems to differ methodologically owing solely to a dual-path convolution and the analysis in the first part of the paper. In fact, there also is an extremely similar analysis in [1] titled "Large Kernel Matters Instead of #Trainabl
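To illustrate the two ideas the reviewers refer to, token reordering into the spatial domain and a dual-path convolution, here is a minimal pure-Python sketch. It is not the authors' architecture: the function names, the row-major token order, the all-ones kernels, and fusing the two paths by summation are all assumptions made for illustration, and any down/up projections an adapter would normally include are omitted.

```python
def tokens_to_grid(tokens, height, width):
    """Reorder a flat list of height*width token vectors into a row-major grid."""
    assert len(tokens) == height * width
    return [tokens[r * width:(r + 1) * width] for r in range(height)]


def grid_to_tokens(grid):
    """Flatten a grid of token vectors back into a token sequence."""
    return [tok for row in grid for tok in row]


def depthwise_conv(grid, kernel):
    """Depthwise conv with stride 1 and zero 'same' padding.

    kernel[c][i][j] holds the (odd-sized, square) filter for channel c.
    """
    h, w, c = len(grid), len(grid[0]), len(grid[0][0])
    k = len(kernel[0])
    pad = k // 2
    out = [[[0.0] * c for _ in range(w)] for _ in range(h)]
    for y in range(h):
        for x in range(w):
            for ch in range(c):
                acc = 0.0
                for i in range(k):
                    yy = y + i - pad
                    if not 0 <= yy < h:
                        continue  # tap falls on zero padding
                    for j in range(k):
                        xx = x + j - pad
                        if 0 <= xx < w:
                            acc += kernel[ch][i][j] * grid[yy][xx][ch]
                out[y][x][ch] = acc
    return out


def dual_kernel_branch(tokens, height, width, small_kernel, large_kernel):
    """Hypothetical dual-path block: small- and large-kernel paths, summed."""
    grid = tokens_to_grid(tokens, height, width)
    small = depthwise_conv(grid, small_kernel)
    large = depthwise_conv(grid, large_kernel)
    fused = [[[s + l for s, l in zip(small[y][x], large[y][x])]
              for x in range(width)] for y in range(height)]
    return grid_to_tokens(fused)


# Toy input: a 4x4 grid of 1-channel tokens with a single impulse at the top-left.
tokens = [[0.0] for _ in range(16)]
tokens[0] = [1.0]
small_kernel = [[[1.0] * 3 for _ in range(3)]]  # 3x3, one channel
large_kernel = [[[1.0] * 7 for _ in range(7)]]  # 7x7, one channel

out = dual_kernel_branch(tokens, 4, 4, small_kernel, large_kernel)
print(out[0], out[5], out[15])
```

In this toy setup the impulse reaches the far corner of the grid (token 15) only through the large-kernel path, while nearby positions receive contributions from both paths. That is the intuition behind expanding spatial context, though it says nothing about the trained effective receptive field the paper analyses.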
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications · Generative Adversarial Networks and Image Synthesis
