Consolidator: Mergeable Adapter with Grouped Connections for Visual Adaptation
Tianxiang Hao, Hui Chen, Yuchen Guo, Guiguang Ding

TL;DR
The paper introduces Consolidator, a parameter-efficient method for adapting vision transformers to downstream tasks by adding a small, tunable module with grouped connections, achieving high accuracy with minimal parameters.
Contribution
It proposes a novel consolidator module using grouped connections for efficient knowledge transfer in vision transformers, outperforming existing parameter-efficient tuning methods.
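To make the two ideas in the title concrete, here is a minimal sketch of a grouped connection placed in parallel with a frozen dense layer and then merged into it. It is not the authors' implementation: the sizes (dim, groups), the helper names (matvec, grouped_matvec, merge_into_dense), and the choice of a plain block-diagonal update are all assumptions made for illustration, and the actual module design is described in the paper. The sketch only shows why a grouped branch needs fewer parameters than a dense one, and why a linear branch of this kind can be folded into the original weights so that inference uses a single matrix.

```python
import random

def matvec(W, x):
    """Dense matrix-vector product; W is a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def grouped_matvec(blocks, x):
    """Grouped connection: each block only sees its own slice of x."""
    out, start = [], 0
    for B in blocks:
        width = len(B[0])
        out.extend(matvec(B, x[start:start + width]))
        start += width
    return out

def merge_into_dense(W, blocks):
    """Fold the grouped weights into a copy of W as a block-diagonal update."""
    merged = [row[:] for row in W]
    r0 = c0 = 0
    for B in blocks:
        for i, row in enumerate(B):
            for j, v in enumerate(row):
                merged[r0 + i][c0 + j] += v
        r0 += len(B)
        c0 += len(B[0])
    return merged

random.seed(0)
dim, groups = 8, 4          # hypothetical sizes, chosen only for the demo
g = dim // groups           # channels per group

# Frozen pretrained weight (stand-in) and small tunable grouped weights.
W = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(dim)]
blocks = [[[random.gauss(0, 0.1) for _ in range(g)] for _ in range(g)]
          for _ in range(groups)]
x = [random.gauss(0, 1) for _ in range(dim)]

# Training-time view: frozen branch plus grouped branch.
two_branch = [a + b for a, b in zip(matvec(W, x), grouped_matvec(blocks, x))]
# Inference-time view: one merged dense matrix.
merged_out = matvec(merge_into_dense(W, blocks), x)

max_diff = max(abs(a - b) for a, b in zip(two_branch, merged_out))
dense_params = dim * dim            # parameters in a full dense layer
grouped_params = groups * g * g     # parameters in the grouped branch
```

In this toy setting the grouped branch holds dim * dim / groups parameters, so more groups means fewer tunable weights, and the merged matrix reproduces the two-branch output up to floating-point rounding.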
Findings
Achieves up to 7.56% better accuracy than full fine-tuning.
Tunes only 0.35% of the parameters that full fine-tuning updates.
Outperforms state-of-the-art parameter-efficient tuning methods.
Abstract
Recently, transformers have shown strong ability as visual feature extractors, surpassing traditional convolution-based models in various scenarios. However, the success of vision transformers largely owes to their capacity to accommodate numerous parameters. As a result, new challenges for adapting large models to downstream tasks arise. On the one hand, classic fine-tuning tunes all parameters in a huge model for every task and thus easily falls into overfitting, leading to inferior performance. On the other hand, on resource-limited devices, fine-tuning stores a full copy of parameters and thus is usually impracticable for the shortage of storage space. However, few works have focused on how to efficiently and effectively transfer knowledge in a vision transformer. Existing methods did not dive into the properties of visual features, leading to inferior performance. Moreover, some of…
Taxonomy
Topics: Visual Attention and Saliency Detection · Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques
