5%>100%: Breaking Performance Shackles of Full Fine-Tuning on Visual Recognition Tasks
Dongshuo Yin, Leiyi Hu, Bin Li, Youqun Zhang, Xue Yang

TL;DR
This paper introduces Mona, a novel adapter-based tuning method that surpasses full fine-tuning in various visual recognition tasks by enhancing visual signal processing and feature regulation.
Contribution
Mona employs multiple vision-friendly filters and scaled normalization layers, providing a more effective alternative to full fine-tuning for diverse visual tasks.
Findings
Mona outperforms full fine-tuning on all tested tasks.
Achieves a 1% performance gain over full fine-tuning on the COCO dataset.
Demonstrates broad applicability across segmentation, detection, and classification.
Abstract
Pre-training & fine-tuning can enhance the transferring efficiency and performance in visual tasks. Recent delta-tuning methods provide more options for visual classification tasks. Despite their success, existing visual delta-tuning art fails to exceed the upper limit of full fine-tuning on challenging tasks like object detection and segmentation. To find a competitive alternative to full fine-tuning, we propose the Multi-cognitive Visual Adapter (Mona) tuning, a novel adapter-based tuning method. First, we introduce multiple vision-friendly filters into the adapter to enhance its ability to process visual signals, while previous methods mainly rely on language-friendly linear filters. Second, we add the scaled normalization layer in the adapter to regulate the distribution of input features for visual filters. To fully demonstrate the practicality and generality of Mona, we conduct…
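The two ingredients named in the abstract can be illustrated with a small, dependency-free sketch. It is not the authors' implementation. The function names, the scale names `s1` and `s2`, the kernel sizes (3, 5, 7), the box-filter placeholder weights, and the skip connection are all assumptions made for illustration. In a real adapter the scales and filter weights would be learned, and the filters would run per channel inside a deep learning framework.

```python
import math

def layer_norm(x, eps=1e-6):
    """Normalize a feature vector to zero mean and unit variance."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]

def scaled_norm(x, s1=1.0, s2=1.0):
    """Scaled normalization (assumed form): s1 * LN(x) + s2 * x.
    s1 and s2 are hypothetical names for scale factors that would be learned."""
    return [s1 * n + s2 * v for n, v in zip(layer_norm(x), x)]

def depthwise_filter(channel, k, weight=None):
    """Apply one k x k filter to a single 2D channel, zero-padded so the
    output has the same height and width as the input."""
    h, w = len(channel), len(channel[0])
    pad = k // 2
    if weight is None:
        # Placeholder weights (box filter); a trained adapter would learn these.
        weight = [[1.0 / (k * k)] * k for _ in range(k)]
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for di in range(k):
                for dj in range(k):
                    yy, xx = i + di - pad, j + dj - pad
                    if 0 <= yy < h and 0 <= xx < w:
                        acc += weight[di][dj] * channel[yy][xx]
            out[i][j] = acc
    return out

def multi_filter(channel, kernel_sizes=(3, 5, 7)):
    """Average the responses of several filter sizes, then add the input
    back as a skip connection (kernel sizes and skip are assumptions)."""
    responses = [depthwise_filter(channel, k) for k in kernel_sizes]
    h, w = len(channel), len(channel[0])
    return [
        [channel[i][j] + sum(r[i][j] for r in responses) / len(responses)
         for j in range(w)]
        for i in range(h)
    ]

if __name__ == "__main__":
    features = [1.0, 2.0, 3.0, 4.0]
    print(scaled_norm(features, s1=0.5, s2=1.0))

    channel = [[1.0] * 8 for _ in range(8)]
    filtered = multi_filter(channel)
    print(len(filtered), len(filtered[0]))
```

The point of the sketch is the contrast with a purely linear adapter: the filters mix information from neighboring spatial positions at several scales, and the scaled normalization controls how much of the normalized versus raw signal reaches them.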
Taxonomy
Topics: Industrial Vision Systems and Defect Detection
Methods: Adapter
