5%>100%: Breaking Performance Shackles of Full Fine-Tuning on Visual Recognition Tasks

Dongshuo Yin; Leiyi Hu; Bin Li; Youqun Zhang; Xue Yang

arXiv:2408.08345·cs.CV·August 28, 2024·3 cites

TL;DR

This paper introduces Mona, a novel adapter-based tuning method that surpasses full fine-tuning in various visual recognition tasks by enhancing visual signal processing and feature regulation.

Contribution

Mona employs multiple vision-friendly filters and scaled normalization layers, providing a more effective alternative to full fine-tuning for diverse visual tasks.
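
To make the contribution above concrete, here is a minimal PyTorch sketch of an adapter that combines a scaled normalization layer with several convolutional ("vision-friendly") filters. It is an illustration under stated assumptions, not the authors' implementation: the class name `MonaStyleAdapter`, the bottleneck width, the 3/5/7 depthwise kernel sizes, the averaging of the filter outputs, the residual connections, and the exact form of the scaled normalization are all hypothetical choices made for this example. The official code is in the leiyi-hu/mona repository.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MonaStyleAdapter(nn.Module):
    """Illustrative adapter: scaled norm -> down-proj -> multi-kernel convs -> up-proj.

    All hyperparameters and the layer layout are assumptions for this sketch.
    """

    def __init__(self, dim: int, bottleneck: int = 64, kernel_sizes=(3, 5, 7)):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        # Assumed form of "scaled normalization": learnable per-channel weights
        # on the normalized input and on the raw input.
        self.s_norm = nn.Parameter(torch.ones(dim))
        self.s_raw = nn.Parameter(torch.ones(dim))
        self.down = nn.Linear(dim, bottleneck)
        # Depthwise 2D convolutions with different receptive fields
        # (padding = k // 2 keeps the spatial size unchanged for odd k).
        self.convs = nn.ModuleList(
            nn.Conv2d(bottleneck, bottleneck, k, padding=k // 2, groups=bottleneck)
            for k in kernel_sizes
        )
        self.mix = nn.Conv2d(bottleneck, bottleneck, kernel_size=1)
        self.up = nn.Linear(bottleneck, dim)

    def forward(self, x: torch.Tensor, hw: tuple) -> torch.Tensor:
        # x: (batch, tokens, dim) with tokens == h * w
        h, w = hw
        b = x.shape[0]
        identity = x
        x = self.norm(x) * self.s_norm + x * self.s_raw
        x = self.down(x)                              # (b, tokens, bottleneck)
        x = x.transpose(1, 2).reshape(b, -1, h, w)    # (b, bottleneck, h, w)
        filtered = sum(conv(x) for conv in self.convs) / len(self.convs)
        x = x + filtered                              # residual around the filters
        x = x + self.mix(x)                           # 1x1 conv mixes channels
        x = x.flatten(2).transpose(1, 2)              # back to (b, tokens, bottleneck)
        x = self.up(F.gelu(x))
        return identity + x                           # residual around the adapter


adapter = MonaStyleAdapter(dim=96)
tokens = torch.randn(2, 14 * 14, 96)   # e.g. a 14x14 feature map with 96 channels
out = adapter(tokens, hw=(14, 14))
```

In adapter-based tuning, a module like this would typically be inserted into each block of a frozen pre-trained backbone so that only the adapter parameters are trained; where exactly Mona places its adapters is not stated in this listing.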

Findings

01. Mona outperforms full fine-tuning on all tested tasks.

02. Achieves a 1% performance gain over full fine-tuning on the COCO dataset.

03. Demonstrates broad applicability across segmentation, detection, and classification.

Abstract

Pre-training & fine-tuning can enhance the transferring efficiency and performance in visual tasks. Recent delta-tuning methods provide more options for visual classification tasks. Despite their success, existing visual delta-tuning art fails to exceed the upper limit of full fine-tuning on challenging tasks like object detection and segmentation. To find a competitive alternative to full fine-tuning, we propose the Multi-cognitive Visual Adapter (Mona) tuning, a novel adapter-based tuning method. First, we introduce multiple vision-friendly filters into the adapter to enhance its ability to process visual signals, while previous methods mainly rely on language-friendly linear filters. Second, we add the scaled normalization layer in the adapter to regulate the distribution of input features for visual filters. To fully demonstrate the practicality and generality of Mona, we conduct…

Code & Models

Repositories

leiyi-hu/mona (PyTorch, official)

Taxonomy

Topics: Industrial Vision Systems and Defect Detection

Methods: Adapter