Aggregate, Decompose, and Fine-Tune: A Simple Yet Effective Factor-Tuning Method for Vision Transformer
Dongping Chen

TL;DR
EFFT is a simple, effective fine-tuning method for Vision Transformers that outperforms existing approaches on VTAB-1K with minimal parameter updates, setting a new state-of-the-art.
Contribution
Introduces EFFT, a factor-tuning method that targets the inner- and cross-layer redundancy left unaddressed by earlier tensorization-decomposition methods for ViT fine-tuning, achieving superior performance with few trainable parameters.
Findings
EFFT achieves a categorical average of 75.9% top-1 accuracy on VTAB-1K.
EFFT trains only 0.28% of the parameters required for full fine-tuning.
EFFT surpasses all baseline methods on VTAB-1K.
Abstract
Recent advancements have illuminated the efficacy of some tensorization-decomposition Parameter-Efficient Fine-Tuning methods like LoRA and FacT in the context of Vision Transformers (ViT). However, these methods do not adequately address inner- and cross-layer redundancy. To tackle this issue, we introduce EFfective Factor-Tuning (EFFT), a simple yet effective fine-tuning method. On the VTAB-1K benchmark, EFFT surpasses all baselines, attaining state-of-the-art performance with a categorical average of 75.9% in top-1 accuracy while training only 0.28% of the parameters required for full fine-tuning. Considering the simplicity and efficacy of EFFT, it holds the potential to serve as a foundational benchmark. The code and model are available at https://github.com/Dongping-Chen/EFFT-EFfective-Factor-Tuning.
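For readers unfamiliar with the decomposition-based family the abstract refers to, the following is a minimal sketch of a LoRA-style low-rank update on a single weight matrix. It is not the EFFT method and is not taken from the authors' repository; the hidden size 768, the rank 8, and every name in the block are illustrative assumptions. It only shows why a factored update needs far fewer trainable parameters than updating the full matrix.

```python
import numpy as np


def trainable_fraction(d_in, d_out, rank):
    """Fraction of a d_in x d_out matrix's parameters held by a rank-`rank` factor pair."""
    full = d_in * d_out                 # parameters updated by full fine-tuning
    low_rank = rank * (d_in + d_out)    # parameters in the two factors
    return low_rank / full


# Assumed sizes, for illustration only (not taken from the paper).
d, r = 768, 8

rng = np.random.default_rng(0)
w_frozen = rng.standard_normal((d, d))    # pretrained weight, kept fixed
a = rng.standard_normal((d, r)) * 0.01    # trainable factor
b = np.zeros((r, d))                      # trainable factor, zero-initialised

# The adapted weight is the frozen weight plus the low-rank product.
# With b = 0 the update is a no-op, so tuning starts from the pretrained model.
w_adapted = w_frozen + a @ b

fraction = trainable_fraction(d, d, r)
print(f"trainable fraction for one matrix: {fraction:.4f}")
```

Under these assumed sizes the two factors hold 1/48 of the matrix's parameters. The redundancy the abstract describes concerns how such factors are structured within and across layers, which this single-matrix sketch does not model.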
Taxonomy
Topics: Image Enhancement Techniques · Advanced Neural Network Applications · CCD and CMOS Imaging Sensors
