Efficient Adaptation of Pre-trained Vision Transformer underpinned by Approximately Orthogonal Fine-Tuning Strategy

Yiting Yang; Hao Luo; Yuan Sun; Qingsen Yan; Haokui Zhang; Wei Dong; Guoqing Wang; Peng Wang; Yang Yang; Hengtao Shen

arXiv:2507.13260 · cs.CV · July 18, 2025

TL;DR

This paper introduces an Approximately Orthogonal Fine-Tuning (AOFT) strategy for Vision Transformers that enhances generalization by aligning the properties of low-rank adaptation matrices with the pre-trained backbone.

Contribution

The paper proposes a novel AOFT method that uses a single learnable vector to generate approximately orthogonal low-rank matrices, improving fine-tuning performance.

Findings

1. Achieves competitive results on image classification tasks.
2. Enhances model generalization by enforcing approximate orthogonality.
3. Demonstrates the effectiveness of AOFT over existing PEFT methods.

Abstract

A prevalent approach in Parameter-Efficient Fine-Tuning (PEFT) of pre-trained Vision Transformers (ViT) involves freezing the majority of the backbone parameters and solely learning low-rank adaptation weight matrices to accommodate downstream tasks. These low-rank matrices are commonly derived through the multiplication structure of down-projection and up-projection matrices, exemplified by methods such as LoRA and Adapter. In this work, we observe an approximate orthogonality among any two row or column vectors within any weight matrix of the backbone parameters; however, this property is absent in the vectors of the down/up-projection matrices. Approximate orthogonality implies a reduction in the upper bound of the model's generalization error, signifying that the model possesses enhanced generalization capability. If the fine-tuned down/up-projection matrices were to exhibit this…
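
To make the two ideas in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It builds a low-rank update as the product of an up-projection and a down-projection matrix, and it measures approximate orthogonality as the mean absolute cosine similarity between distinct row vectors. The sizes `d` and `r`, the helper names, and the choice of metric are all assumptions made for illustration. The matrices are random or hand-constructed, so they show only how the measurement behaves and say nothing about trained weights.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mean_abs_offdiag_cosine(rows):
    """Average |cosine| over all distinct pairs of row vectors.

    Values near 0 indicate approximately orthogonal rows;
    values near 1 indicate nearly parallel rows.
    """
    total, count = 0.0, 0
    for i in range(len(rows)):
        for j in range(i + 1, len(rows)):
            total += abs(cosine(rows[i], rows[j]))
            count += 1
    return total / count


def matmul(a, b):
    """Plain-Python product of a (m x k) and b (k x n)."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


rng = random.Random(0)
d, r = 64, 4  # hypothetical hidden size and rank (assumed values)

# Down-projection (r x d) and up-projection (d x r), the multiplication
# structure the abstract attributes to methods such as LoRA and Adapter.
down = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(r)]
up = [[rng.gauss(0, 1) for _ in range(r)] for _ in range(d)]
delta_w = matmul(up, down)  # d x d update whose rank is at most r

# Reference case 1: exactly orthogonal rows (first r rows of the identity).
identity_rows = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(r)]

# Reference case 2: strongly correlated rows (shared direction plus small noise).
base = [rng.gauss(0, 1) for _ in range(d)]
correlated = [[b + 0.1 * rng.gauss(0, 1) for b in base] for _ in range(r)]

score_identity = mean_abs_offdiag_cosine(identity_rows)
score_correlated = mean_abs_offdiag_cosine(correlated)
```

Applied to the rows or columns of a real weight matrix, a score close to that of the orthogonal reference would correspond to the approximate orthogonality the abstract describes for the backbone, while a score close to the correlated reference would indicate its absence.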

Taxonomy

Topics: CCD and CMOS Imaging Sensors · Image Processing Techniques and Applications · Infrared Target Detection Methodologies