LoR-VP: Low-Rank Visual Prompting for Efficient Vision Model Adaptation

Can Jin; Ying Li; Mingyu Zhao; Shiyu Zhao; Zhenting Wang; Xiaoxiao He; Ligong Han; Tong Che; Dimitris N. Metaxas

arXiv:2502.00896 · cs.CV · April 15, 2025


TL;DR

LoR-VP introduces a low-rank visual prompting method that makes adapting pre-trained vision models both more efficient and more accurate: its prompt carries shared as well as patch-specific information, while significantly reducing prompt parameters and training time.

Contribution

The paper proposes a novel low-rank visual prompting technique that addresses a key limitation of existing pad-based methods, whose prompts interact with the image only in a small set of patches, improving both the efficiency and the effectiveness of model adaptation.

Findings

01 · Up to 6 times faster training

02 · 18 times fewer prompt parameters

03 · A 3.1% performance improvement
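The parameter saving comes from how many learnable entries each prompt type has. A back-of-the-envelope count, assuming a 224×224 RGB input, a hypothetical pad width of 30 pixels for a pad-based prompt, and a hypothetical rank of 4 for per-channel low-rank factors (none of these are the paper's reported settings, and the helper names are illustrative), looks like this:

```python
def pad_prompt_params(size: int, pad: int, channels: int = 3) -> int:
    """Learnable entries in a pad-style prompt: a border of width `pad` around a square image."""
    inner = size - 2 * pad                      # side length of the untouched interior
    return channels * (size * size - inner * inner)


def low_rank_prompt_params(height: int, width: int, rank: int, channels: int = 3) -> int:
    """Entries in hypothetical per-channel factors B (height x rank) and A (rank x width)."""
    return channels * rank * (height + width)


pad_count = pad_prompt_params(224, 30)          # hypothetical pad width
lr_count = low_rank_prompt_params(224, 224, 4)  # hypothetical rank
print(pad_count, lr_count, round(pad_count / lr_count, 1))  # prints: 69840 5376 13.0
```

Under these made-up settings the ratio is about 13 times; the reported 18 times reduction reflects the paper's own configurations, which this sketch does not reproduce. The point is the scaling: the pad prompt grows with the border area, while the low-rank prompt grows only linearly in height, width, and rank.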

Abstract

Visual prompting has gained popularity as a method for adapting pre-trained models to specific tasks, particularly in the realm of parameter-efficient tuning. However, existing visual prompting techniques often pad the prompt parameters around the image, limiting the interaction between the visual prompts and the original image to a small set of patches while neglecting the inductive bias present in shared information across different patches. In this study, we conduct a thorough preliminary investigation to identify and address these limitations. We propose a novel visual prompt design, introducing Low-Rank matrix multiplication for Visual Prompting (LoR-VP), which enables shared and patch-specific information across rows and columns of image pixels. Extensive experiments across seven network architectures and four datasets demonstrate significant improvements in both performance and…
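To make "low-rank matrix multiplication for visual prompting" concrete, here is a minimal NumPy sketch. It assumes the prompt is the product of two per-channel factors B (H×r) and A (r×W) and is added to every input image; the shapes, rank, initialization, and the helper names `low_rank_prompt` and `apply_prompt` are illustrative assumptions, not taken from the official jincan333/lor-vp code.

```python
import numpy as np


def low_rank_prompt(B: np.ndarray, A: np.ndarray) -> np.ndarray:
    """Build a full-resolution prompt from per-channel low-rank factors.

    B has shape (C, H, r) and A has shape (C, r, W). np.matmul treats the
    leading axis as a batch of channels and returns a (C, H, W) prompt.
    Entry (c, i, j) is B[c, i, :] @ A[c, :, j]: row factor i is shared by
    every pixel in image row i, column factor j by every pixel in column j,
    and their pairing gives each pixel its own value.
    """
    return B @ A


def apply_prompt(images: np.ndarray, prompt: np.ndarray) -> np.ndarray:
    """Add the same prompt to every image in an (N, C, H, W) batch."""
    return images + prompt[None, ...]


rng = np.random.default_rng(0)
C, H, W, r = 3, 224, 224, 4                      # hypothetical sizes and rank
B = rng.normal(scale=0.01, size=(C, H, r))       # hypothetical initialization
A = rng.normal(scale=0.01, size=(C, r, W))
prompt = low_rank_prompt(B, A)
images = rng.uniform(size=(2, C, H, W))          # stand-in for a normalized batch
prompted = apply_prompt(images, prompt)
print(prompt.shape, np.linalg.matrix_rank(prompt[0]))
```

Unlike a pad prompt, the resulting prompt covers every pixel, yet each channel's prompt has rank at most r, so the learnable entries scale with r·(H + W) rather than with H·W.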

Peer Reviews

Decision: ICLR 2025 Poster

Reviewer 01 · Rating 8 · Confidence 4

Strengths

- The paper identifies the limitations of existing visual prompting techniques, which often restrict interaction between visual prompts and the original image to a small set of patches.
- A novel visual prompt design based on low-rank matrix multiplication is proposed. This design allows for shared and patch-specific information across rows and columns of image pixels.
- The results are convincing and demonstrate performance and efficiency improvements. The authors include extensive experiments

Weaknesses

- A low-rank matrix multiplication is just one way of sharing information across patches. I am surprised that other approaches have not been tested.
- The conclusions in the paper are very superficial and do not offer deeper insight into the experimental results or the strengths and weaknesses of the proposed approach.

Reviewer 02 · Rating 6 · Confidence 3

Strengths

1. The paper is very well written and easy to follow. The background section is very clear and detailed.
2. The method is simple yet effective, and the authors give a good preliminary analysis that motivates the method, which makes a lot of sense to me.

Weaknesses

1. Regarding the pad-based method, the authors state, "The VP parameters are restricted to interacting with the original image in a limited set of patches, leaving a substantial portion of the image unmodified." I find this assertion questionable. If the backbone is a ViT, the padded tokens will interact with the inner tokens through self-attention, potentially affecting the entire image.
2. The authors do not include a comparison or discussion of visual prompt tuning [1], which is a more preval
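The reviewer's self-attention argument can be illustrated with a toy single-head attention layer. This is a minimal sketch under stated assumptions: identity query/key/value projections, a hypothetical 14×14 grid of 64-dimensional patch tokens, and token 0 standing in for a border patch whose pixels a pad prompt would change. It shows only that a change to one token reaches every output token after one layer; it says nothing about how large that effect is in a trained ViT.

```python
import numpy as np


def self_attention(x: np.ndarray) -> np.ndarray:
    """Single-head scaled dot-product self-attention with identity Q/K/V projections."""
    d = x.shape[-1]
    scores = x @ x.T / np.sqrt(d)                     # (tokens, tokens) similarity
    scores -= scores.max(axis=-1, keepdims=True)      # numerical stability for softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)    # each row sums to 1
    return weights @ x


rng = np.random.default_rng(0)
tokens = rng.normal(size=(196, 64))   # hypothetical 14x14 patch grid, 64-dim embeddings
perturbed = tokens.copy()
perturbed[0] += 1.0                   # pretend patch 0 is a border patch touched by the pad

# Which output tokens differ at all after one attention layer?
changed = np.any(self_attention(tokens) != self_attention(perturbed), axis=-1)
print(changed.sum(), "of", changed.size, "output tokens changed")
```

Because softmax weights are strictly positive, every query attends to token 0 with some nonzero weight, so the perturbation propagates to all outputs, which is the mechanism the reviewer points to.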

Reviewer 03 · Rating 5 · Confidence 4

Strengths

Organic integration of LoRA and VP: The originality and novelty of this paper lie in its clever integration of Low-Rank Adaptation (LoRA) with Visual Prompting (VP), two previously established concepts, to create a highly efficient and effective approach for adapting pre-trained vision models. While LoRA has been used to reduce the complexity of model fine-tuning, and Visual Prompting focuses on task-specific adaptation through input modification, the paper's innovation is in combining these met

Weaknesses

The preliminary study lacks rigor in controlling variables, as the impact of image scaling is not isolated from the role of patch-specific information. While design 4 (Patch-Same) outperforms others, the study does not definitively clarify whether its success is due to shared prompting across patches or the fact that the image is not scaled, leaving ambiguity about the true cause of the performance improvement. This undermines the ability to attribute the gains solely to patch sharing. In the m
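For readers unfamiliar with the design the reviewer refers to, a prompt "shared across patches" can be sketched as one patch-sized tensor tiled over the whole image. This is a minimal illustration of one plausible reading, assuming 16×16 ViT patches on a 224×224 image with no downscaling; the helper name `patch_same_prompt` is hypothetical, and the paper's actual Patch-Same construction may differ.

```python
import numpy as np


def patch_same_prompt(shared: np.ndarray, grid_h: int, grid_w: int) -> np.ndarray:
    """Tile one (C, P, P) prompt over a grid_h x grid_w patch grid -> (C, grid_h*P, grid_w*P)."""
    return np.tile(shared, (1, grid_h, grid_w))


rng = np.random.default_rng(0)
C, P, grid = 3, 16, 14                            # hypothetical: 224x224 image, 16x16 patches
shared = rng.normal(scale=0.01, size=(C, P, P))   # the only learnable tensor: 3*16*16 entries
prompt = patch_same_prompt(shared, grid, grid)
image = rng.uniform(size=(C, grid * P, grid * P))
prompted = image + prompt                         # full-resolution image, no scaling step
```

Every patch receives the identical additive offset, so this sketch varies two things at once relative to a pad prompt applied to a downscaled image: information is shared across patches, and the image keeps its full resolution. That coupling is the confound the reviewer describes.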

Code & Models

Repositories

jincan333/lor-vp (PyTorch · Official)

Taxonomy

Topics: Advanced Vision and Imaging · Image Enhancement Techniques · Image Processing Techniques and Applications

Methods: Sparse Evolutionary Training