LoR-VP: Low-Rank Visual Prompting for Efficient Vision Model Adaptation
Can Jin, Ying Li, Mingyu Zhao, Shiyu Zhao, Zhenting Wang, Xiaoxiao He, Ligong Han, Tong Che, Dimitris N. Metaxas

TL;DR
LoR-VP introduces a low-rank visual prompting method that improves the efficiency and performance of adapting pre-trained vision models. It does this by enabling both shared and patch-specific prompt information across image patches, which significantly reduces prompt parameters and training time.
Contribution
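The paper proposes a novel low-rank visual prompting technique that addresses a limitation of existing pad-based methods, which confine the interaction between the prompt and the original image to a small set of patches. The result improves both efficiency and effectiveness in model adaptation.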
Findings
Up to 6 times faster training
Utilizes 18 times fewer prompt parameters
Achieves 3.1% performance improvement
Abstract
Visual prompting has gained popularity as a method for adapting pre-trained models to specific tasks, particularly in the realm of parameter-efficient tuning. However, existing visual prompting techniques often pad the prompt parameters around the image, limiting the interaction between the visual prompts and the original image to a small set of patches while neglecting the inductive bias present in shared information across different patches. In this study, we conduct a thorough preliminary investigation to identify and address these limitations. We propose a novel visual prompt design, introducing Low-Rank matrix multiplication for Visual Prompting (LoR-VP), which enables shared and patch-specific information across rows and columns of image pixels. Extensive experiments across seven network architectures and four datasets demonstrate significant improvements in both performance and…
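The sketch below is a rough illustration of the design the abstract describes. It builds a prompt from the product of two low-rank matrices and adds it to the full-resolution image. For contrast, it also builds a pad-style prompt that only touches a border. It assumes one pair of factors per color channel, B of shape (C, H, r) and A of shape (C, r, W). The rank r = 4 and the pad width of 48 are illustrative choices. The helper names are hypothetical and are not taken from the authors' code.

```python
import numpy as np

def low_rank_prompt(B: np.ndarray, A: np.ndarray) -> np.ndarray:
    """Per-channel low-rank prompt: (C, H, r) @ (C, r, W) -> (C, H, W)."""
    return np.matmul(B, A)

def pad_prompt(params: np.ndarray, pad: int) -> np.ndarray:
    """Zero out everything except a border of width `pad` (pad-style prompt)."""
    mask = np.ones_like(params)
    mask[:, pad:-pad, pad:-pad] = 0.0
    return params * mask

C, H, W = 3, 224, 224  # image channels and resolution
r, pad = 4, 48         # assumed rank and pad width (illustrative only)
rng = np.random.default_rng(0)
image = rng.random((C, H, W))

# Low-rank prompt: added to the full-resolution image, so every pixel can change
B = rng.normal(size=(C, H, r))
A = rng.normal(size=(C, r, W))
lor_prompt = low_rank_prompt(B, A)
prompted_lor = image + lor_prompt

# Pad-style prompt: the full tensor is masked here for simplicity;
# only the border entries would actually be trainable
prompted_pad = image + pad_prompt(rng.normal(size=(C, H, W)), pad)

lor_params = B.size + A.size                              # C * (H*r + r*W)
pad_params = C * (H * W - (H - 2 * pad) * (W - 2 * pad))  # border pixels only

print(lor_params, pad_params)                 # 5376 101376
print(np.linalg.matrix_rank(lor_prompt[0]))  # rank of one channel's prompt (at most r)
```

Row i of B[c] @ A[c] equals the sum over k of B[c, i, k] * A[c, k, :]. Every row is therefore built from the same r vectors in A[c], while its coefficients in B[c] are specific to that row. The same holds symmetrically for columns. This is one way to read the abstract's "shared and patch-specific information across rows and columns."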
Peer Reviews
Decision: ICLR 2025 Poster
- The paper identifies the limitations of existing visual prompting techniques, which often restrict interaction between visual prompts and the original image to a small set of patches.
- A novel visual prompt design based on low-rank matrix multiplication is proposed. This design allows for shared and patch-specific information across rows and columns of image pixels.
- The results are convincing and demonstrate performance and efficiency improvements. The authors include extensive experiments
- A low-rank matrix multiplication is just one way of sharing information across patches. I am surprised that other approaches have not been tested.
- The conclusions in the paper are very superficial and do not offer a deeper insight into the experimental results and the strengths and weaknesses of the proposed approach.
1. The paper is very well written and easy to follow. The background section is very clear and detailed.
2. The method is simple yet effective. The authors also give a good preliminary analysis that motivates the method, which makes a lot of sense to me.
1. Regarding the pad-based method, the authors state, "The VP parameters are restricted to interacting with the original image in a limited set of patches, leaving a substantial portion of the image unmodified." I find this assertion questionable. If the backbone is a ViT, the padded tokens will interact with the inner tokens through self-attention, potentially affecting the entire image.
2. The authors do not include a comparison or discussion of visual prompt tuning [1], which is a more preval
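The reviewer's first point can be checked with a toy single-head self-attention layer. In the sketch below, only the tokens for border patches are perturbed, yet the outputs for interior tokens also change, because every token attends to every other token. The sketch makes several simplifying assumptions: random weights, a 14 × 14 patch grid, a toy embedding size, and no positional embeddings or layer norm. It illustrates the argument only and is not a model from the paper.

```python
import numpy as np

def self_attention(x, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention over tokens x of shape (N, D)."""
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(k.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over keys
    return weights @ v

grid, D = 14, 32  # 14 x 14 patches (224 / 16), toy embedding size
rng = np.random.default_rng(0)
Wq, Wk, Wv = (rng.normal(size=(D, D)) / np.sqrt(D) for _ in range(3))
tokens = rng.normal(size=(grid * grid, D))

# Boolean mask of border patches (outermost ring of the grid)
rows, cols = np.divmod(np.arange(grid * grid), grid)
border = (rows == 0) | (rows == grid - 1) | (cols == 0) | (cols == grid - 1)

# Perturb only the border tokens, mimicking the effect of a pad-based prompt
prompted = tokens.copy()
prompted[border] += rng.normal(size=(border.sum(), D))

out_clean = self_attention(tokens, Wq, Wk, Wv)
out_prompted = self_attention(prompted, Wq, Wk, Wv)

# Largest change in any interior token's output after one attention layer
interior_change = np.abs(out_prompted[~border] - out_clean[~border]).max()
print(interior_change > 0)
```

This shows that information from border tokens can reach interior tokens. It does not by itself measure how effective that indirect interaction is compared with a prompt that modifies every patch directly.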
Organic integration of LoRA and VP: The originality and novelty of this paper lie in its clever integration of Low-Rank Adaptation (LoRA) with Visual Prompting (VP), two previously established concepts, to create a highly efficient and effective approach for adapting pre-trained vision models. While LoRA has been used to reduce the complexity of model fine-tuning, and Visual Prompting focuses on task-specific adaptation through input modification, the paper's innovation is in combining these met
The preliminary study lacks rigor in controlling variables, as the impact of image scaling is not isolated from the role of patch-specific information. While design 4 (Patch-Same) outperforms others, the study does not definitively clarify whether its success is due to shared prompting across patches or the fact that the image is not scaled, leaving ambiguity about the true cause of the performance improvement. This undermines the ability to attribute the gains solely to patch sharing. In the m
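The reviewer's confound can be framed as a 2 × 2 ablation: prompt type (shared across patches vs. pad-based) crossed with whether the image is downscaled. The sketch below builds the four input variants such an ablation would compare. It rests on several assumptions. "Patch-Same" is read as one patch-sized prompt tiled over every 16 × 16 patch. The scaled setting is taken to shrink the image into a centered 128 × 128 region using nearest-neighbor sampling. The pad width is set to 48. These are illustrative readings, not the paper's exact protocol, and the helper names are hypothetical.

```python
import itertools
import numpy as np

def tile_shared_prompt(patch_prompt, grid):
    """Repeat one (C, P, P) prompt over a grid x grid patch layout -> (C, grid*P, grid*P)."""
    return np.tile(patch_prompt, (1, grid, grid))

def scale_into_canvas(image, inner):
    """Nearest-neighbor downscale of a (C, S, S) image to inner x inner, centered on a zero canvas."""
    _, S, _ = image.shape
    idx = np.arange(inner) * S // inner  # source rows/cols to sample
    small = image[:, idx][:, :, idx]
    canvas = np.zeros_like(image)
    off = (S - inner) // 2
    canvas[:, off:off + inner, off:off + inner] = small
    return canvas

C, S, P, pad = 3, 224, 16, 48  # assumed channels, resolution, patch size, pad width
grid = S // P
rng = np.random.default_rng(0)
image = rng.random((C, S, S))

# Shared prompt: one patch-sized prompt reused for every patch
shared = tile_shared_prompt(rng.normal(size=(C, P, P)), grid)

# Pad prompt: nonzero only in a border of width `pad`
border = np.ones((C, S, S))
border[:, pad:-pad, pad:-pad] = 0.0
padded = rng.normal(size=(C, S, S)) * border

# Cross prompt type with image scaling so each factor can be isolated
variants = {}
for (name, prompt), scaled in itertools.product(
        [("shared", shared), ("pad", padded)], [False, True]):
    base = scale_into_canvas(image, S - 2 * pad) if scaled else image
    variants[(name, scaled)] = base + prompt

print(sorted(variants))
```

Comparing ("shared", False) with ("shared", True), and ("pad", False) with ("pad", True), would separate the effect of scaling from the effect of sharing, which is the separation the reviewer asks for.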
Taxonomy
Topics: Advanced Vision and Imaging · Image Enhancement Techniques · Image Processing Techniques and Applications
Methods: Sparse Evolutionary Training
