Visual Fourier Prompt Tuning
Runjia Zeng, Cheng Han, Qifan Wang, Chunshu Wu, Tong Geng, Lifu Huang, Ying Nian Wu, Dongfang Liu

TL;DR
Visual Fourier Prompt Tuning (VFPT) introduces a Fourier-based prompt method, inspired by human vision, that improves parameter-efficient fine-tuning of large vision models, especially when the pretraining and finetuning datasets differ substantially.
Contribution
VFPT integrates the Fast Fourier Transform (FFT) into prompt embeddings, addressing the performance degradation that PEFT methods for vision transformers suffer when the pretraining and finetuning datasets are highly dissimilar.
Findings
Outperforms state-of-the-art baselines on benchmarks.
Tunes only 0.57% of the model's parameters.
Achieves 73.20% mean accuracy on VTAB-1k.
Abstract
With the scale of vision Transformer-based models continuing to grow, finetuning these large-scale pretrained models for new tasks has become increasingly parameter-intensive. Visual prompt tuning was introduced as a parameter-efficient finetuning (PEFT) method in response to this trend. Despite its successes, a notable research challenge persists in almost all PEFT approaches: performance degrades significantly when there is a substantial disparity between the datasets used in the pretraining and finetuning phases. To address this challenge, we draw inspiration from human visual cognition and propose the Visual Fourier Prompt Tuning (VFPT) method as a general and effective solution for adapting large-scale transformer-based models. Our approach incorporates the Fast Fourier Transform into prompt embeddings and harmoniously considers both spatial and frequency domain…
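The abstract only states that VFPT brings the FFT into prompt embeddings while still considering the spatial domain. The NumPy sketch below is a minimal illustration under stated assumptions, not the authors' released code. It assumes VPT-style learnable prompts inserted between the [CLS] token and the patch tokens, a 2D FFT taken over the prompt-length and hidden dimensions with only the real part kept, and a hypothetical `fourier_ratio` that splits the prompts into a Fourier group and an untouched spatial group. The function names are likewise illustrative.

```python
import numpy as np


def fourier_transform_prompts(prompts: np.ndarray) -> np.ndarray:
    """Assumed Fourier step: 2D FFT over the (prompt-length, hidden-dim) axes,
    keeping only the real part so the output stays real-valued and same-shaped."""
    return np.fft.fft2(prompts, axes=(-2, -1)).real


def build_prompted_sequence(
    cls_token: np.ndarray,       # shape (1, d)
    prompts: np.ndarray,         # shape (p, d), learnable prompt embeddings
    patch_tokens: np.ndarray,    # shape (n, d)
    fourier_ratio: float = 0.5,  # hypothetical: fraction of prompts sent through the FFT
) -> np.ndarray:
    if not 0.0 <= fourier_ratio <= 1.0:
        raise ValueError("fourier_ratio must lie in [0, 1]")
    num_fourier = int(round(fourier_ratio * prompts.shape[0]))
    # An FFT over a zero-length axis is invalid, so skip the transform if no prompts are assigned to it.
    if num_fourier > 0:
        fourier_part = fourier_transform_prompts(prompts[:num_fourier])
    else:
        fourier_part = prompts[:0]
    spatial_part = prompts[num_fourier:]  # spatial-domain prompts are used as-is
    # Sequence layout (assumed, VPT-style): [CLS] + Fourier prompts + spatial prompts + patches
    return np.concatenate([cls_token, fourier_part, spatial_part, patch_tokens], axis=0)


rng = np.random.default_rng(0)
d, p, n = 8, 10, 16
cls = rng.normal(size=(1, d))
prompts = rng.normal(size=(p, d))
patches = rng.normal(size=(n, d))

seq = build_prompted_sequence(cls, prompts, patches, fourier_ratio=0.5)
print(seq.shape)  # (27, 8): 1 CLS + 5 Fourier prompts + 5 spatial prompts + 16 patches
```

Keeping only the real part preserves the prompt shape, so the transformed prompts drop into an unmodified transformer layer. Because the real part of the FFT is a linear map, gradients in an autodiff framework would still flow back to the raw prompt parameters.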
Taxonomy
Topics: Neural Networks and Applications · Computer Graphics and Visualization Techniques · Advanced Optical Imaging Technologies
