Visual Fourier Prompt Tuning

Runjia Zeng; Cheng Han; Qifan Wang; Chunshu Wu; Tong Geng; Lifu Huang; Ying Nian Wu; Dongfang Liu

arXiv:2411.01327 · cs.CV · November 19, 2024 · 3 cites

TL;DR

Visual Fourier Prompt Tuning (VFPT) introduces a Fourier-based prompt method, inspired by human vision, to improve parameter-efficient fine-tuning of large vision models, especially when the finetuning data differs substantially from the pretraining data.

Contribution

VFPT integrates the Fast Fourier Transform into prompt embeddings to address the performance drop that PEFT methods for vision transformers show when the pretraining and finetuning datasets differ substantially.
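To make the idea concrete, here is a minimal NumPy sketch of Fourier-transformed prompts. It is an illustration under stated assumptions, not the authors' implementation (see the official repository for that). The function names `fourier_prompts` and `prepend_prompts`, the `fourier_fraction` parameter, the choice of a 2D FFT over the sequence and hidden dimensions, and keeping only the real part are all assumptions made for this example.

```python
import numpy as np


def fourier_prompts(prompts: np.ndarray, fourier_fraction: float = 0.5) -> np.ndarray:
    """Return prompts in which the first share is replaced by its Fourier transform.

    prompts: array of shape (num_prompts, hidden_dim) holding learnable prompt
        embeddings (plain arrays here; a real model would use trainable tensors).
    fourier_fraction: hypothetical knob for the share of prompts that get the
        frequency-domain treatment; the rest stay as ordinary prompts.
    """
    num_prompts = prompts.shape[0]
    k = int(round(num_prompts * fourier_fraction))
    out = prompts.astype(np.float64, copy=True)
    if k > 0:
        # Assumption: 2D FFT over (sequence, hidden) axes, real part only,
        # so the result can be mixed with real-valued patch embeddings.
        out[:k] = np.fft.fft2(prompts[:k]).real
    return out


def prepend_prompts(patch_tokens: np.ndarray, prompts: np.ndarray) -> np.ndarray:
    """Concatenate prompts in front of the patch tokens along the sequence axis."""
    return np.concatenate([prompts, patch_tokens], axis=0)


# Example with illustrative sizes: 10 prompts and 196 patch tokens of width 768.
rng = np.random.default_rng(0)
prompts = rng.normal(size=(10, 768))
patch_tokens = rng.normal(size=(196, 768))
tokens = prepend_prompts(patch_tokens, fourier_prompts(prompts, fourier_fraction=0.5))
print(tokens.shape)
```

In this sketch, half of the prompts carry frequency-domain information and half remain ordinary prompts, so the transformer layer that consumes `tokens` sees both kinds alongside the image patches. Only the prompts would be trained; the backbone stays frozen, which is what keeps the tuned parameter count small.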

Findings

01 Outperforms state-of-the-art baselines on benchmarks.

02 Uses only 0.57% of model parameters.

03 Achieves 73.20% mean accuracy on VTAB-1k.

Abstract

With the scale of vision Transformer-based models continuing to grow, finetuning these large-scale pretrained models for new tasks has become increasingly parameter-intensive. Visual prompt tuning is introduced as a parameter-efficient finetuning (PEFT) method in response to this trend. Despite its successes, a notable research challenge persists within almost all PEFT approaches: significant performance degradation is observed when there is a substantial disparity between the datasets applied in the pretraining and finetuning phases. To address this challenge, we draw inspiration from human visual cognition and propose the Visual Fourier Prompt Tuning (VFPT) method as a general and effective solution for adapting large-scale transformer-based models. Our approach innovatively incorporates the Fast Fourier Transform into prompt embeddings and harmoniously considers both spatial and frequency domain…

Code & Models

Repositories

runtsang/vfpt
PyTorch · Official

Videos

Visual Fourier Prompt Tuning · SlidesLive

Taxonomy

Topics: Neural Networks and Applications · Computer Graphics and Visualization Techniques · Advanced Optical Imaging Technologies