Training Spatial-Frequency Visual Prompts and Probabilistic Clusters for Accurate Black-Box Transfer Learning

Wonwoo Cho; Kangyeol Kim; Saemee Choi; Jaegul Choo

arXiv:2408.07944 · cs.CV · August 16, 2024

TL;DR

This paper introduces a parameter-efficient transfer learning framework for black-box vision models that uses spatial-frequency visual prompts and probabilistic clusters to improve accuracy and reduce computational costs in few-shot scenarios.

Contribution

The paper proposes a training framework that combines spatial-frequency visual prompts with probabilistic clusters for black-box transfer learning in vision tasks.

Findings

01. Outperforms state-of-the-art baselines in few-shot transfer learning.

02. Reduces computational costs during training and inference.

03. Enhances class separation in the output space.

Abstract

Despite the growing prevalence of black-box pre-trained models (PTMs) such as prediction API services, directly applying general models to real-world scenarios remains a significant challenge because of the data distribution gap. Targeting scenarios with scarce data and constrained computational resources, this paper proposes a novel parameter-efficient transfer learning framework for vision recognition models in the black-box setting. Our framework incorporates two novel training techniques. First, we align the input space (i.e., images) of PTMs to the target data distribution by generating visual prompts in the spatial and frequency domains. Second, alongside this spatial-frequency hybrid visual prompter, we design a training technique based on probabilistic clusters, which enhances class separation in the output space (i.e., prediction probabilities). In experiments, our model…
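To make the input-space idea in the abstract more concrete, here is a minimal sketch of how a prompt might modify an image in both the spatial and the frequency domain before it is sent to a black-box model. This is not the authors' implementation: the function name `apply_spatial_frequency_prompt`, the `eps` scale, the `tanh` bounding, and the choice to rescale the Fourier spectrum are all assumptions made for illustration.

```python
import numpy as np


def apply_spatial_frequency_prompt(image, spatial_prompt, freq_prompt, eps=0.1):
    """Illustrative hybrid prompt (hypothetical, not the paper's method).

    image:          (H, W, C) float array with values in [0, 1]
    spatial_prompt: (H, W, C) real array, added to the pixels
    freq_prompt:    (H, W, C) real array, rescales the 2D Fourier spectrum
    eps:            assumed bound on the strength of both perturbations
    """
    # Frequency branch: scale each Fourier coefficient by a factor in
    # (1 - eps, 1 + eps). The scale is real, so phase is preserved.
    spectrum = np.fft.fft2(image, axes=(0, 1))
    scaled = spectrum * (1.0 + eps * np.tanh(freq_prompt))
    # The scaled spectrum is generally not conjugate-symmetric, so the
    # inverse transform is complex; keeping the real part projects it
    # back to a real-valued image.
    freq_image = np.fft.ifft2(scaled, axes=(0, 1)).real

    # Spatial branch: add a bounded pixel-wise perturbation.
    prompted = freq_image + eps * np.tanh(spatial_prompt)

    # Keep the result a valid image for the black-box model's input.
    return np.clip(prompted, 0.0, 1.0)
```

With both prompts set to zero the function returns the input image unchanged, which gives a natural starting point before any prompt parameters are trained. How those parameters are optimized without gradient access to the model is not covered by this sketch.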

Taxonomy

Methods: ALIGN