Robust Adaptation of Foundation Models with Black-Box Visual Prompting

Changdae Oh; Gyeongdeok Seo; Geunyoung Jung; Zhi-Qi Cheng; Hosik Choi; Jiyoung Jung; Kyungwoo Song

arXiv:2407.17491·cs.CV·April 7, 2026

TL;DR

This paper introduces BlackVIP, a black-box visual prompting method that adapts large pre-trained models without access to their parameters. It combines query-based gradient estimation with input-dependent visual prompts, which suits real-world settings where models are exposed only as APIs or proprietary software.

Contribution

BlackVIP is presented as the first method to adapt large pre-trained models as black boxes using visual prompts. It introduces a new gradient estimator, SPSA-GC, and a cost-effective variant, BlackVIP-SE.

Findings

01

BlackVIP achieves robust adaptation across 19 datasets.

02

BlackVIP requires minimal memory and computational resources.

03

BlackVIP improves robustness, and this improvement is linked to the certified robustness of smoothing.

Abstract

With a surge of large-scale pre-trained models (PTMs), parameter-efficient transfer learning (PETL) of large models has garnered significant attention. While promising, existing PETL methods commonly rely on two optimistic assumptions: 1) full access to the parameters of a PTM, and 2) sufficient memory capacity to cache all intermediate activations for gradient computation. However, in most real-world applications, PTMs are served as black-box APIs or proprietary software without full parameter accessibility. In addition, the large memory requirements of modern PTMs are hard to meet. This work proposes black-box visual prompting (BlackVIP), which efficiently adapts PTMs without knowledge of their architectures or parameters. BlackVIP has two components: 1) the Coordinator and 2) simultaneous perturbation stochastic approximation with gradient correction (SPSA-GC). The Coordinator designs input-dependent visual prompts,…
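To make the SPSA part of the abstract concrete, here is a minimal sketch of a two-query SPSA gradient estimate in plain Python. The abstract as excerpted here does not spell out the gradient-correction rule, so the sketch assumes a Nesterov-style look-ahead with momentum; treat that part as an illustration rather than the authors' exact update. The names `spsa_gradient`, `spsa_gc_minimize`, and `quadratic_loss`, and all hyperparameter values, are hypothetical. The toy loss stands in for a black-box model that returns only a scalar loss per query, and the Coordinator is not modeled.

```python
import random


def spsa_gradient(loss, theta, c, rng):
    """Estimate the gradient of `loss` at `theta` from two loss queries."""
    # Rademacher perturbation: each coordinate is +1 or -1 with equal probability.
    delta = [rng.choice((-1.0, 1.0)) for _ in theta]
    plus = [t + c * d for t, d in zip(theta, delta)]
    minus = [t - c * d for t, d in zip(theta, delta)]
    # Only loss values are used, so the model itself can stay a black box.
    diff = (loss(plus) - loss(minus)) / (2.0 * c)
    # For +/-1 perturbations, dividing by delta_i equals multiplying by delta_i.
    return [diff * d for d in delta]


def spsa_gc_minimize(loss, theta, steps=500, lr=0.02, c=0.01, beta=0.5, seed=0):
    """Minimize `loss` using SPSA estimates and an ASSUMED look-ahead correction."""
    rng = random.Random(seed)
    theta = list(theta)
    momentum = [0.0] * len(theta)
    for _ in range(steps):
        # Assumption: query the estimator at a Nesterov-style look-ahead point.
        lookahead = [t + beta * m for t, m in zip(theta, momentum)]
        grad = spsa_gradient(loss, lookahead, c, rng)
        momentum = [beta * m - lr * g for m, g in zip(momentum, grad)]
        theta = [t + m for t, m in zip(theta, momentum)]
    return theta


def quadratic_loss(x):
    """Toy stand-in for a black-box loss; its minimum is at x = (1, 1, 1)."""
    return sum((xi - 1.0) ** 2 for xi in x)


if __name__ == "__main__":
    start = [0.0, 0.0, 0.0]
    result = spsa_gc_minimize(quadratic_loss, start)
    print(quadratic_loss(start), quadratic_loss(result))
```

Each optimization step costs exactly two loss queries regardless of the number of parameters, and no activations need to be cached for backpropagation. Those are the two properties that make this family of estimators a natural fit for the black-box, memory-limited setting the abstract describes.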
