BiPrompt: Bilateral Prompt Optimization for Visual and Textual Debiasing in Vision-Language Models

Sunny Gupta; Shounak Das; Amit Sethi

arXiv:2601.02147·cs.CV·January 6, 2026

Open Access

TL;DR

BiPrompt introduces a bilateral prompt optimization method that simultaneously debiases visual and textual modalities in vision-language models, improving robustness and domain invariance without retraining.

Contribution

The paper proposes a framework that combines attention-guided erasure on the visual side with balanced prompt normalization on the textual side for test-time debiasing in both modalities.

Findings

01

Consistent accuracy improvements on real-world and synthetic benchmarks.

02

Enhanced robustness against spurious correlations and distribution shifts.

03

Achieves domain-invariant reasoning without retraining.

Abstract

Vision-language foundation models such as CLIP exhibit impressive zero-shot generalization yet remain vulnerable to spurious correlations across visual and textual modalities. Existing debiasing approaches often address a single modality, either visual or textual, leading to partial robustness and unstable adaptation under distribution shifts. We propose a bilateral prompt optimization framework (BiPrompt) that simultaneously mitigates non-causal feature reliance in both modalities during test-time adaptation. On the visual side, it employs structured attention-guided erasure to suppress background activations and enforce orthogonal prediction consistency between causal and spurious regions. On the textual side, it introduces balanced prompt normalization, a learnable re-centering mechanism that aligns class embeddings toward an isotropic semantic space. Together, these modules jointly…
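
The abstract names the two modules but does not give their formulas, so the following is only a minimal sketch of the general ideas, not the authors' implementation. It assumes that "re-centering" can be approximated by subtracting the mean class embedding (plus a hypothetical learnable `offset`) and re-normalizing, and that "attention-guided erasure" can be approximated by zeroing patch features according to an attention score. The function names `recenter_class_embeddings` and `erase_low_attention`, and the `keep_ratio` parameter, are illustrative assumptions.

```python
import math


def recenter_class_embeddings(embeddings, offset=None):
    """Shift class text embeddings by their mean, then rescale to unit length.

    Assumption: mean subtraction stands in for the paper's learnable
    re-centering; `offset` is a hypothetical learnable vector (zeros here).
    """
    n = len(embeddings)
    dim = len(embeddings[0])
    mean = [sum(e[d] for e in embeddings) / n for d in range(dim)]
    if offset is None:
        offset = [0.0] * dim
    recentered = []
    for e in embeddings:
        shifted = [e[d] - mean[d] - offset[d] for d in range(dim)]
        norm = math.sqrt(sum(x * x for x in shifted))
        # Leave an all-zero vector untouched to avoid dividing by zero.
        recentered.append([x / norm for x in shifted] if norm > 0 else shifted)
    return recentered


def erase_low_attention(patches, attention, keep_ratio=0.5):
    """Split patch features into two complementary views using attention.

    Returns (foreground, background): the foreground view zeroes the
    low-attention patches, the background view zeroes the high-attention
    ones. Assumption: attention scores are given, one per patch.
    """
    k = max(1, int(round(keep_ratio * len(patches))))
    order = sorted(range(len(patches)), key=lambda i: attention[i], reverse=True)
    keep = set(order[:k])
    foreground = [p if i in keep else [0.0] * len(p) for i, p in enumerate(patches)]
    background = [[0.0] * len(p) if i in keep else p for i, p in enumerate(patches)]
    return foreground, background
```

In a test-time adaptation loop, the two views returned by `erase_low_attention` could each be scored against the re-centered class embeddings, with a consistency objective between the resulting predictions; the specific loss BiPrompt uses is not stated in the portion of the abstract shown here.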

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Generative Adversarial Networks and Image Synthesis