Craft: Cross-modal Aligned Features Improve Robustness of Prompt Tuning

Jingchen Sun; Rohan Sharma; Vishnu Suresh Lokhande; Changyou Chen

arXiv:2407.15894·cs.CV·December 23, 2024

TL;DR

Craft is a cross-modal feature alignment method that makes prompt tuning of vision-language models more robust: it reduces overfitting and sensitivity to domain shift, improving Base-to-Novel generalization, group robustness, and out-of-distribution performance.

Contribution

The paper proposes a novel Cross-modal Aligned Feature Tuning (Craft) method that aligns text and image features to improve prompt tuning robustness and out-of-distribution performance.

Findings

01. Up to 6.1% improvement in Base-to-Novel generalization

02. Up to 5.8% improvement in group robustness

03. Up to 2.7% improvement in out-of-distribution tasks

Abstract

Prompt Tuning has emerged as a prominent research paradigm for adapting vision-language models to various downstream tasks. However, recent research indicates that prompt tuning methods often lead to overfitting due to limited training samples. In this paper, we propose a Cross-modal Aligned Feature Tuning (Craft) method to address this issue. Cross-modal alignment is conducted by first selecting anchors from the alternative domain and deriving relative representations of the embeddings for the selected anchors. Optimizing for a feature alignment loss over anchor-aligned text and image modalities creates a more unified text-image common space. Overfitting in prompt tuning also deteriorates model performance on out-of-distribution samples. To further improve the prompt model's robustness, we propose minimizing Maximum Mean Discrepancy (MMD) over the anchor-aligned feature spaces to…
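The two building blocks named in the abstract, relative representations with respect to anchors and Maximum Mean Discrepancy, can be sketched in a few lines of NumPy. This is an illustrative sketch, not the authors' implementation: the function names, the use of cosine similarity for the relative representation, the Gaussian (RBF) kernel, and the `bandwidth` parameter are all assumptions made here, and the truncated abstract does not specify the exact loss used in the paper.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit Euclidean norm (eps guards against zero rows)."""
    norms = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.maximum(norms, eps)


def relative_representation(features, anchors):
    """Re-express each feature by its cosine similarity to every anchor.

    features: (n, d) embeddings from one modality.
    anchors:  (k, d) embeddings chosen from the other modality (assumed here
              to share the same d-dimensional space, as in CLIP-style models).
    Returns an (n, k) array of cosine similarities.
    """
    return l2_normalize(features) @ l2_normalize(anchors).T


def rbf_kernel(x, y, bandwidth=1.0):
    """Gaussian kernel matrix between the rows of x and the rows of y."""
    sq_dists = (
        np.sum(x ** 2, axis=1)[:, None]
        + np.sum(y ** 2, axis=1)[None, :]
        - 2.0 * x @ y.T
    )
    # Clamp tiny negative values caused by floating-point cancellation.
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * bandwidth ** 2))


def mmd2(x, y, bandwidth=1.0):
    """Biased (V-statistic) estimate of squared MMD between two samples."""
    return (
        rbf_kernel(x, x, bandwidth).mean()
        + rbf_kernel(y, y, bandwidth).mean()
        - 2.0 * rbf_kernel(x, y, bandwidth).mean()
    )
```

In a training loop, one would presumably compute relative representations for a batch of image features (against text anchors) and text features (against image anchors), then add `mmd2` over those anchor-aligned features as a penalty next to the task loss; how the terms are weighted is not stated in the text above.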

Code & Models

Repositories

jingchensun/craft (PyTorch, official)

Taxonomy

Topics: Manufacturing Process and Optimization · Robot Manipulation and Learning