Adapting to Distribution Shift by Visual Domain Prompt Generation
Zhixiang Chi, Li Gu, Tao Zhong, Huan Liu, Yuanhao Yu, Konstantinos N. Plataniotis, Yang Wang

TL;DR
This paper introduces a novel test-time adaptation method that leverages foundation models, a knowledge bank, and domain prompts to improve out-of-distribution generalization under distribution shifts.
Contribution
It proposes a domain prompt generation approach conditioned on few-shot target data, utilizing a knowledge bank and meta-learning to enhance domain adaptation from pre-trained features.
Findings
Outperforms previous methods on 5 large-scale benchmarks.
Effective domain knowledge extraction via domain-aware contrastive loss.
Utilizes foundation models and a knowledge bank for improved adaptation.
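To make the idea of a domain-aware contrastive loss concrete, here is a minimal NumPy sketch. It is not the paper's exact loss, which this summary does not reproduce. It assumes a supervised-contrastive-style formulation in which embeddings from the same domain are treated as positives and all others as negatives. The function name `domain_contrastive_loss`, the `temperature` value, and the toy inputs are illustrative assumptions.

```python
import numpy as np


def domain_contrastive_loss(embeddings, domain_ids, temperature=0.1):
    """SupCon-style loss using domain IDs as labels (an assumed formulation).

    embeddings: (N, D) array, one vector per sample (e.g. a pooled domain prompt).
    domain_ids: (N,) array of integer domain labels.
    """
    z = np.asarray(embeddings, dtype=np.float64)
    d = np.asarray(domain_ids)
    n = len(d)

    not_self = ~np.eye(n, dtype=bool)
    positives = (d[:, None] == d[None, :]) & not_self
    pos_count = positives.sum(axis=1)
    has_pos = pos_count > 0
    if not has_pos.any():
        return 0.0  # no same-domain pairs in this batch

    # Cosine similarity scaled by the temperature.
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    sim = z @ z.T / temperature

    # Log-softmax over all *other* samples, computed stably.
    sim_masked = np.where(not_self, sim, -np.inf)
    row_max = sim_masked.max(axis=1, keepdims=True)
    log_denom = row_max + np.log(np.exp(sim_masked - row_max).sum(axis=1, keepdims=True))
    log_prob = sim - log_denom

    # Average log-probability of the positives for each anchor that has any.
    pos_log_prob = np.where(positives, log_prob, 0.0).sum(axis=1)
    mean_pos = pos_log_prob[has_pos] / pos_count[has_pos]
    return float(-mean_pos.mean())


# Toy usage: two tight clusters in 2-D.
points = np.array([[1.0, 0.0], [1.0, 0.1], [0.0, 1.0], [0.1, 1.0]])
loss_matching = domain_contrastive_loss(points, [0, 0, 1, 1])
loss_mismatched = domain_contrastive_loss(points, [0, 1, 0, 1])
```

In this toy setup the loss is small when the domain labels agree with the clusters and large when they cut across them, which is the behaviour such a loss relies on to pull same-domain representations together.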
Abstract
In this paper, we aim to adapt a model at test time using a small amount of unlabeled data to address distribution shifts. To tackle the challenge of extracting domain knowledge from limited data, it is crucial to utilize correlated information from pre-trained backbones and source domains. Previous studies fail to utilize recent foundation models with strong out-of-distribution generalization. Additionally, domain-centric designs are not favored in their works. Furthermore, they treat modelling the source domains and learning to adapt as independent processes, split into disjoint training stages. In this work, we propose an approach on top of the pre-computed features of the foundation model. Specifically, we build a knowledge bank to learn the transferable knowledge from source domains. Conditioned on few-shot target data, we introduce a domain prompt generator to condense…
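The abstract is cut off before it describes the generator, so the following is only one plausible reading, not the authors' architecture. It sketches how a bank of Z knowledge vectors could be conditioned on pre-computed features of a few unlabeled target images through a single cross-attention step. The function name `generate_domain_prompt`, the residual connection, and the absence of learned projections are all simplifying assumptions.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def generate_domain_prompt(knowledge_bank, target_features):
    """Condition a knowledge bank on few-shot target features (assumed design).

    knowledge_bank:  (Z, D) vectors; learned from source domains in a real system.
    target_features: (K, D) frozen foundation-model features of K unlabeled
                     target images.
    Returns a (Z, D) domain prompt.
    """
    bank = np.asarray(knowledge_bank, dtype=np.float64)
    feats = np.asarray(target_features, dtype=np.float64)

    # Bank vectors act as queries; target features act as keys and values.
    # A real model would apply learned query/key/value projections here.
    scale = np.sqrt(bank.shape[1])
    attention = softmax(bank @ feats.T / scale, axis=-1)  # shape (Z, K)

    # Residual update: each bank vector absorbs target-domain information.
    return bank + attention @ feats


# Toy usage with random stand-ins for the bank and the image features.
rng = np.random.default_rng(0)
bank = rng.normal(size=(5, 8))       # Z = 5 knowledge vectors, D = 8
few_shot = rng.normal(size=(16, 8))  # K = 16 unlabeled target images
prompt = generate_domain_prompt(bank, few_shot)
print(prompt.shape)
```

Because the prompt depends only on frozen features and a single forward pass, a design of this kind needs no gradient updates at test time, which is consistent with the adaptation speed the reviewers highlight.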
Peer Reviews
Decision·ICLR 2024 poster
- VDPG not only outperforms other methods on the WILDS benchmark, but it specifically showcases superior results on individual datasets like iWildCam, Camelyon17, and FMoW.
- VDPG's ability to generate high-quality domain-specific prompts tailored to each target domain is not just a novel approach, but one that proves effective.
- When compared to methods like FLYP and DoPrompt, VDPG is designed for quicker adaptation and inference, making it a more feasible choice for real-world applications whe
- The results show that ERM training alone cannot drive performance. Specific configurations, such as episodic learning, are necessary to boost performance. This indicates a complexity in training dynamics that might be challenging to replicate or optimize in varied settings.
- The method adapts with few-shot data before making inferences on all target data. While this is computationally efficient, there's a potential risk of overfitting or being overly reliant on a limited subset of data.
1. Well-written and easy to follow.
2. Interesting approach to leverage CLIP knowledge for performing few-shot TTA. The different novel components are combined together in a non-trivial manner to attain the SoTA performance.
3. Extensive ablations on knowledge bank, prompt generation, losses and evaluation on the standard benchmarks are provided.
1. How small can the knowledge bank (KB) be, and how sensitive is the method to the value of Z?
- The studied task, Few-Shot Test-Time Domain Adaptation, is important yet challenging. Although there are many works focusing on parameter-efficient learning with few-shot data, how to extract useful domain knowledge is an interesting perspective that deserves more research.
- The authors design several modules, including a transferable knowledge bank, a conditional domain prompt generator, a domain-aware contrastive loss, and a domain guidance module. Overall, the motivation of each module is clear. In
- The method is a bit complex. The loss in Eq. (5) consists of three terms that need to be properly balanced. As the correlation loss and the contrastive loss are at different scales, the model's sensitivity to these hyper-parameters, especially given the few-shot data, is unclear.
- Since VDPG is built upon prompt learning of CLIP, comparison with CNN backbones is less informative. Methods like ERM, CORAL, and MTL can be applied with CLIP.
- Fig. 2(a,b) uses a black background, which
Taxonomy
Topics: Multimedia Communication and Technology · Data Visualization and Analytics · Video Analysis and Summarization
