BDC-Adapter: Brownian Distance Covariance for Better Vision-Language Reasoning
Yi Zhang, Ce Zhang, Zihan Liao, Yushun Tang, Zhihai He

TL;DR
This paper introduces BDC-Adapter, a novel fine-tuning method for vision-language models that uses Brownian Distance Covariance to better capture complex feature relations, significantly improving classification performance.
Contribution
It pioneers the use of Brownian Distance Covariance in vision-language reasoning, enabling modeling of all types of feature relations for improved fine-tuning.
Findings
Outperforms state-of-the-art methods by large margins
Handles non-linear feature relations effectively
Provides a robust measure of feature dependence
Abstract
Large-scale pre-trained Vision-Language Models (VLMs), such as CLIP and ALIGN, have introduced a new paradigm for learning transferable visual representations. Recently, there has been a surge of interest among researchers in developing lightweight fine-tuning techniques to adapt these models to downstream visual tasks. We recognize that current state-of-the-art fine-tuning methods, such as Tip-Adapter, simply consider the covariance between the query image feature and features of support few-shot training samples, which only captures linear relations and potentially instigates a deceptive perception of independence. To address this issue, in this work, we innovatively introduce Brownian Distance Covariance (BDC) to the field of vision-language reasoning. The BDC metric can model all possible relations, providing a robust metric for measuring feature dependence. Based on this, we…
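The abstract's claim that covariance captures only linear relations can be illustrated with a toy example. The sketch below is not the authors' BDC-Adapter implementation. It assumes one-dimensional samples and uses the standard sample distance covariance statistic (double-centered pairwise distance matrices), and all function names are hypothetical. With y = x² on a symmetric grid, ordinary covariance is essentially zero even though y is fully determined by x, while the squared distance covariance is clearly positive.

```python
def pairwise_distances(values):
    """Matrix of absolute differences |v_k - v_l| for a 1-D sample."""
    return [[abs(a - b) for b in values] for a in values]


def double_center(dist):
    """Subtract row and column means and add back the grand mean."""
    n = len(dist)
    row_means = [sum(row) / n for row in dist]
    grand_mean = sum(row_means) / n
    # The distance matrix is symmetric, so column means equal row means.
    return [
        [dist[i][j] - row_means[i] - row_means[j] + grand_mean for j in range(n)]
        for i in range(n)
    ]


def distance_covariance_sq(x, y):
    """Squared sample distance covariance of two equally long 1-D samples."""
    n = len(x)
    a = double_center(pairwise_distances(x))
    b = double_center(pairwise_distances(y))
    return sum(a[i][j] * b[i][j] for i in range(n) for j in range(n)) / (n * n)


def covariance(x, y):
    """Ordinary (biased) sample covariance, which only sees linear dependence."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    return sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / n


# Toy data (an assumption for illustration): y depends on x, but not linearly.
x = [i / 10 for i in range(-10, 11)]
y = [v * v for v in x]

linear_cov = covariance(x, y)
bdc_sq = distance_covariance_sq(x, y)
print("covariance:", linear_cov)
print("squared distance covariance:", bdc_sq)
```

A covariance-based similarity would treat x and y here as unrelated, which is the "deceptive perception of independence" the abstract describes. The distance-based statistic does not have that blind spot.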
Taxonomy
Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Digital Imaging for Blood Diseases
Methods: Contrastive Language-Image Pre-training · ALIGN
