BDC-Adapter: Brownian Distance Covariance for Better Vision-Language Reasoning

Yi Zhang; Ce Zhang; Zihan Liao; Yushun Tang; Zhihai He

arXiv:2309.01256·cs.CV·September 6, 2023·2 cites

TL;DR

This paper introduces BDC-Adapter, a fine-tuning method for vision-language models that uses Brownian Distance Covariance to capture non-linear as well as linear feature relations, improving few-shot classification performance.

Contribution

It introduces Brownian Distance Covariance to vision-language reasoning, so that fine-tuning can account for non-linear as well as linear relations between features.

Findings

01 Outperforms state-of-the-art methods by large margins

02 Handles non-linear feature relations effectively

03 Provides a robust measure of feature dependence

Abstract

Large-scale pre-trained Vision-Language Models (VLMs), such as CLIP and ALIGN, have introduced a new paradigm for learning transferable visual representations. Recently, there has been a surge of interest among researchers in developing lightweight fine-tuning techniques to adapt these models to downstream visual tasks. We recognize that current state-of-the-art fine-tuning methods, such as Tip-Adapter, simply consider the covariance between the query image feature and features of support few-shot training samples, which only captures linear relations and potentially instigates a deceptive perception of independence. To address this issue, in this work, we innovatively introduce Brownian Distance Covariance (BDC) to the field of vision-language reasoning. The BDC metric can model all possible relations, providing a robust metric for measuring feature dependence. Based on this, we…
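The abstract's point that classical covariance captures only linear relations can be illustrated with a toy example. The sketch below is not the paper's BDC-Adapter implementation: it computes the squared sample distance covariance of Székely and Rizzo (which coincides with Brownian covariance) for scalar samples, using plain Python. The function names and the choice of y = x² as test data are assumptions made for illustration only.

```python
def pairwise_distances(xs):
    """Return the n x n matrix of absolute differences |x_i - x_j| for scalar samples."""
    return [[abs(a - b) for b in xs] for a in xs]


def double_center(d):
    """Subtract row and column means from a distance matrix and add back the grand mean."""
    n = len(d)
    row_means = [sum(row) / n for row in d]
    col_means = [sum(d[i][j] for i in range(n)) / n for j in range(n)]
    grand_mean = sum(row_means) / n
    return [
        [d[i][j] - row_means[i] - col_means[j] + grand_mean for j in range(n)]
        for i in range(n)
    ]


def distance_covariance_sq(xs, ys):
    """Squared sample distance covariance: mean of the elementwise product
    of the two double-centered distance matrices."""
    a = double_center(pairwise_distances(xs))
    b = double_center(pairwise_distances(ys))
    n = len(xs)
    return sum(a[i][j] * b[i][j] for i in range(n) for j in range(n)) / (n * n)


def covariance(xs, ys):
    """Classical (population-style) sample covariance."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n


# Toy data (assumption): y depends on x deterministically but non-linearly.
xs = [i / 10 for i in range(-10, 11)]  # symmetric grid on [-1, 1]
ys = [x * x for x in xs]

linear_cov = covariance(xs, ys)            # vanishes up to rounding, by symmetry
dcov_sq = distance_covariance_sq(xs, ys)   # strictly positive: dependence is detected

print(f"classical covariance:        {linear_cov:.3e}")
print(f"squared distance covariance: {dcov_sq:.3e}")
```

On this data the classical covariance is zero up to floating-point error even though y is fully determined by x, while the distance covariance is positive. This is the "deceptive perception of independence" the abstract refers to, shown on scalars rather than on image features.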

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Digital Imaging for Blood Diseases

Methods: Contrastive Language-Image Pre-training · ALIGN