Topology-Aware CLIP Few-Shot Learning

Dazhi Huang

arXiv:2505.01694·cs.CV·May 6, 2025

TL;DR

This paper introduces a topology-aware tuning method for CLIP that aligns the topological structures of visual and textual representations, significantly improving few-shot learning performance across multiple datasets.

Contribution

The paper integrates Representation Topology Divergence (RTD) into the Task Residual (TR) framework, explicitly aligning the topological structures of visual and text representations to improve few-shot learning.
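One way to write the objective down is sketched here, under stated assumptions. The residual form follows the usual Task Residual formulation, in which frozen text-based classifier weights $t_c$ receive a learnable residual $x_c$ scaled by $\alpha$. The weight $\lambda$ and the symbols $V$ (visual features) and $T'$ (adapted text features) are notation introduced only for illustration; the source says the two losses are combined but does not give the weighting.

```latex
% Task Residual: frozen text weight t_c plus a scaled learnable residual x_c
t'_c = t_c + \alpha \, x_c

% Combined objective (lambda is a hypothetical balancing weight)
\mathcal{L} = \mathcal{L}_{\mathrm{CE}} + \lambda \, \mathcal{L}_{\mathrm{RTD}}(V, T')
```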

Findings

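01. Achieves an average accuracy improvement of 1-2% over relevant baseline methods across 6 benchmark datasets.

02. Uses the topological structure of the latent space, via an RTD loss term, to guide adaptation.

03. Retains pre-trained knowledge by freezing the base encoders and optimizing only lightweight Task Residual parameters.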

Abstract

Efficiently adapting large Vision-Language Models (VLMs) like CLIP for few-shot learning poses challenges in balancing pre-trained knowledge retention and task-specific adaptation. Existing methods often overlook valuable structural information within the VLM's latent space. We introduce a topology-aware tuning approach integrating Representation Topology Divergence (RTD) into the Task Residual (TR) framework. By explicitly aligning the topological structures of visual and text representations using a combined RTD and Cross-Entropy loss, while freezing base VLM encoders, our method enhances few-shot performance. We optimize only lightweight Task Residual parameters, effectively leveraging topological information. Across 6 diverse benchmark datasets, our approach demonstrates significant gains, achieving an average accuracy improvement of 1-2% over relevant baseline methods in few-shot…
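The loss described in the abstract can be illustrated with a minimal, dependency-free sketch. It is not the paper's implementation. In particular, `topology_proxy` is a simplified stand-in and NOT Representation Topology Divergence: it only compares the sorted minimum-spanning-tree edge lengths of two point clouds, which correspond to the 0-dimensional persistence of a Vietoris-Rips filtration. The function names, the logit scale of 100, the defaults `alpha=0.5` and `lam=0.1`, and the use of per-class visual prototypes are all assumptions made for illustration. The sketch computes the loss value only and contains no training loop.

```python
import math

def normalize(v):
    """Scale a vector to unit L2 norm."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def task_residual(text_feats, residual, alpha):
    """t' = t + alpha * x: frozen text features plus a scaled learnable residual."""
    return [[t + alpha * r for t, r in zip(t_row, r_row)]
            for t_row, r_row in zip(text_feats, residual)]

def logits(image_feat, class_weights, scale=100.0):
    """Scaled cosine similarity between one image feature and every class weight."""
    img = normalize(image_feat)
    return [scale * sum(a * b for a, b in zip(img, normalize(w)))
            for w in class_weights]

def cross_entropy(lg, label):
    """Numerically stable cross-entropy for a single sample."""
    m = max(lg)
    log_z = m + math.log(sum(math.exp(l - m) for l in lg))
    return log_z - lg[label]

def mst_edge_lengths(points):
    """Sorted edge lengths of the Euclidean minimum spanning tree (Prim's algorithm)."""
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n
    best[0] = 0.0
    lengths = []
    for step in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if step > 0:
            lengths.append(best[u])
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v] = d
    return sorted(lengths)

def topology_proxy(cloud_a, cloud_b):
    """Stand-in for RTD (an assumption): L1 gap between sorted MST edge lengths.
    Both clouds must contain the same number of points."""
    a, b = mst_edge_lengths(cloud_a), mst_edge_lengths(cloud_b)
    return sum(abs(x - y) for x, y in zip(a, b))

def class_prototypes(image_feats, labels, num_classes):
    """Mean of normalized image features per class (every class needs >= 1 sample)."""
    dim = len(image_feats[0])
    sums = [[0.0] * dim for _ in range(num_classes)]
    for feat, y in zip(image_feats, labels):
        for j, x in enumerate(normalize(feat)):
            sums[y][j] += x
    return [normalize(s) for s in sums]

def combined_loss(image_feats, labels, text_feats, residual, alpha=0.5, lam=0.1):
    """Cross-entropy plus a weighted topology term; only `residual` would be trained."""
    weights = task_residual(text_feats, residual, alpha)
    ce = sum(cross_entropy(logits(f, weights), y)
             for f, y in zip(image_feats, labels)) / len(labels)
    visual_cloud = class_prototypes(image_feats, labels, len(text_feats))
    text_cloud = [normalize(w) for w in weights]
    return ce + lam * topology_proxy(visual_cloud, text_cloud)
```

In this sketch the encoder outputs (`image_feats`, `text_feats`) are treated as fixed inputs, mirroring the frozen encoders, and `residual` is the only quantity a training loop would update. A faithful reproduction would replace `topology_proxy` with a differentiable RTD implementation inside an autodiff framework.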

Taxonomy

Topics: Geophysical Methods and Applications

Methods: Balanced Selection · Contrastive Language-Image Pre-training