Causal Disentanglement and Cross-Modal Alignment for Enhanced Few-Shot Learning

Tianjiao Jiang; Zhen Zhang; Yuhang Liu; Javen Qinfeng Shi

arXiv:2508.03102 · cs.CV · August 6, 2025

TL;DR

The paper introduces the Causal CLIP Adapter (CCA), a framework that disentangles CLIP visual features with unsupervised ICA and strengthens cross-modal alignment to improve few-shot learning.

Contribution

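It proposes a method that applies unsupervised Independent Component Analysis (ICA) to disentangle CLIP visual features and combines it with cross-modal alignment, so the unmixing process no longer has to be learned from the few labeled examples.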

Findings

01 Outperforms state-of-the-art methods on 11 benchmarks

02 Improves robustness to distributional shifts

03 Maintains computational efficiency

Abstract

Few-shot learning (FSL) often requires effective adaptation of models using limited labeled data. However, most existing FSL methods rely on entangled representations, requiring the model to implicitly recover the unmixing process to obtain disentangled representations using only limited supervision, which hinders effective adaptation. Recent theoretical studies show that multimodal contrastive learning methods, such as CLIP, can disentangle latent representations up to linear transformations. In light of this, we propose the Causal CLIP Adapter (CCA), a novel framework that explicitly disentangles visual features extracted from CLIP using unsupervised Independent Component Analysis (ICA). This removes the need to learn the unmixing process from the labeled data, thereby reducing the number of trainable parameters and mitigating overfitting. Taking a step further, while ICA can obtain…
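The disentanglement step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes scikit-learn's `FastICA` as the ICA solver (the abstract says only "unsupervised ICA"), uses synthetic features from a hypothetical `make_synthetic_features` helper in place of real CLIP embeddings, and uses a logistic-regression head as a stand-in for whatever lightweight classifier the paper trains. The cross-modal alignment component is omitted because the abstract is truncated before describing it.

```python
import numpy as np
from sklearn.decomposition import FastICA
from sklearn.linear_model import LogisticRegression


def make_synthetic_features(n_samples, n_latents=8, seed=0):
    """Hypothetical stand-in for CLIP image features (an assumption).

    Independent non-Gaussian latents are mixed by a fixed linear map, and
    the label depends on a single latent factor only.
    """
    rng = np.random.default_rng(seed)
    latents = rng.laplace(size=(n_samples, n_latents))
    # The mixing matrix uses its own fixed seed so every call shares it.
    mixing = np.random.default_rng(1234).normal(size=(n_latents, n_latents))
    features = latents @ mixing.T
    labels = (latents[:, 0] > 0).astype(int)
    return features, labels


def fit_unmixing(unlabeled_features, n_components, seed=0):
    """Estimate the unmixing transform without using any labels."""
    ica = FastICA(n_components=n_components, random_state=seed, max_iter=1000)
    ica.fit(unlabeled_features)
    return ica


def fit_few_shot_head(ica, support_features, support_labels):
    """Train a small classifier on the ICA components of the support set."""
    components = ica.transform(support_features)
    head = LogisticRegression(max_iter=1000)
    head.fit(components, support_labels)
    return head


def predict(ica, head, features):
    """Apply the frozen unmixing transform, then the trained head."""
    return head.predict(ica.transform(features))


if __name__ == "__main__":
    # Unlabeled pool for ICA; a small labeled support set for the head.
    pool, _ = make_synthetic_features(2000, seed=0)
    support_x, support_y = make_synthetic_features(32, seed=1)
    query_x, query_y = make_synthetic_features(200, seed=2)

    ica = fit_unmixing(pool, n_components=8)
    head = fit_few_shot_head(ica, support_x, support_y)
    accuracy = float(np.mean(predict(ica, head, query_x) == query_y))
    print(f"query accuracy: {accuracy:.2f}")
```

In this sketch the unmixing transform is fitted once on unlabeled features and then frozen, so only the small head is trained on labeled data. That mirrors the abstract's argument that explicit disentanglement reduces the number of trainable parameters and mitigates overfitting.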
