Causal Disentanglement and Cross-Modal Alignment for Enhanced Few-Shot Learning
Tianjiao Jiang, Zhen Zhang, Yuhang Liu, Javen Qinfeng Shi

TL;DR
The paper introduces Causal CLIP Adapter (CCA), a novel framework that disentangles visual features and enhances cross-modal alignment to improve few-shot learning performance with limited data.
Contribution
It proposes a new method that combines unsupervised Independent Component Analysis (ICA) for feature disentanglement with cross-modal alignment techniques to advance few-shot learning.
Findings
Outperforms state-of-the-art methods on 11 benchmarks
Improves robustness to distributional shifts
Maintains computational efficiency
Abstract
Few-shot learning (FSL) often requires effective adaptation of models using limited labeled data. However, most existing FSL methods rely on entangled representations, requiring the model to implicitly recover the unmixing process to obtain disentangled representations using only limited supervision, which hinders effective adaptation. Recent theoretical studies show that multimodal contrastive learning methods, such as CLIP, can disentangle latent representations up to linear transformations. In light of this, we propose the Causal CLIP Adapter (CCA), a novel framework that explicitly disentangles visual features extracted from CLIP using unsupervised Independent Component Analysis (ICA). This removes the need to learn the unmixing process from the labeled data, thereby reducing the number of trainable parameters and mitigating overfitting. Taking a step further, while ICA can obtain…
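The abstract's key step is running unsupervised ICA on CLIP visual features, so that the unmixing does not have to be learned from the few labeled examples. The sketch below illustrates that idea in NumPy using symmetric FastICA with a tanh (log-cosh) contrast. It is not the authors' implementation. The synthetic, linearly mixed sources stand in for CLIP image features, and the function names (`whiten`, `fast_ica`, `demo`) are hypothetical. This excerpt does not specify the paper's ICA variant, feature dimensionality, or how the unmixing matrix is used inside the adapter.

```python
import numpy as np


def whiten(X):
    """Center X (n_samples, n_features) and PCA-whiten it so its covariance is I."""
    Xc = X - X.mean(axis=0)
    eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
    W_white = (eigvecs / np.sqrt(eigvals)).T  # Lambda^{-1/2} E^T
    return Xc @ W_white.T, W_white


def sym_decorrelate(W):
    """Symmetric orthogonalization: W <- (W W^T)^{-1/2} W."""
    s, u = np.linalg.eigh(W @ W.T)
    return (u / np.sqrt(s)) @ u.T @ W


def fast_ica(X, n_iter=200, tol=1e-6, seed=0):
    """Symmetric FastICA (tanh nonlinearity); returns (sources, unmixing matrix).

    Assumption for illustration: X = S A^T with independent, non-Gaussian S.
    """
    Z, W_white = whiten(X)
    n, d = Z.shape
    rng = np.random.default_rng(seed)
    W = sym_decorrelate(rng.standard_normal((d, d)))
    for _ in range(n_iter):
        Y = Z @ W.T                     # current source estimates, (n, d)
        g = np.tanh(Y)                  # derivative of the log-cosh contrast
        g_prime = 1.0 - g ** 2
        # Fixed-point update per row: E[z g(w^T z)] - E[g'(w^T z)] w
        W_new = sym_decorrelate((g.T @ Z) / n - g_prime.mean(axis=0)[:, None] * W)
        # Converged when every row is unchanged up to sign
        lim = np.max(np.abs(np.abs(np.einsum("ij,ij->i", W_new, W)) - 1.0))
        W = W_new
        if lim < tol:
            break
    unmixing = W @ W_white              # maps centered X directly to sources
    return Z @ W.T, unmixing


def demo(n=5000, seed=0):
    """Mix three known sources, unmix them with ICA, and return |corr(true, recovered)|."""
    rng = np.random.default_rng(seed)
    t = np.linspace(0, 200, n)
    S = np.column_stack([
        rng.uniform(-np.sqrt(3), np.sqrt(3), n),  # sub-Gaussian source
        rng.laplace(0.0, 1.0 / np.sqrt(2), n),    # super-Gaussian source
        np.sign(np.sin(t)),                       # binary square-wave source
    ])
    A = np.array([[1.0, 0.5, 0.3], [0.2, 1.0, 0.4], [0.6, 0.1, 1.0]])
    X = S @ A.T                                   # observed "features" x = A s
    S_hat, _ = fast_ica(X, seed=seed)
    d = S.shape[1]
    return np.abs(np.corrcoef(S.T, S_hat.T)[:d, d:])


print(np.round(demo(), 2))
```

ICA recovers sources only up to permutation and sign, which is why the demo compares absolute correlations. Each true source should match exactly one recovered component. This loosely parallels the abstract's point that CLIP representations are identifiable only "up to linear transformations."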
