Unlearning the Noisy Correspondence Makes CLIP More Robust

Haochen Han; Alex Jinpeng Wang; Peijun Ye; Fangming Liu

arXiv:2507.03434·cs.CV·July 8, 2025

TL;DR

This paper introduces NCU, a fine-tuning method that unlearns noisy correspondences in pre-trained VLMs like CLIP, improving robustness and zero-shot transfer performance efficiently.

Contribution

The paper proposes an unlearning framework that directly targets the noisy correspondences already learned by pre-trained models, avoiding the cost of earlier refinement methods that train VLMs from scratch.

Findings

01 NCU improves CLIP's robustness across multiple tasks.

02 NCU outperforms existing robust pre-training methods in zero-shot transfer.

03 NCU achieves these gains with lower computational overhead.

Abstract

The data appetite of Vision-Language Models (VLMs) has scaled from millions of samples in early work to billions today. That growth forces an untenable trade-off with data quality and inevitably introduces Noisy Correspondence (NC) samples: image-text pairs that are semantically unrelated. Such data significantly impairs the performance of VLMs. Previous efforts mainly address this challenge by estimating refined alignment for more precise guidance, but these resource-intensive pipelines train VLMs from scratch and struggle to meet realistic data demands. In this paper, we present a new perspective that seeks to directly eliminate the harmful effects of NC in pre-trained VLMs. Specifically, we propose NCU, a Noisy Correspondence Unlearning fine-tuning framework that efficiently enhances VLMs' robustness by forgetting learned noisy knowledge. The key to NCU is learning the hardest negative…
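
The abstract is cut off before it explains how NCU uses hardest negatives, so the sketch below is not the paper's procedure. It only illustrates the two terms in their usual contrastive-learning sense, under stated assumptions: embeddings are compared by cosine similarity, image i is paired with caption i, a "hardest negative" is the highest-scoring caption other than the paired one, and a pair is treated as suspect when that negative outscores the paired caption. The function names, the toy vectors, and the suspect rule are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def hardest_negatives(image_embs, text_embs):
    """For each image i, find the most similar caption other than caption i.

    Returns a list of (negative_index, negative_similarity, paired_similarity).
    Assumes image i and caption i form the nominal pair (hypothetical setup).
    """
    results = []
    for i, img in enumerate(image_embs):
        sims = [cosine(img, txt) for txt in text_embs]
        # Search every caption except the paired one at index i.
        j = max((k for k in range(len(sims)) if k != i), key=lambda k: sims[k])
        results.append((j, sims[j], sims[i]))
    return results


# Made-up 2-D embeddings; the third caption does not match the third image.
image_embs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
text_embs = [[0.9, 0.1], [0.1, 0.9], [1.0, -1.0]]

results = hardest_negatives(image_embs, text_embs)

# Illustrative rule only: flag a pair when a negative caption scores higher
# than the caption the image was paired with.
suspect = [i for i, (_, neg, pos) in enumerate(results) if neg > pos]
print("suspect pairs:", suspect)
```

In this toy batch only the third pair is flagged, because its paired caption is orthogonal to the image while other captions in the batch are closer. Real CLIP embeddings are high-dimensional and a single threshold like this would be too crude in practice.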
