Robust Noisy Correspondence Learning via Self-Drop and Dual-Weight

Fan Liu; Chenwei Dong; Chuanyi Zhang; Hualiang Zhou; Jun Zhou

arXiv:2412.06172 · cs.CV · December 12, 2024

TL;DR

This paper introduces a novel self-drop and dual-weight method for robustly learning from noisy cross-modal data, effectively reducing the impact of noisy pairs and emphasizing significant clean samples, leading to improved stability and performance.

Contribution

The paper proposes a new data partitioning and weighting strategy that enhances robustness against noisy correspondences in cross-modal learning tasks.

Findings

01 The approach outperforms prior methods on noisy datasets.

02 It maintains stable performance under high noise ratios.

03 It is effective in vision-language pre-training scenarios.

Abstract

Many researchers collect data from the internet through crowd-sourcing or web crawling to alleviate the data-hungry challenge associated with cross-modal matching. Although this practice does not require expensive annotations, it inevitably introduces mismatched pairs and results in a noisy correspondence problem. Current approaches leverage the memorization effect of deep neural networks to distinguish noise and perform re-weighting. However, simply lowering the weight of noisy pairs cannot eliminate the negative impact of noisy correspondence in the training process. In this paper, we propose a novel self-drop and dual-weight approach, which achieves elaborate data processing by qua-partitioning the data. Specifically, our approach partitions all data into four types: clean and significant, clean yet insignificant, vague, and noisy. We analyze the effect of noisy and clean data pairs…
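The four-way split named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the truncated abstract does not say how pairs are scored or where the boundaries lie. The sketch assumes each image-text pair already has a clean probability `p_clean` (for example, derived from per-sample loss, as memorization-based methods often do) and a `significance` score. Both scores, the function names, and all three thresholds are hypothetical.

```python
from enum import Enum


class PairType(Enum):
    """The four data types named in the abstract."""
    CLEAN_SIGNIFICANT = "clean and significant"
    CLEAN_INSIGNIFICANT = "clean yet insignificant"
    VAGUE = "vague"
    NOISY = "noisy"


def partition_pair(p_clean, significance,
                   noisy_below=0.3, clean_above=0.7, significant_above=0.5):
    """Assign one pair to a type. All thresholds are illustrative assumptions."""
    if p_clean < noisy_below:
        return PairType.NOISY
    if p_clean < clean_above:
        # Neither confidently clean nor confidently noisy.
        return PairType.VAGUE
    if significance >= significant_above:
        return PairType.CLEAN_SIGNIFICANT
    return PairType.CLEAN_INSIGNIFICANT


def partition_dataset(scores):
    """Group pair indices by type; `scores` is a list of (p_clean, significance)."""
    groups = {pair_type: [] for pair_type in PairType}
    for idx, (p_clean, significance) in enumerate(scores):
        groups[partition_pair(p_clean, significance)].append(idx)
    return groups


# Hypothetical scores for four pairs, one per type.
scores = [(0.95, 0.9), (0.90, 0.1), (0.50, 0.8), (0.05, 0.4)]
groups = partition_dataset(scores)
```

How each group is then treated (which pairs are dropped and how the two weights are assigned) is specified in the paper itself and is not reproduced here.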

Taxonomy

Topics: Face and Expression Recognition · Speech Recognition and Synthesis

Methods: ADaptive gradient method with the OPTimal convergence rate