Inconsistency Ranking-based Noisy Label Detection for High-quality Data

Ruibin Yuan; Hanzhi Yin; Yi Wang; Yifan He; Yushi Ye; Lei Zhang; Zhizheng Wu

arXiv:2212.00239 · cs.CL · June 16, 2023 · 1 citation

TL;DR

This paper introduces an automatic noisy label detection method based on inconsistency ranking, aimed at improving data quality for deep learning. It is demonstrated on speaker verification, where it improves both the efficiency and the effectiveness of cleaning large-scale speaker recognition datasets.

Contribution

The paper proposes a novel inconsistency ranking approach for noisy label detection that makes cleaning large-scale datasets more efficient and more effective.

Findings

1. Improves noisy label detection accuracy in speaker verification datasets.
2. Outperforms baseline methods in cleaning efficiency and effectiveness.
3. Remains effective across various noise levels and metric learning loss functions.

Abstract

The success of deep learning requires massive amounts of high-quality annotated data. In practice, however, dataset size and quality are usually traded off against each other, because data collection and cleaning are expensive and time-consuming. In real-world applications, especially those built on crowdsourced datasets, it is important to exclude noisy labels. To address this, this paper proposes an automatic noisy label detection (NLD) technique based on inconsistency ranking for obtaining high-quality data. We apply this technique to the automatic speaker verification (ASV) task as a proof of concept. We investigate both inter-class and intra-class inconsistency ranking and compare several metric learning loss functions under different noise settings. Experimental results confirm that the proposed solution can improve both the efficiency and the effectiveness of cleaning large-scale speaker recognition datasets.
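The abstract names intra-class and inter-class inconsistency ranking but does not spell out the scoring rules. The sketch below is an illustration under stated assumptions, not the authors' method: it assumes that inconsistency is measured with cosine similarity between a sample's embedding and the centroids of the (possibly noisy) label classes. Intra-class inconsistency is taken as distance to the sample's own class centroid, and inter-class inconsistency as the margin by which some other class centroid is closer. All function names and the 2-D toy "speakers" are hypothetical; for the actual implementation, see the authors' repository a43992899/noisyspeakerdetection.

```python
import math
import random


def _normalize(v):
    # L2-normalize so that dot products equal cosine similarities
    n = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / n for x in v]


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def class_centroids(embeddings, labels):
    """Per-label mean of the normalized embeddings, re-normalized (uses the possibly noisy labels)."""
    sums = {}
    for emb, lab in zip(embeddings, labels):
        e = _normalize(emb)
        acc = sums.setdefault(lab, [0.0] * len(e))
        for j, x in enumerate(e):
            acc[j] += x
    return {lab: _normalize(s) for lab, s in sums.items()}


def intra_class_scores(embeddings, labels, centroids):
    # Assumed intra-class inconsistency: 1 - cos(sample, centroid of its own label)
    return [1.0 - _dot(_normalize(e), centroids[l]) for e, l in zip(embeddings, labels)]


def inter_class_scores(embeddings, labels, centroids):
    # Assumed inter-class inconsistency: best cosine to any other label's centroid
    # minus cosine to its own label's centroid (needs at least two labels)
    scores = []
    for e, l in zip(embeddings, labels):
        e = _normalize(e)
        own = _dot(e, centroids[l])
        other = max(_dot(e, c) for k, c in centroids.items() if k != l)
        scores.append(other - own)
    return scores


def rank_suspects(scores):
    # Sample indices ordered from most to least inconsistent
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


def precision_at_k(ranking, noisy_idx, k):
    # Fraction of the top-k ranked samples whose labels really are noisy (k > 0)
    return len(set(ranking[:k]) & noisy_idx) / k


def make_toy_data(n_per_class=50, noise_rate=0.1, seed=0):
    """Hypothetical toy data: 3 'speakers' with 2-D embeddings 120 degrees apart."""
    rng = random.Random(seed)
    centers = [(math.cos(a), math.sin(a)) for a in (0.0, 2 * math.pi / 3, 4 * math.pi / 3)]
    embeddings, labels = [], []
    for lab, (cx, cy) in enumerate(centers):
        for _ in range(n_per_class):
            embeddings.append([cx + rng.gauss(0, 0.05), cy + rng.gauss(0, 0.05)])
            labels.append(lab)
    # Inject uniform label noise: move a random subset to a different speaker
    noisy_idx = set(rng.sample(range(len(labels)), round(noise_rate * len(labels))))
    for i in noisy_idx:
        labels[i] = (labels[i] + rng.randint(1, 2)) % 3
    return embeddings, labels, noisy_idx


if __name__ == "__main__":
    emb, labels, noisy = make_toy_data()
    cents = class_centroids(emb, labels)
    for name, score_fn in (("intra-class", intra_class_scores), ("inter-class", inter_class_scores)):
        ranking = rank_suspects(score_fn(emb, labels, cents))
        print(name, precision_at_k(ranking, noisy, k=len(noisy)))
```

In a real ASV pipeline, the embeddings would instead come from a speaker encoder trained with one of the metric learning losses, and the top of the ranking would be reviewed or dropped. Varying `noise_rate` loosely mimics the different noise settings mentioned in the abstract.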

Code & Models

Repositories

a43992899/noisyspeakerdetection
PyTorch · Official

Taxonomy

Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Music and Audio Processing