Extracting Clean and Balanced Subset for Noisy Long-tailed Classification

Zhuo Li; He Zhao; Zhen Li; Tongliang Liu; Dandan Guo; Xiang Wan

arXiv:2404.06795·cs.LG·April 11, 2024·1 cites

Open Access

TL;DR

This paper proposes a novel pseudo labeling approach using optimal transport to create a clean, balanced subset from noisy, long-tailed datasets, improving classification performance.

Contribution

It introduces a distribution-matching pseudo labeling method with optimal transport, effectively handling label noise and class imbalance simultaneously.

Findings

01. Achieves better class balance and label cleanliness in subsets.

02. Improves long-tailed classification accuracy with noisy labels.

03. Demonstrates effectiveness through extensive experiments.

Abstract

Real-world datasets are usually class-imbalanced and corrupted by label noise. To address the joint issue of long-tailed distribution and label noise, most previous works design a noise detector to distinguish noisy samples from clean ones. Despite their effectiveness, they may be limited in handling the joint issue in a unified way. In this work, we develop a novel pseudo labeling method using class prototypes from the perspective of distribution matching, which can be solved with optimal transport (OT). By setting a manually specified probability measure and using a learned transport plan to pseudo-label the training samples, the proposed method can reduce the side effects of noisy and long-tailed data simultaneously. Then we introduce a simple yet effective filter criterion that combines the observed labels and pseudo labels to obtain a more balanced and less…
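The pipeline the abstract describes (match samples to class prototypes with OT, pseudo-label from the transport plan, then filter using the observed labels) can be illustrated with a minimal sketch. Everything specific here is an assumption for illustration rather than the authors' implementation: the entropic Sinkhorn solver, the cosine cost, the uniform class measure, the agreement-based filter, and the function names `sinkhorn` and `pseudo_label_and_filter` are all hypothetical choices that the abstract does not specify.

```python
import numpy as np


def sinkhorn(cost, row_marginal, col_marginal, reg=0.05, n_iters=200):
    """Entropic-regularized OT solved with plain Sinkhorn iterations.

    Assumption: the abstract only says the problem "can be solved with OT";
    Sinkhorn is one common solver, swapped in here for illustration.
    """
    kernel = np.exp(-cost / reg)
    u = np.ones_like(row_marginal)
    for _ in range(n_iters):
        v = col_marginal / (kernel.T @ u)  # rescale to match class marginal
        u = row_marginal / (kernel @ v)    # rescale to match sample marginal
    return u[:, None] * kernel * v[None, :]


def pseudo_label_and_filter(features, observed_labels, prototypes,
                            class_measure=None):
    """Pseudo-label samples from a transport plan, then filter them.

    Assumptions (not stated in the abstract): cosine distance as the cost,
    a uniform class measure by default, and "keep a sample when its observed
    label agrees with its pseudo label" as the filter criterion.
    """
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    p = prototypes / np.linalg.norm(prototypes, axis=1, keepdims=True)
    cost = 1.0 - f @ p.T  # cosine distance, shape (n_samples, n_classes)

    n_samples, n_classes = cost.shape
    if class_measure is None:
        # Manually specified measure over classes; uniform asks for balance.
        class_measure = np.full(n_classes, 1.0 / n_classes)
    sample_measure = np.full(n_samples, 1.0 / n_samples)

    plan = sinkhorn(cost, sample_measure, class_measure)
    pseudo_labels = plan.argmax(axis=1)      # class receiving the most mass
    keep = pseudo_labels == observed_labels  # agreement-based filter
    return pseudo_labels, keep
```

With features extracted by some backbone and prototypes computed, for example, as per-class feature means (again an assumption), `features[keep]` and `observed_labels[keep]` would form the retained subset. Changing `class_measure` changes how much mass each class is asked to receive, which is the knob the abstract refers to as the manually specified probability measure.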

Taxonomy

Topics: Neural Networks and Applications · Machine Learning and Data Classification · Face and Expression Recognition