On the intrinsic robustness to noise of some leading classifiers and symmetric loss function -- an empirical evaluation
Hugo Le Baher (1), Vincent Lemaire (2), Romain Trinquart (2) ((1), Polytech Nantes (France), (2) Orange Labs (France))

TL;DR
This paper empirically evaluates the intrinsic robustness of leading classifiers, such as SVM, logistic regression, and ensemble methods, to noisy labels, and explores whether symmetric loss functions can improve that robustness.
Contribution
It introduces a benchmark for assessing classifier robustness to label noise on artificially corrupted datasets, and investigates whether symmetric loss functions can enhance this robustness.
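The benchmark relies on artificially corrupting labels. The abstract does not specify the corruption protocol, so the following is a minimal sketch assuming the common uniform (class-independent) scheme on binary 0/1 labels, where each label is flipped independently with a fixed probability. `flip_labels` and `observed_noise_rate` are hypothetical helpers for illustration, not the authors' code.

```python
import random


def flip_labels(labels, noise_rate, seed=0):
    """Return a copy of binary 0/1 labels where each label is flipped
    independently with probability `noise_rate` (assumed uniform noise)."""
    if not 0.0 <= noise_rate <= 1.0:
        raise ValueError("noise_rate must be in [0, 1]")
    rng = random.Random(seed)  # local RNG so the corruption is reproducible
    return [1 - y if rng.random() < noise_rate else y for y in labels]


def observed_noise_rate(clean, noisy):
    """Fraction of labels that differ between the clean and corrupted copies."""
    return sum(c != n for c, n in zip(clean, noisy)) / len(clean)


if __name__ == "__main__":
    clean = [i % 2 for i in range(10_000)]
    for rate in (0.0, 0.1, 0.2, 0.3, 0.4):
        noisy = flip_labels(clean, rate, seed=42)
        print(f"target={rate:.1f} observed={observed_noise_rate(clean, noisy):.3f}")
```

In a benchmark of this kind, each classifier would be trained on the corrupted labels for a range of noise rates and evaluated on clean test labels, so that the degradation curve measures its intrinsic robustness.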
Findings
SVM and random forests show high robustness to label noise.
Symmetric loss functions can improve classifier resilience to noisy labels.
Benchmark results highlight differences in robustness among various algorithms.
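The finding on symmetric losses has a standard theoretical motivation from the label-noise literature. It is sketched here under the assumption of uniform label noise with flip rate η < 1/2 (the noise model actually used in the benchmark is not stated above). A margin loss ℓ(f(x), y), with y ∈ {−1, +1}, is symmetric when the loss summed over both labels is a constant C:

```latex
\ell(f(x), +1) + \ell(f(x), -1) = C \quad \text{for all } x.

% Noisy label: \tilde{y} = y with probability 1-\eta, \tilde{y} = -y with probability \eta
R^{\eta}_{\ell}(f) = \mathbb{E}_{x,y}\big[(1-\eta)\,\ell(f(x), y) + \eta\,\ell(f(x), -y)\big]
                   = (1-\eta)\,R_{\ell}(f) + \eta\,\big(C - R_{\ell}(f)\big)
                   = (1-2\eta)\,R_{\ell}(f) + \eta\,C.
```

Since 1 − 2η > 0 whenever η < 1/2, the noisy risk is an increasing affine function of the clean risk, so both have the same minimizer.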
Abstract
In some industrial applications, such as fraud detection, the performance of common supervised learning techniques may be affected by the poor quality of the available labels: in actual operational use cases, these labels may be weak in quantity, quality, or trustworthiness. We propose a benchmark to evaluate the natural robustness of different algorithms, taken from various paradigms, on artificially corrupted datasets, with a focus on noisy labels. This paper studies the intrinsic robustness of some leading classifiers. The algorithms under scrutiny include SVM, logistic regression, random forests, XGBoost, and Khiops. Furthermore, building on results from recent literature, the study is supplemented with an investigation into whether some algorithms can be enhanced with symmetric loss functions.
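The abstract does not define symmetric loss functions. In the label-noise literature, a margin-based loss ℓ(z), with margin z = y·f(x) and y ∈ {−1, +1}, is usually called symmetric when ℓ(z) + ℓ(−z) is the same constant for every z. The sketch below checks this property numerically for a few textbook losses. The sigmoid and ramp losses are illustrative choices, not necessarily the ones used by the authors, and all function names are hypothetical.

```python
import math


def sigmoid_loss(z):
    """Sigmoid loss 1 / (1 + exp(z)) on the margin z = y * f(x)."""
    # Branch on the sign of z so math.exp never overflows.
    if z >= 0:
        e = math.exp(-z)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(z))


def ramp_loss(z):
    """Ramp loss: (1 - z) / 2 clipped to [0, 1]."""
    return max(0.0, min(1.0, (1.0 - z) / 2.0))


def hinge_loss(z):
    """Hinge loss, minimized by SVMs."""
    return max(0.0, 1.0 - z)


def logistic_loss(z):
    """Logistic loss log(1 + exp(-z)), minimized by logistic regression (stable form)."""
    if z >= 0:
        return math.log1p(math.exp(-z))
    return -z + math.log1p(math.exp(z))


def is_symmetric(loss, grid, tol=1e-9):
    """True if loss(z) + loss(-z) is numerically the same constant on every grid point."""
    sums = [loss(z) + loss(-z) for z in grid]
    return max(sums) - min(sums) <= tol


GRID = [i / 10 for i in range(-50, 51)]  # margins from -5.0 to 5.0

if __name__ == "__main__":
    for name, fn in [("sigmoid", sigmoid_loss), ("ramp", ramp_loss),
                     ("hinge", hinge_loss), ("logistic", logistic_loss)]:
        print(name, is_symmetric(fn, GRID))
```

The hinge loss minimized by SVMs and the logistic loss minimized by logistic regression both fail the check, so enhancing these classifiers with a symmetric loss means replacing their training objective rather than adjusting their settings.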
Taxonomy
Topics: Imbalanced Data Classification Techniques · Advanced Statistical Methods and Models · Machine Learning and Data Classification
Methods: Support Vector Machine
