Ghost Loss to Question the Reliability of Training Data
Adrien Deliège, Anthony Cioppa, Marc Van Droogenbroeck

TL;DR
This paper introduces the ghost loss, a novel training method that allows neural networks to identify and handle mislabeled or confusing images in datasets, challenging the assumption of perfect annotation quality.
Contribution
The paper proposes the ghost loss concept, enabling models to detect and account for label inconsistencies during training, improving dataset reliability assessment.
Findings
Ghost loss effectively detects confusing images.
Application to datasets reveals mislabeled data.
Introduces a new tool called the sanity matrix.
Abstract
Supervised image classification problems rely on training data assumed to have been correctly annotated; this assumption underpins most works in the field of deep learning. As a consequence, during training, a network is forced to match the label provided by the annotator and is not given the flexibility to choose an alternative when it detects an inconsistency. Therefore, erroneously labeled training images may end up "correctly" classified in classes to which they do not actually belong. This may reduce the performance of the network and thus prompt practitioners to build more complex networks without even checking the quality of the training data. In this work, we question the reliability of the annotated datasets. For that purpose, we introduce the notion of ghost loss, which can be seen as a regular loss that is zeroed out for some predicted values in a deterministic way…
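To make the idea of "a regular loss that is zeroed out for some predicted values in a deterministic way" concrete, here is a minimal sketch. The abstract as quoted here does not say which values are zeroed, so the rule below is an assumption for illustration only: the per-class term of the highest-scoring class other than the annotated label is dropped. The squared-error base loss and the names `ghost_index` and `ghost_loss` are likewise hypothetical and are not taken from the paper.

```python
from typing import List


def ghost_index(scores: List[float], label: int) -> int:
    """Return the class whose loss term is zeroed out.

    ASSUMPTION (illustrative, not from the abstract): the "ghost" is the
    highest-scoring class other than the annotated label. Ties go to the
    lowest class index, so the choice is deterministic.
    """
    if len(scores) < 2:
        raise ValueError("need at least two classes")
    if not 0 <= label < len(scores):
        raise ValueError("label out of range")
    candidates = [k for k in range(len(scores)) if k != label]
    return max(candidates, key=lambda k: scores[k])


def ghost_loss(scores: List[float], label: int) -> float:
    """Squared-error loss against a one-hot target, with the ghost term removed."""
    # Per-class terms of a regular loss: (score - target)^2.
    terms = [
        (s - (1.0 if k == label else 0.0)) ** 2
        for k, s in enumerate(scores)
    ]
    # Zero out the ghost term so the network is not penalized for it.
    terms[ghost_index(scores, label)] = 0.0
    return sum(terms)


if __name__ == "__main__":
    # The annotated label is class 2, but class 1 scores highest.
    scores = [0.1, 0.8, 0.7]
    print("ghost class:", ghost_index(scores, label=2))
    print("ghost loss:", ghost_loss(scores, label=2))
```

Under this assumed rule, a training image whose ghost class keeps a high score is one for which the network persistently prefers an alternative to the given label, which is the kind of image worth inspecting for a labeling error.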
