Semi-Supervised Cascaded Clustering for Classification of Noisy Label Data
Ashit Gupta, Anirudh Deodhar, Tathagata Mukherjee, Venkataramana Runkana

TL;DR
This paper introduces a semi-supervised cascaded clustering algorithm that handles noisy labels in limited datasets and improves classification accuracy without extensive human intervention, which makes it especially suited to industrial applications.
Contribution
The paper presents a novel semi-supervised cascaded clustering (SSCC) algorithm with a cluster evaluation matrix (CEM) that localizes and eliminates noisy labels, reducing reliance on deep neural networks and human expertise when datasets are noisy and limited.
Findings
Outperforms SVM on noisy datasets
Effectively identifies and eliminates noisy labels
Produces accurate classifiers with minimal human input
Abstract
The performance of supervised classification techniques often deteriorates when the data has noisy labels. Even semi-supervised classification approaches have largely focused only on the problem of handling missing labels. Most approaches that address noisy-label data rely on deep neural networks (DNNs), which require huge datasets for classification tasks. This poses a serious challenge, especially in the process and manufacturing industries, where data is limited and labels are noisy. We propose a semi-supervised cascaded clustering (SSCC) algorithm to extract patterns and generate a cascaded tree of classes in such datasets. A novel cluster evaluation matrix (CEM) with configurable hyperparameters is introduced to localize and eliminate the noisy labels and to invoke a pruning criterion on the cascaded clustering. The algorithm reduces the dependency on expensive human expertise…
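The abstract does not say how the CEM is computed, which clustering method drives the cascade, or what its hyperparameters are. The following is only a minimal sketch under loudly stated assumptions, meant to make the idea of "cluster, score, drop noisy labels, prune" concrete. It assumes that the CEM is a cluster-by-class count matrix, that a label counts as noisy when it disagrees with the dominant class of a cluster whose purity reaches a threshold, that the pruning criterion is a purity/size/depth test, and that a 1-D two-means split stands in for the real clustering step. The names `evaluation_matrix`, `flag_noisy`, `two_means_1d`, `cascade`, `purity_threshold`, `min_size`, and `max_depth` are hypothetical and are not the authors' code.

```python
from collections import Counter

def evaluation_matrix(assignments, labels):
    """Hypothetical CEM: rows are clusters, columns are classes, cells are counts."""
    clusters, classes = sorted(set(assignments)), sorted(set(labels))
    counts = {c: Counter() for c in clusters}
    for a, y in zip(assignments, labels):
        counts[a][y] += 1
    return clusters, classes, [[counts[c][k] for k in classes] for c in clusters]

def flag_noisy(assignments, labels, purity_threshold):
    """Flag indices whose label disagrees with a sufficiently pure cluster's majority."""
    clusters, classes, matrix = evaluation_matrix(assignments, labels)
    dominant = {}
    for cluster, row in zip(clusters, matrix):
        best = max(range(len(classes)), key=row.__getitem__)
        if row[best] / sum(row) >= purity_threshold:
            dominant[cluster] = classes[best]
    return [i for i, (a, y) in enumerate(zip(assignments, labels))
            if a in dominant and y != dominant[a]]

def two_means_1d(values, iters=20):
    """Stand-in clusterer (assumption): split 1-D values into two groups."""
    centers = [min(values), max(values)]
    assign = [0] * len(values)
    for _ in range(iters):
        assign = [0 if abs(v - centers[0]) <= abs(v - centers[1]) else 1
                  for v in values]
        for k in (0, 1):
            members = [v for v, a in zip(values, assign) if a == k]
            if members:
                centers[k] = sum(members) / len(members)
    return assign

def cascade(values, labels, ids=None, depth=0, max_depth=3,
            purity_threshold=0.75, min_size=4):
    """Recursively split, dropping flagged labels; returns (tree, removed_ids)."""
    if ids is None:
        ids = list(range(len(values)))
    majority, count = Counter(labels).most_common(1)[0]
    leaf = {"label": majority, "purity": count / len(labels), "ids": ids}
    # Assumed pruning criterion: stop when the node is pure, small, or deep.
    if (leaf["purity"] >= purity_threshold or len(labels) < min_size
            or depth >= max_depth):
        return leaf, []
    assign = two_means_1d(values)
    if len(set(assign)) < 2:
        return leaf, []
    noisy = set(flag_noisy(assign, labels, purity_threshold))
    removed = [ids[i] for i in sorted(noisy)]
    children = []
    for k in (0, 1):
        # Each child keeps only its cluster's members that were not flagged.
        keep = [i for i, a in enumerate(assign) if a == k and i not in noisy]
        child, child_removed = cascade(
            [values[i] for i in keep], [labels[i] for i in keep],
            [ids[i] for i in keep], depth + 1, max_depth,
            purity_threshold, min_size)
        children.append(child)
        removed.extend(child_removed)
    return {"children": children}, removed
```

On a toy input with two well-separated groups, `[1.0, 1.1, 1.2, 1.3, 1.4]` mostly labelled `"a"` and `[5.0, 5.1, 5.2, 5.3, 5.4]` mostly labelled `"b"`, with one flipped label in each group (indices 4 and 9), the sketch splits the root, flags those two indices through the count matrix, and ends with two pure leaves. Separating the scoring step (`flag_noisy`) from the clusterer is a deliberate choice here, so the stand-in `two_means_1d` could be swapped for any clustering method without changing the noise-removal logic.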
Taxonomy
Topics: Machine Learning and Data Classification · Water Systems and Optimization · Text and Document Classification Technologies
Methods: Pruning
