Unsupervised Data Selection for Supervised Learning
Gabriele Valvano, Andrea Leo, Daniele Della Latta, Nicola Martini, Gianmarco Santini, Dante Chiappino, Emiliano Ricciardi

TL;DR
This paper explores the potential of unsupervised data selection to improve supervised learning models, hypothesizing that better data quality can enhance generalization, though initial results are inconclusive.
Contribution
It introduces the idea of unsupervised data selection for supervised learning, proposing a methodological approach to improve data quality and model generalization.
Findings
Preliminary results are not robust.
Unsupervised data selection may enhance model generalization.
Further research is needed to validate the approach.
Abstract
Recent research has put considerable effort into developing deep learning architectures and optimizers, obtaining impressive results in areas ranging from vision to language processing. However, little attention has been paid to the need for a methodological process of data collection. In this work, we hypothesize that high-quality data for supervised learning can be selected in an unsupervised manner, and that doing so yields models that generalize better than those trained on a randomly constructed training set. However, preliminary results are not robust, and further studies on the subject should be carried out.
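The abstract does not say how the unsupervised selection is performed. Purely as an illustration of what selecting training data without labels can look like, the sketch below uses greedy farthest-point (k-center) sampling over feature vectors. This is a swapped-in, generic technique and not the authors' method; the function name `select_diverse_subset` and the toy data are hypothetical.

```python
import math
import random


def select_diverse_subset(points, k, seed=0):
    """Greedy farthest-point (k-center) selection.

    Illustrative assumption only: the paper does not specify its selection
    procedure. No labels are used; only distances between feature vectors.
    Returns the indices of the k selected points.
    """
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and len(points)")
    rng = random.Random(seed)
    first = rng.randrange(len(points))
    selected = [first]
    # Distance from each point to its nearest already-selected point.
    nearest = [math.dist(p, points[first]) for p in points]
    nearest[first] = -math.inf  # never pick the same index twice
    while len(selected) < k:
        # Pick the point farthest from everything selected so far.
        nxt = max(range(len(points)), key=nearest.__getitem__)
        selected.append(nxt)
        for i, p in enumerate(points):
            nearest[i] = min(nearest[i], math.dist(p, points[nxt]))
        nearest[nxt] = -math.inf
    return selected


# Toy unlabeled data: two tight clusters of three points each.
points = [(0, 0), (0.1, 0), (0, 0.1), (10, 10), (10.1, 10), (10, 10.1)]
chosen = select_diverse_subset(points, k=2)
```

With `k=2` on this toy data, the two chosen indices fall in different clusters, whereas a random draw could easily pick both from the same one. Whether such coverage-driven selection actually improves generalization is exactly the open question the abstract raises.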
Taxonomy
Topics: Machine Learning and Data Classification · Anomaly Detection Techniques and Applications · Face and Expression Recognition
