Condensed Representation of Machine Learning Data
Rahman Salim Zengin (1), Volkan Sezer (1) ((1) Istanbul Technical University)

TL;DR
This paper introduces a novel condensed data representation method for machine learning that reduces redundancy and computational costs while maintaining acceptable accuracy, using K-means clustering with corrections.
Contribution
A new condensed data representation technique combining K-means with correction mechanisms for efficient machine learning training.
Findings
Reduced computational resource utilization.
Maintained acceptable model accuracy.
Effective on synthetically generated data.
Abstract
Training a Machine Learning model requires sufficient data. Sufficiency is not always about quantity, but about relevance and reduced redundancy. Data-generating processes create massive amounts of data, and using such big data raw consumes substantial computational resources. A proper Condensed Representation can be used in place of the raw data. By combining K-means, a well-known clustering method, with correction and refinement facilities, a novel Condensed Representation method for Machine Learning applications is introduced. To present the method meaningfully and visually, synthetically generated data is employed. The results show that acceptably accurate model training is possible when the condensed representation is used instead of the raw data.
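To make the core idea concrete, here is a minimal sketch of condensing a data set with plain K-means: each cluster is replaced by its centroid plus a weight equal to the number of raw points it stands for. This is an illustration only, under stated assumptions. The abstract does not describe the correction and refinement steps, so they are not reproduced here, and the function name `kmeans_condense`, its parameters, and the two-blob synthetic data are all hypothetical choices made for this example.

```python
import random


def kmeans_condense(points, k, iters=20, seed=0):
    """Condense `points` (a list of equal-length tuples) into at most k
    (centroid, weight) pairs using plain Lloyd's K-means.

    Hypothetical helper: the paper's correction/refinement steps are NOT included.
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialise from k randomly chosen raw points
    for _ in range(iters):
        # Assignment step: attach every point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(
                range(k),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])),
            )
            clusters[nearest].append(p)
        # Update step: move each non-empty cluster's centroid to the mean of its members.
        for j, members in enumerate(clusters):
            if members:
                centroids[j] = tuple(sum(col) / len(members) for col in zip(*members))
    # Each returned centroid is the mean of exactly the points counted in its weight;
    # clusters that ended up empty are dropped.
    return [(centroids[j], len(clusters[j])) for j in range(k) if clusters[j]]


# Synthetic data (an assumption for this sketch): two Gaussian blobs in 2-D.
data_rng = random.Random(42)
raw = [(data_rng.gauss(0.0, 1.0), data_rng.gauss(0.0, 1.0)) for _ in range(500)]
raw += [(data_rng.gauss(5.0, 1.0), data_rng.gauss(5.0, 1.0)) for _ in range(500)]

condensed = kmeans_condense(raw, k=10)
print(len(raw), "raw points ->", len(condensed), "weighted centroids")
```

A downstream model could then be trained on the centroids, using the weights as per-sample weights, so that the condensed set still reflects how the raw points are distributed. Whether this matches the paper's procedure is not stated in the abstract.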
Taxonomy
Topics: Neural Networks and Applications · Machine Learning and Data Classification
