Subjects classification from high-dimensional and small-sample size datasets using a strategy based on Clustering Variables around Latent Components (CLV) method
Dimitri Marques Abramov

TL;DR
This paper examines a CLV-based method for classifying subjects in high-dimensional, small-sample datasets, reaching 80-95% classification agreement on simulated data by recovering latent factors.
Contribution
The study examines a CLV-based approach for classifying subjects into two presumed groups when samples are small and the data are high-dimensional, and evaluates it on simulated datasets.
Findings
Achieved 80-95% classification agreement.
Positive correlation between classifier precision and variable-to-subject ratio.
Method effectively recovers latent factors for subject classification.
Abstract
High-dimensional complex systems can be studied through multivariate analysis, such as Principal Component Analysis; however, such methods frequently require large samples of observations. Here, a method for small samples based on clustering variables around latent variables (CLV) is examined for classifying subjects into two presumed groups. To this end, a predictive model was developed to generate datasets with two groups of cases whose variables show random features (up to 30% of the variables differ between groups, and up to 7% of those are correlated with one another). The method recovered the information in the latent factors and classified the subjects with 80 to 95% agreement, with a positive relationship between classifier precision and the ratio [number of variables / number of subjects].
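The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the author's code: the function names, the sample sizes, the size of the group shift, and the rule for picking the latent component are all assumptions made for illustration, and the correlated subset of variables (the "up to 7%") is not modelled. The clustering step is a simplified CLV-style alternation in which each variable is assigned to the cluster whose first principal component it covaries with most strongly.

```python
import numpy as np

def make_dataset(n_subjects=20, n_vars=200, frac_diff=0.30, shift=2.5, seed=0):
    """Two groups of subjects; a fraction of variables differs between groups.
    All parameter values are illustrative assumptions, not the paper's settings."""
    rng = np.random.default_rng(seed)
    labels = (np.arange(n_subjects) >= n_subjects // 2).astype(int)
    X = rng.standard_normal((n_subjects, n_vars))
    n_diff = int(frac_diff * n_vars)
    diff_idx = rng.choice(n_vars, size=n_diff, replace=False)
    signs = rng.choice([-1.0, 1.0], size=n_diff)
    rows = np.flatnonzero(labels == 1)
    X[rows[:, None], diff_idx] += shift * signs  # shift group 1 on selected variables
    return X, labels

def first_component(block):
    """First left singular vector (latent component) and its singular value."""
    u, s, _ = np.linalg.svd(block, full_matrices=False)
    return u[:, 0], s[0]

def cluster_block(Z, assign, c, rng):
    """Columns of cluster c; falls back to one random column if the cluster is empty."""
    block = Z[:, assign == c]
    if block.shape[1] == 0:
        block = Z[:, rng.integers(Z.shape[1], size=1)]
    return block

def clv_like(X, k=2, n_iter=20, seed=0):
    """Simplified CLV-style clustering of variables around latent components."""
    rng = np.random.default_rng(seed)
    Z = (X - X.mean(axis=0)) / X.std(axis=0)  # standardize each variable
    assign = rng.integers(k, size=Z.shape[1])
    for _ in range(n_iter):
        comps = np.column_stack(
            [first_component(cluster_block(Z, assign, c, rng))[0] for c in range(k)]
        )
        new_assign = np.argmax((Z.T @ comps) ** 2, axis=1)  # squared covariance criterion
        if np.array_equal(new_assign, assign):
            break
        assign = new_assign
    results = [first_component(cluster_block(Z, assign, c, rng)) for c in range(k)]
    return assign, results

X, labels = make_dataset()
assign, results = clv_like(X)

# Assumption: use the cluster with the strongest latent component and
# split subjects by the sign of their score on it.
best = int(np.argmax([strength for _, strength in results]))
pred = (results[best][0] > 0).astype(int)

# Group labels from an unsupervised split are arbitrary, so agreement is
# taken as the better of the two possible label matchings.
acc = float(np.mean(pred == labels))
agreement = max(acc, 1.0 - acc)
print(f"variables/subjects = {X.shape[1] / X.shape[0]:.1f}, agreement = {agreement:.2f}")
```

Varying `n_vars` while holding `n_subjects` fixed is one way to explore the relationship between the variables-to-subjects ratio and agreement under these assumptions.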
Taxonomy
Topics: E-commerce and Technology Innovations · Technology and Data Analysis
