Finding Representative Points in Multivariate Data Using PCA

Ashwinkumar Ganesan; Tim Oates; Matt Schmill

arXiv:1610.05819 · cs.IR · October 20, 2016 · 1 citation

TL;DR

This paper introduces a PCA-based method to identify a minimal set of representative points in multivariate data, improving data summarization and analysis efficiency.

Contribution

It presents a novel approach to isolate and generate a minimal set of representative data points using PCA, validated on environmental data.

Findings

01. PCA effectively reduces data dimensionality for representative point selection.
02. The method outperforms random sampling in capturing data variability.
03. The method was validated on the GLOBE dataset with consistent results.
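The dimensionality-reduction step behind the first finding can be illustrated with a short NumPy sketch. It assumes that each variable is standardized before PCA and that components are kept until a cumulative explained-variance threshold is reached; the function name `pca_reduce` and the 0.9 threshold are hypothetical choices for illustration, since the paper's own preprocessing and number of retained components are not given here.

```python
import numpy as np

def pca_reduce(X, var_threshold=0.9):
    """Project X (n_samples x n_features) onto the fewest principal components
    whose cumulative explained-variance ratio reaches var_threshold.

    Returns (scores, explained): the projected data (n_samples x k) and the
    explained-variance ratio of each retained component.
    """
    # Standardize each variable so that measurements on different scales
    # (e.g. temperature vs. population) contribute comparably.
    sigma = X.std(axis=0)
    sigma[sigma == 0] = 1.0  # leave constant columns at zero instead of dividing by 0
    Z = (X - X.mean(axis=0)) / sigma

    # Thin SVD of the standardized data: rows of Vt are the principal axes,
    # and squared singular values are proportional to each axis's variance.
    _, s, Vt = np.linalg.svd(Z, full_matrices=False)
    explained = s**2 / np.sum(s**2)

    # First index where the running total reaches the threshold, plus one.
    k = int(np.searchsorted(np.cumsum(explained), var_threshold)) + 1
    k = min(k, len(s))  # guard against round-off when var_threshold is 1.0

    scores = Z @ Vt[:k].T
    return scores, explained[:k]


if __name__ == "__main__":
    # Synthetic stand-in data (NOT the GLOBE data set): 6 correlated variables
    # driven by 2 hidden factors plus a little noise.
    rng = np.random.default_rng(0)
    latent = rng.normal(size=(500, 2))
    X = latent @ rng.normal(size=(2, 6)) + 0.05 * rng.normal(size=(500, 6))
    scores, explained = pca_reduce(X, var_threshold=0.9)
    print(scores.shape, explained.round(3))
```

Because the retained scores are mutually uncorrelated and ordered by variance, later steps such as distance computations between points can work in a few dimensions instead of the full set of variables.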

Abstract

The idea of representation has been used in various fields of study, from data analysis to political science. In this paper, we define representativeness and describe a method to isolate data points that can represent the entire data set. We also show how the minimum set of representative data points can be generated. We use data from GLOBE (a project to study the effects on land change based on a set of parameters that include temperature, forest cover, human population, atmospheric parameters and many other variables) to test and validate the algorithm. Principal Component Analysis (PCA) is used to reduce the dimensionality of the multivariate data set so that representative points can be generated efficiently, and the representativeness of the generated points is compared against that of points drawn from the data set by random sampling.
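The abstract does not spell out the paper's definition of representativeness or its selection procedure, so the sketch below substitutes stated assumptions. Representativeness is measured here as the Euclidean distance from each point to its nearest representative in PCA space (mean and worst case). The set is grown with greedy farthest-point traversal, a standard k-center heuristic that is not necessarily the authors' method, until every point lies within a chosen radius. The names `pca_scores`, `greedy_representatives`, `coverage` and the `radius` parameter are hypothetical. The greedy set is small but not guaranteed to be the true minimum.

```python
import numpy as np

def pca_scores(X, k):
    """Standardize X and project it onto its first k principal components."""
    sigma = X.std(axis=0)
    sigma[sigma == 0] = 1.0  # avoid dividing by zero for constant columns
    Z = (X - X.mean(axis=0)) / sigma
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    return Z @ Vt[:k].T

def greedy_representatives(P, radius):
    """Farthest-point traversal (assumed selection rule): keep adding the point
    that is worst covered by the current representatives until every point of P
    lies within `radius` (Euclidean) of some representative. Requires radius >= 0."""
    reps = [0]                                # arbitrary first representative
    d = np.linalg.norm(P - P[0], axis=1)      # distance to nearest representative so far
    while d.max() > radius:
        i = int(d.argmax())                   # worst-covered point becomes a representative
        reps.append(i)
        d = np.minimum(d, np.linalg.norm(P - P[i], axis=1))
    return reps

def coverage(P, reps):
    """Mean and max distance from each point to its nearest representative
    (assumed representativeness measure; lower is better)."""
    d = np.linalg.norm(P[:, None, :] - P[reps][None, :, :], axis=2).min(axis=1)
    return d.mean(), d.max()

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.normal(size=(400, 8))             # synthetic stand-in, not GLOBE data
    P = pca_scores(X, k=3)
    reps = greedy_representatives(P, radius=1.5)
    # Baseline: a random sample of the same size, as in the abstract's comparison.
    rand = rng.choice(len(P), size=len(reps), replace=False)
    print(len(reps), "representatives")
    print("greedy (mean, max):", coverage(P, reps))
    print("random (mean, max):", coverage(P, rand))
```

Comparing the two subsets at equal size keeps the evaluation fair: farthest-point traversal is known to keep the worst-case covering distance within a factor of two of the best achievable with the same number of data points, whereas a random sample of that size carries no such guarantee.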

Taxonomy

Topics: Statistical Methods and Applications · Advanced Statistical Methods and Models