Intrinsic dimension estimation of data by principal component analysis

Mingyu Fan; Nannan Gu; Hong Qiao; Bo Zhang

arXiv:1002.2050 · cs.CV · February 11, 2010 · 24 cites

TL;DR

This paper introduces a PCA-based method for estimating the intrinsic dimension of data with nonlinear structure. It performs PCA locally on each subset of a minimal cover of the data set, which also filters out noise in the data.

Contribution

A new PCA-based method for intrinsic dimension estimation that handles nonlinear data structures and supports incremental learning.

Findings

01 Effective on synthetic and real data sets

02 Filters out noise and converges with larger neighborhoods

03 Works incrementally on large data sets

Abstract

Estimating the intrinsic dimensionality of data is a classic problem in pattern recognition and statistics. Principal Component Analysis (PCA) is a powerful tool for discovering the dimensionality of data sets with a linear structure; however, it becomes ineffective when the data have a nonlinear structure. In this paper, we propose a new PCA-based method to estimate the intrinsic dimension of data with nonlinear structures. Our method works by first finding a minimal cover of the data set, then performing PCA locally on each subset in the cover, and finally giving the estimation result by checking the data variance on all small neighborhood regions. The proposed method utilizes the whole data set to estimate its intrinsic dimension and is convenient for incremental learning. In addition, our new PCA procedure can filter out noise in data and converge to a stable estimation with the neighborhood…
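The three steps named in the abstract (cover the data, run PCA on each subset, inspect the local variance) can be illustrated with a rough sketch. This is not the authors' algorithm: the abstract does not give its details, so the sketch assumes a greedy k-nearest-neighbor cover in place of the paper's minimal cover, a 95% explained-variance cutoff, and a median over the local estimates. The function names `pca_dimension`, `greedy_knn_cover`, and `local_pca_dimension`, and the parameters `k` and `var_threshold`, are hypothetical.

```python
import numpy as np


def pca_dimension(points, var_threshold=0.95):
    """Number of principal components needed to reach var_threshold of the variance."""
    centered = points - points.mean(axis=0)
    # Squared singular values are proportional to the PCA eigenvalues.
    variances = np.linalg.svd(centered, compute_uv=False) ** 2
    total = variances.sum()
    if total == 0.0:  # all points identical: no spread at all
        return 0
    explained = np.cumsum(variances) / total
    count = int(np.searchsorted(explained, var_threshold) + 1)
    return min(count, len(variances))


def greedy_knn_cover(X, k):
    """Assumed stand-in for a minimal cover: greedy, not guaranteed minimal."""
    diffs = X[:, None, :] - X[None, :, :]
    dist2 = np.einsum("ijk,ijk->ij", diffs, diffs)  # pairwise squared distances
    neighbors = np.argsort(dist2, axis=1)[:, :k]    # each row starts with the point itself
    covered = np.zeros(len(X), dtype=bool)
    subsets = []
    for i in range(len(X)):
        if not covered[i]:
            subsets.append(neighbors[i])
            covered[neighbors[i]] = True
            covered[i] = True
    return subsets


def local_pca_dimension(X, k=12, var_threshold=0.95):
    """Median of per-subset PCA dimensions (aggregation rule is an assumption)."""
    local_dims = [pca_dimension(X[idx], var_threshold) for idx in greedy_knn_cover(X, k)]
    return int(np.round(np.median(local_dims))), local_dims


# A helix is a 1-D curve embedded nonlinearly in 3-D space.
t = np.linspace(0.0, 4.0 * np.pi, 600)
helix = np.column_stack([np.cos(t), np.sin(t), 0.1 * t])

local_dim, _ = local_pca_dimension(helix)
global_dim = pca_dimension(helix)
print("local estimate:", local_dim, "global PCA estimate:", global_dim)
```

On this helix the local estimate is 1, while PCA applied to the whole data set needs more than one component to reach the same variance cutoff, which is the failure of global PCA on nonlinear structure that the abstract describes.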

Taxonomy

Topics: Neural Networks and Applications · Chaos control and synchronization · Statistical and numerical algorithms