Principal component analysis for high-dimensional compositional data
Jingru Zhang, Wei Lin

TL;DR
This paper develops principal component analysis methods for high-dimensional compositional data. It addresses two difficulties: the basis variables of interest are latent, and the observed compositions are subject to the simplex constraint. The methods come with theoretical guarantees and practical algorithms.
Contribution
It relates the principal subspace of the centered log-ratio covariance of the compositions to that of the basis covariance, shows that the latter is approximately identifiable in high dimensions under a subspace sparsity assumption, and proposes efficient estimation algorithms.
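The link can be made concrete with a standard identity from log-ratio analysis. The summary does not fix notation, so the symbols below are illustrative, and the sketch assumes (as is common in log-ratio analysis) that "basis covariance" means the covariance of the log basis variables. With basis vector w, composition x, and centering matrix G:

```latex
\begin{aligned}
x &= \frac{w}{\mathbf{1}^\top w}, \qquad w \in \mathbb{R}_{>0}^{p},\\
\operatorname{clr}(x) &= G \log x, \qquad G = I_p - p^{-1}\mathbf{1}\mathbf{1}^\top,\\
\log x &= \log w - \log(\mathbf{1}^\top w)\,\mathbf{1}
  \;\Longrightarrow\; \operatorname{clr}(x) = G \log w \quad (\text{since } G\mathbf{1} = 0),\\
\Gamma &:= \operatorname{Cov}\{\operatorname{clr}(x)\} = G\,\Omega\,G, \qquad \Omega := \operatorname{Cov}(\log w),\\
\Gamma &= \Omega - p^{-1}\bigl(\mathbf{1}\mathbf{1}^\top\Omega + \Omega\mathbf{1}\mathbf{1}^\top\bigr) + p^{-2}\bigl(\mathbf{1}^\top\Omega\mathbf{1}\bigr)\mathbf{1}\mathbf{1}^\top .
\end{aligned}
```

So the observable Γ differs from the latent Ω by a matrix whose column space lies in span{1, Ω1}, hence of rank at most 2. The paper's approximate-identifiability result, stated under its subspace sparsity assumption with diverging p, is what controls how far apart their principal subspaces can be; its precise conditions are not reproduced here.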
Findings
The proposed methods achieve near-oracle performance in simulations.
Nonasymptotic error bounds are derived for the proposed estimators.
The methods are applied to analyze word usage patterns among statisticians.
Abstract
Dimension reduction for high-dimensional compositional data plays an important role in many fields, where the principal component analysis of the basis covariance matrix is of scientific interest. In practice, however, the basis variables are latent and rarely observed, and standard techniques of principal component analysis are inadequate for compositional data because of the simplex constraint. To address this challenge, we relate the principal subspace of the centered log-ratio compositional covariance to that of the basis covariance, and prove that the latter is approximately identifiable as the dimensionality diverges, under a subspace sparsity assumption. This blessing-of-dimensionality phenomenon enables us to propose principal subspace estimation methods based on the sample centered log-ratio covariance. We also derive nonasymptotic error bounds for…
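The estimation strategy in the abstract starts from the sample centered log-ratio covariance. A minimal NumPy sketch of that starting point follows: the clr transform, the sample clr covariance, and a plain top-k eigendecomposition, plus the population identity Γ = GΩG for comparison. This is not the authors' estimator: the paper relies on a subspace sparsity assumption that its methods may exploit, which this sketch does not do. All function names and the spiked simulation model are hypothetical illustrations.

```python
import numpy as np


def clr(X):
    """Centered log-ratio (clr) transform of strictly positive rows.

    Each row is log-transformed and centered by its own mean, so every output
    row sums to zero. Rescaling a row (e.g. closing it to the simplex) leaves
    the result unchanged.
    """
    X = np.asarray(X, dtype=float)
    if np.any(X <= 0):
        raise ValueError("clr requires strictly positive entries")
    L = np.log(X)
    return L - L.mean(axis=1, keepdims=True)


def clr_covariance(X):
    """Sample covariance (divisor n - 1) of the clr-transformed rows of X (n x p)."""
    Z = clr(X)
    Zc = Z - Z.mean(axis=0, keepdims=True)  # center each coordinate across samples
    return Zc.T @ Zc / (Z.shape[0] - 1)


def clr_population_covariance(Omega):
    """Population clr covariance G @ Omega @ G, assuming Omega = Cov(log w)."""
    p = Omega.shape[0]
    G = np.eye(p) - np.ones((p, p)) / p  # centering matrix; G @ 1 = 0
    return G @ Omega @ G


def leading_subspace(S, k):
    """Orthonormal p x k basis for the top-k eigenvectors of symmetric S."""
    eigvals, eigvecs = np.linalg.eigh(S)  # eigenvalues returned in ascending order
    order = np.argsort(eigvals)[::-1][:k]
    return eigvecs[:, order]


def projection_distance(U, V):
    """Frobenius norm of the difference between projections onto span(U), span(V)."""
    return np.linalg.norm(U @ U.T - V @ V.T)


if __name__ == "__main__":
    # Hypothetical spiked model: a sparse leading direction on 4 of p coordinates.
    rng = np.random.default_rng(0)
    p, n, k = 200, 500, 1
    v = np.zeros(p)
    v[:4] = 0.5
    Omega = np.eye(p) + 10.0 * np.outer(v, v)

    # Latent log-basis -> basis -> observed compositions on the simplex.
    log_w = rng.multivariate_normal(np.zeros(p), Omega, size=n)
    w = np.exp(log_w)
    X = w / w.sum(axis=1, keepdims=True)

    V_oracle = leading_subspace(Omega, k)  # needs the unobservable Omega
    V_pop = leading_subspace(clr_population_covariance(Omega), k)
    V_hat = leading_subspace(clr_covariance(X), k)  # uses observed compositions only
    print("population clr vs oracle:", projection_distance(V_pop, V_oracle))
    print("sample clr vs oracle:", projection_distance(V_hat, V_oracle))
```

In this toy model the clr subspace is already close to the oracle one at the population level: the leading eigenvector of GΩG is the normalized Gv, whose cosine with v is the square root of 1 - (1ᵀv)²/p, which tends to 1 as p grows with the support of v fixed. That is one simple way to see the blessing-of-dimensionality effect, not a reproduction of the paper's theorem.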
Taxonomy
Topics: Geochemistry and Geologic Mapping · Hydrocarbon exploration and reservoir analysis · Dental Radiography and Imaging
