PCA of probability measures: Sparse and Dense sampling regimes
Gachon Erell, Jérémie Bigot, Elsa Cazelles

TL;DR
This paper investigates PCA on probability measures using Hilbert space embeddings, deriving convergence rates in a double asymptotic regime and identifying a transition from sparse to dense sampling regimes.
Contribution
It introduces a theoretical framework for PCA on multiple probability measures, analyzes how the number of measures and the number of samples per measure affect estimation, and establishes minimax optimal rates in the dense regime.
Findings
Convergence rates of $n^{-1/2} + m^{-\eta}$ for the empirical covariance operator and the PCA excess risk.
Identification of a sparse-to-dense transition in sampling regimes.
Validation of theoretical rates through numerical experiments.
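The trade-off behind the sparse-to-dense transition can be illustrated with a little arithmetic on the stated rate. The sketch below is not taken from the paper: it ignores all constants, treats $n$ as the number of measures and $m$ as the number of samples per measure, and uses hypothetical helper names (`rate_bound`, `crossover_m`). Under those assumptions, the two terms $n^{-1/2}$ and $m^{-\eta}$ balance at $m = n^{1/(2\eta)}$; below that value the $m^{-\eta}$ term dominates, above it the $n^{-1/2}$ term does.

```python
import math


def rate_bound(n: int, m: int, eta: float) -> float:
    """Stated rate n^{-1/2} + m^{-eta}, with all constants set to 1 (an assumption)."""
    return n ** -0.5 + m ** -eta


def crossover_m(n: int, eta: float) -> float:
    """Value of m at which m^{-eta} equals n^{-1/2}, i.e. m = n^{1/(2*eta)}."""
    return n ** (1.0 / (2.0 * eta))


# Illustrative values only: eta = 0.5 is an arbitrary choice, not a value from the paper.
n, eta = 100, 0.5
m_star = crossover_m(n, eta)
for m in (10, int(m_star), 10_000):
    regime = "sparse-like" if m < m_star else "dense-like"
    print(f"n={n}, m={m}: bound={rate_bound(n, m, eta):.4f} ({regime})")
```

The labels "sparse-like" and "dense-like" here only indicate which term of the bound is larger; the paper's precise definition of the two regimes may differ.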
Abstract
A common approach to performing PCA on probability measures is to embed them into a Hilbert space where standard functional PCA techniques apply. While convergence rates for estimating the embedding of a single measure from samples are well understood, the literature has not addressed the setting involving multiple measures. In this paper, we study PCA in a double asymptotic regime where $n$ probability measures are observed, each through $m$ samples. We derive convergence rates of the form $n^{-1/2} + m^{-\eta}$ for the empirical covariance operator and the PCA excess risk, where $\eta$ depends on the chosen embedding. This characterizes the relationship between the number $n$ of measures and the number $m$ of samples per measure, revealing a sparse (small $m$) to dense (large $m$) transition in the convergence behavior. Moreover, we prove that the dense-regime rate is minimax…
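As a rough illustration of the "embed, then apply PCA" pipeline described in the abstract, here is a minimal sketch under several assumptions that are not drawn from the paper: the embedding is a Gaussian-kernel mean embedding (one possible Hilbert space embedding among others), the measures are one-dimensional Gaussians with random means, and the function names (`gaussian_kernel`, `embedding_gram`, `pca_from_gram`) are hypothetical. Inner products between embedded empirical measures are estimated by averaging the kernel over sample pairs, and PCA is carried out on the centered Gram matrix.

```python
import numpy as np


def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian kernel matrix between two 1-D sample arrays."""
    diff = x[:, None] - y[None, :]
    return np.exp(-diff**2 / (2.0 * bandwidth**2))


def embedding_gram(samples, bandwidth=1.0):
    """Gram matrix of empirical kernel mean embeddings.

    samples has shape (n, m): row i holds m draws from measure i.
    Entry (i, j) averages the kernel over all sample pairs (a simple
    plug-in estimate; the diagonal entries are therefore biased).
    """
    n = samples.shape[0]
    gram = np.empty((n, n))
    for i in range(n):
        for j in range(i, n):
            value = gaussian_kernel(samples[i], samples[j], bandwidth).mean()
            gram[i, j] = gram[j, i] = value
    return gram


def pca_from_gram(gram, n_components=2):
    """PCA of embedded measures from their Gram matrix.

    Returns the leading eigenvalues of the empirical covariance operator
    and the principal component scores of each measure.
    """
    n = gram.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    centered = centering @ gram @ centering
    eigvals, eigvecs = np.linalg.eigh(centered)
    order = np.argsort(eigvals)[::-1][:n_components]
    top_vals = np.clip(eigvals[order], 0.0, None)
    scores = eigvecs[:, order] * np.sqrt(top_vals)
    return top_vals / n, scores


# Synthetic data (assumption): n Gaussian measures with random means, m samples each.
rng = np.random.default_rng(0)
n, m = 50, 200
means = rng.normal(size=n)
samples = means[:, None] + rng.normal(size=(n, m))

cov_eigvals, scores = pca_from_gram(embedding_gram(samples), n_components=2)
print("leading covariance eigenvalues:", cov_eigvals)
```

Varying `n` and `m` in this toy setup gives a feel for the double asymptotic regime: `m` controls how well each embedding is estimated, while `n` controls how well the covariance of the embeddings is estimated.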
Taxonomy
Topics: Statistical Methods and Inference · Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques
