Kernel Density Estimators in Large Dimensions
Giulio Biroli, Marc Mézard

TL;DR
This paper investigates the behavior of Kernel Density Estimators on high-dimensional data. It finds phase transitions between statistical regimes and shows that the Central Limit Theorem (CLT) breaks down at small bandwidths, with implications for optimal bandwidth selection.
Contribution
It introduces a new high-dimensional regime characterized by a fixed ratio of log n to dimension, revealing phase transitions and heavy-tailed distributions in kernel density estimates.
Findings
Identifies three distinct statistical regimes based on bandwidth size.
Shows breakdown of CLT and emergence of heavy-tailed distributions at small bandwidths.
Provides analysis for high-dimensional Gaussian data and implications for bandwidth optimization.
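To make the bandwidth dependence concrete, here is a minimal sketch, not the authors' method, of a kernel density estimate on standard Gaussian data with n = exp(alpha * d) points. The Gaussian kernel, the values d = 20 and alpha = 0.4, the three bandwidths, and the names `log_gaussian_kde` and `max_share` are all assumptions chosen for illustration. The "nearest-point share" is a simple diagnostic introduced here. It does not locate the thresholds derived in the paper. It only shows that as h shrinks the kernel sum comes to be carried by a single data point, which is the kind of situation in which an average stops behaving like a sum of many comparable terms.

```python
import math
import random


def log_gaussian_kde(x, data, h):
    """Return (log of the Gaussian KDE at x, share of the kernel sum
    carried by the single largest term, i.e. the nearest data point)."""
    n, d = len(data), len(x)
    # Log of each unnormalized kernel term: -|x - y_i|^2 / (2 h^2).
    log_terms = [
        -sum((a - b) ** 2 for a, b in zip(x, y)) / (2.0 * h * h)
        for y in data
    ]
    # Log-sum-exp keeps the sum stable when the terms are tiny.
    m = max(log_terms)
    log_sum = m + math.log(sum(math.exp(t - m) for t in log_terms))
    # Normalization: 1/n times the Gaussian constant (2 pi h^2)^(-d/2).
    log_density = log_sum - math.log(n) - 0.5 * d * math.log(2.0 * math.pi * h * h)
    max_share = math.exp(m - log_sum)
    return log_density, max_share


random.seed(0)
d = 20                          # dimension (illustrative choice)
alpha = 0.4                     # assumed value of the ratio log(n) / d
n = int(math.exp(alpha * d))    # number of data points, exp(alpha * d)

# Data points and evaluation point drawn from a standard Gaussian in d dimensions.
data = [[random.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
x = [random.gauss(0.0, 1.0) for _ in range(d)]

results = {h: log_gaussian_kde(x, data, h) for h in (0.2, 1.0, 3.0)}
for h, (log_rho, share) in results.items():
    print(f"h={h}: log density = {log_rho:.2f}, nearest-point share = {share:.3g}")
```

For a fixed sample, the nearest-point share can only grow as h decreases, because every other term shrinks relative to the largest one. Working with log terms is a deliberate choice: in high dimension the individual kernel values underflow quickly if exponentiated directly.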
Abstract
This paper studies Kernel Density Estimation for a high-dimensional distribution $\rho(x)$. Traditional approaches have focused on the limit of a large number of data points $n$ and fixed dimension $d$. We analyze instead the regime where both the number $n$ of data points $y_i$ and their dimensionality $d$ grow with a fixed ratio $\alpha = (\log n)/d$. Our study reveals three distinct statistical regimes for the kernel-based estimate of the density $\hat\rho_h^{\mathcal{D}}(x)$, depending on the bandwidth $h$: a classical regime for large bandwidth where the Central Limit Theorem (CLT) holds, which is akin to the one found in traditional approaches. Below a certain value of the bandwidth, $h_{CLT}(\alpha)$, we find that the CLT breaks down. The statistics of $\hat\rho_h^{\mathcal{D}}(x)$ for a fixed $x$ drawn from $\rho(x)$ is…
Taxonomy
Topics: Bayesian Methods and Mixture Models
