Multi-Tree Methods for Statistics on Very Large Datasets in Astronomy
Alexander G. Gray, Andrew W. Moore (CS, CMU), Robert C. Nichol (Physics, CMU), Andrew J. Connolly (Pitt), Christopher Genovese, Larry Wasserman (Stats, CMU)

TL;DR
This paper introduces multi-tree algorithms based on computational geometry that significantly accelerate statistical methods like kernel density estimation and n-point correlation functions, enabling analysis of millions of data points on standard desktops.
Contribution
The paper presents novel multi-tree algorithms that reduce computational complexity for key statistical methods in astronomy, allowing large datasets to be processed efficiently.
Findings
Orders-of-magnitude speedup over the previous state of the art at similar accuracy
Enables analysis of millions of data points on desktop workstations
Applicable to kernel density estimation and n-point correlation functions
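The sketch below is one possible illustration of the pruning idea behind tree-based kernel density estimation. It is not the authors' method. It is simplified to one dimension and uses a single tree over the data points rather than the multi-tree scheme the paper describes, and the names tree_kde, tol and leaf are hypothetical. The tree is implicit: each node is a contiguous index range of the sorted data. A node is approximated in bulk when the kernel values at its closest and farthest possible distances from the query differ by at most 2·tol. Under that rule, the averaged (unnormalized) Gaussian estimate is off by at most tol.

```python
import math
import random
from typing import List


def kernel(d: float, h: float) -> float:
    # Gaussian kernel WITHOUT the 1/(sqrt(2*pi)*h) normalizing constant (assumption for brevity)
    return math.exp(-0.5 * (d / h) ** 2)


def tree_kde(xs: List[float], q: float, h: float, tol: float, leaf: int = 16) -> float:
    """Approximate (1/N) * sum_i kernel(|q - x_i|, h) with absolute error <= tol.

    xs must be sorted and non-empty; leaf must be >= 1. The implicit tree
    recursively halves index ranges of xs (hypothetical simplification).
    """
    n = len(xs)

    def visit(lo: int, hi: int) -> float:
        a, b = xs[lo], xs[hi - 1]            # bounding interval of xs[lo:hi]
        dmin = max(a - q, q - b, 0.0)        # closest possible distance to q
        dmax = max(abs(q - a), abs(q - b))   # farthest possible distance to q
        kmax, kmin = kernel(dmin, h), kernel(dmax, h)
        if kmax - kmin <= 2 * tol:
            # Every point's kernel value lies in [kmin, kmax], so the midpoint
            # is within tol of each one: approximate the whole node at once.
            return (hi - lo) * 0.5 * (kmax + kmin)
        if hi - lo <= leaf:
            # Small node: evaluate exactly.
            return sum(kernel(abs(q - x), h) for x in xs[lo:hi])
        mid = (lo + hi) // 2
        return visit(lo, mid) + visit(mid, hi)

    return visit(0, n) / n


if __name__ == "__main__":
    random.seed(0)
    xs = sorted(random.gauss(0.0, 1.0) for _ in range(10_000))
    q, h = 0.5, 0.2
    exact = sum(kernel(abs(q - x), h) for x in xs) / len(xs)
    print(exact, tree_kde(xs, q, h, tol=1e-4))  # should agree to within 1e-4
```

Each bulk-approximated point contributes an error of at most tol, so dividing the summed error by N keeps the estimate within tol. Lowering tol trades speed for accuracy. As we understand the multi-tree approach, it also builds a tree over the query points, so that nearby queries can share pruning decisions. This sketch omits that step.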
Abstract
Many fundamental statistical methods have become critical tools for scientific data analysis yet do not scale tractably to modern large datasets. This paper will describe very recent algorithms based on computational geometry which have dramatically reduced the computational complexity of 1) kernel density estimation (which also extends to nonparametric regression, classification, and clustering), and 2) the n-point correlation function for arbitrary n. These new multi-tree methods typically yield orders of magnitude in speedup over the previous state of the art for similar accuracy, making millions of data points tractable on desktop workstations for the first time.
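For the n-point correlation function, the simplest case is n = 2, which reduces to counting pairs of points closer than a radius r. The following is a minimal sketch of a dual-tree pair count, assuming Euclidean distance and a basic kd-tree with axis-aligned bounding boxes. The names build and dual_tree_count are hypothetical and this is not the authors' code. Two rules prune whole node pairs: if the boxes are at least r apart, no pair counts; if even their farthest corners are closer than r, every pair counts. Only node pairs that straddle r are split further.

```python
import math
import random
from dataclasses import dataclass
from typing import List, Optional, Tuple

Point = Tuple[float, ...]


@dataclass
class Node:
    points: List[Point]
    lo: List[float]                  # bounding-box lower corner
    hi: List[float]                  # bounding-box upper corner
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build(points: List[Point], leaf_size: int = 8) -> Node:
    """Hypothetical kd-tree: split along the widest dimension at the median."""
    dim = len(points[0])
    lo = [min(p[d] for p in points) for d in range(dim)]
    hi = [max(p[d] for p in points) for d in range(dim)]
    node = Node(points, lo, hi)
    if len(points) > leaf_size:
        d = max(range(dim), key=lambda k: hi[k] - lo[k])
        pts = sorted(points, key=lambda p: p[d])
        mid = len(pts) // 2
        node.left = build(pts[:mid], leaf_size)
        node.right = build(pts[mid:], leaf_size)
    return node


def min_dist(a: Node, b: Node) -> float:
    # Smallest possible distance between any point in box a and any in box b.
    s = 0.0
    for alo, ahi, blo, bhi in zip(a.lo, a.hi, b.lo, b.hi):
        gap = max(blo - ahi, alo - bhi, 0.0)
        s += gap * gap
    return math.sqrt(s)


def max_dist(a: Node, b: Node) -> float:
    # Largest possible distance between any point in box a and any in box b.
    s = 0.0
    for alo, ahi, blo, bhi in zip(a.lo, a.hi, b.lo, b.hi):
        span = max(ahi - blo, bhi - alo)
        s += span * span
    return math.sqrt(s)


def dual_tree_count(a: Node, b: Node, r: float) -> int:
    """Count ordered pairs (x in a, y in b) with dist(x, y) < r."""
    if min_dist(a, b) >= r:
        return 0                                  # prune: all pairs too far
    if max_dist(a, b) < r:
        return len(a.points) * len(b.points)      # prune: all pairs close enough
    if a.left is None and b.left is None:
        return sum(1 for x in a.points for y in b.points if math.dist(x, y) < r)
    # Otherwise recurse, splitting the larger node that still has children.
    if b.left is None or (a.left is not None and len(a.points) >= len(b.points)):
        return dual_tree_count(a.left, b, r) + dual_tree_count(a.right, b, r)
    return dual_tree_count(a, b.left, r) + dual_tree_count(a, b.right, r)


if __name__ == "__main__":
    random.seed(0)
    pts = [(random.random(), random.random()) for _ in range(500)]
    tree = build(pts)
    r = 0.05
    fast = dual_tree_count(tree, tree, r)
    slow = sum(1 for x in pts for y in pts if math.dist(x, y) < r)
    print(fast, slow)  # the two counts should agree
```

Counting a tree against itself includes each point paired with itself and counts every unordered pair twice. The number of distinct pairs is therefore (count - N) / 2. Correlation-function estimators combine such data-data, data-random and random-random counts. Splitting the larger node first keeps the recursion balanced. A general n-point version would recurse over n nodes at once, and this sketch does not attempt that.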
