SkyQuery: An Implementation of a Parallel Probabilistic Join Engine for Cross-Identification of Multiple Astronomical Databases
László Dobos, Tamás Budavári, Nolan Li, Alexander S. Szalay, and István Csabai

TL;DR
SkyQuery introduces a parallel, probabilistic engine for efficient cross-identification of celestial objects across multiple large astronomical catalogs, leveraging SQL extensions and distributed computing.
Contribution
The paper presents a novel parallel system that performs Bayesian probabilistic cross-identification of multiple astronomical catalogs using extended SQL queries on commodity server clusters.
Findings
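Targets catalogs that range from more than a hundred million objects today to billions of objects in ongoing and future surveys
Uses a Bayesian probabilistic approach to cross-identification that accounts for varying positional errors
Runs in parallel on clusters of commodity servers, driven by extended SQL queries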
Abstract
Multi-wavelength astronomical studies require cross-identification of detections of the same celestial objects in multiple catalogs based on spherical coordinates and other properties. Because of the large data volumes and spherical geometry, the symmetric N-way association of astronomical detections is a computationally intensive problem, even when sophisticated indexing schemes are used to exclude obviously false candidates. Legacy astronomical catalogs already contain detections of more than a hundred million objects while the ongoing and future surveys will produce catalogs of billions of objects with multiple detections of each at different times. The varying statistical error of position measurements, moving and extended objects, and other physical properties make it necessary to perform the cross-identification using a mathematically correct, proper Bayesian probabilistic…
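To make the idea of probabilistic matching on spherical coordinates concrete, here is a minimal sketch in Python. It assumes the two-catalog Bayes factor commonly used in the probabilistic cross-identification literature, B = 2 / (σ1² + σ2²) · exp(−ψ² / (2(σ1² + σ2²))), with the separation ψ and the positional errors σ in radians. The truncated abstract does not state the formula SkyQuery uses, so the function names, the haversine separation, and the example error values below are illustrative assumptions, not the authors' implementation.

```python
import math

ARCSEC = math.pi / (180.0 * 3600.0)  # one arcsecond expressed in radians


def angular_separation(ra1, dec1, ra2, dec2):
    """Great-circle separation in radians; inputs are in degrees.

    Uses the haversine form, which stays numerically stable for the
    small separations typical of cross-matching.
    """
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return 2 * math.asin(min(1.0, math.sqrt(h)))


def bayes_factor(psi, sigma1, sigma2):
    """Two-catalog Bayes factor (assumed form, see lead-in).

    psi, sigma1, sigma2 are all in radians. Values well above 1 favor
    the hypothesis that both detections belong to the same object.
    """
    s2 = sigma1 ** 2 + sigma2 ** 2
    return 2.0 / s2 * math.exp(-psi ** 2 / (2.0 * s2))


if __name__ == "__main__":
    # Hypothetical detections with 0.1 arcsec positional errors.
    sigma = 0.1 * ARCSEC
    psi = angular_separation(150.0, 2.0, 150.0, 2.0 + 0.05 / 3600.0)
    print(bayes_factor(psi, sigma, sigma))
```

A candidate pair separated by much less than the combined positional error yields a very large Bayes factor, while a pair separated by many times that error yields a value far below 1; a match threshold would then be chosen on this quantity.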
Taxonomy
Topics: Advanced Statistical Methods and Models · Time Series Analysis and Forecasting · Bayesian Modeling and Causal Inference
