Data Mining the SDSS SkyServer Database
Jim Gray, Alex S. Szalay, Ani R. Thakar, Peter Z. Kunszt, Christopher Stoughton, Don Slutz, Jan vandenBerg

TL;DR
This paper details the design, implementation, and performance of a database system supporting interactive queries and data access for the Sloan Digital Sky Survey, enabling efficient exploration of large astronomical datasets.
Contribution
It presents the database architecture, data loading pipeline, and query performance analysis for SDSS SkyServer, demonstrating effective support for scientific data exploration.
Findings
Most queries execute in under 20 seconds
The system supports interactive exploration of large datasets
The benchmark queries typically translate to a single SQL statement
Abstract
An earlier paper (Szalay et al., "Designing and Mining MultiTerabyte Astronomy Archives: The Sloan Digital Sky Survey," ACM SIGMOD 2000) described the Sloan Digital Sky Survey's (SDSS) data management needs by defining twenty database queries and twelve data visualization tasks that a good data management system should support. We built a database and interfaces to support both that query load and a website for ad-hoc access. This paper reports on the database design, describes the data loading pipeline, and reports on the query implementation and performance. The queries typically translated to a single SQL statement. Most queries run in less than 20 seconds, allowing scientists to interactively explore the database. This paper is an in-depth tour of those queries. Readers should first have studied the companion overview paper Szalay et al. "The SDSS SkyServer, Public Access to…
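To illustrate what it means for a science question to "translate to a single SQL statement", here is a minimal sketch using Python's built-in sqlite3 module. The table name PhotoObj, its five columns, the 'GALAXY'/'STAR' encoding, the sample rows, and the helper functions are all hypothetical simplifications, not the actual SkyServer schema or code (which ran on Microsoft SQL Server with a much richer schema). A plain right-ascension/declination box stands in for the spatial search a production archive would use.

```python
import sqlite3

# Hypothetical, heavily simplified stand-in for an SDSS photometric object
# table; the real SkyServer schema has many more columns and tables.
SCHEMA = """
CREATE TABLE PhotoObj (
    objID INTEGER PRIMARY KEY,
    ra    REAL,   -- right ascension, degrees
    dec   REAL,   -- declination, degrees
    type  TEXT,   -- 'GALAXY' or 'STAR' (hypothetical encoding)
    r     REAL    -- r-band magnitude (smaller means brighter)
)
"""

# Made-up sample rows for illustration only.
ROWS = [
    (1, 185.0, 0.5, "GALAXY", 17.2),
    (2, 185.1, 0.6, "STAR", 16.0),
    (3, 185.2, 0.4, "GALAXY", 21.5),
    (4, 190.0, 2.0, "GALAXY", 17.9),
    (5, 184.9, 0.1, "GALAXY", 18.4),
]

# One declarative statement answers the whole question:
# galaxies inside a small sky box brighter than a magnitude cut, brightest first.
QUERY = """
SELECT objID, ra, dec, r
FROM PhotoObj
WHERE type = 'GALAXY'
  AND ra  BETWEEN ? AND ?
  AND dec BETWEEN ? AND ?
  AND r < ?
ORDER BY r
"""


def build_demo_db():
    """Create an in-memory database holding the hypothetical sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany("INSERT INTO PhotoObj VALUES (?, ?, ?, ?, ?)", ROWS)
    return conn


def bright_galaxies_in_box(conn, ra_min, ra_max, dec_min, dec_max, r_max):
    """Return (objID, ra, dec, r) for galaxies in the box with r < r_max."""
    params = (ra_min, ra_max, dec_min, dec_max, r_max)
    return conn.execute(QUERY, params).fetchall()


if __name__ == "__main__":
    conn = build_demo_db()
    for row in bright_galaxies_in_box(conn, 184.0, 186.0, 0.0, 1.0, 19.0):
        print(row)
```

The filtering, sorting, and access-path choice are all left to the database engine rather than written as a hand-coded scan. This is the property that lets a single statement express each query.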
Taxonomy
Topics: Data Management and Algorithms · Advanced Database Systems and Queries · Data Visualization and Analytics
