Designing a Multi-petabyte Database for LSST
Jacek Becla, Andrew Hanushevsky, Sergei Nikolaev, Ghaleb Abdulla, Alex Szalay, Maria Nieto-Santisteban, Ani Thakar, Jim Gray

TL;DR
This paper discusses the design and evaluation of a multi-petabyte database system tailored for the Large Synoptic Survey Telescope (LSST), addressing its high data volume, real-time processing needs, and complex access patterns.
Contribution
It introduces a novel database architecture optimized for LSST's massive, high-velocity data, and evaluates various database technologies against these demanding requirements.
Findings
Several database systems are being evaluated for how they perform at LSST's data rates, data volumes, and access patterns.
Proposed architecture meets real-time alerting and data growth needs.
Identified key challenges and solutions for large-scale astronomical data management.
Abstract
The 3.2 giga-pixel LSST camera will produce approximately half a petabyte of archive images every month. These data need to be reduced in under a minute to produce real-time transient alerts, and then added to the cumulative catalog for further analysis. The catalog is expected to grow by about three hundred terabytes per year. The data volume, the real-time transient alerting requirements of the LSST, and its spatio-temporal aspects require innovative techniques to build an efficient data access system at reasonable cost. As currently envisioned, the system will rely on a database for catalogs and metadata. Several database systems are being evaluated to understand how they perform at these data rates, data volumes, and access patterns. This paper describes the LSST requirements, the challenges they impose, the data access philosophy, results to date from evaluating available database…
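To give a sense of scale for the volumes quoted in the abstract, the following back-of-envelope sketch converts them into average sustained rates. The unit conventions (decimal petabytes and terabytes, a 30-day month, a 365-day year) and the helper name `average_rate` are assumptions made for illustration and do not come from the paper. Because a telescope only observes at night, real peak rates would be higher than these evenly spread averages.

```python
# Back-of-envelope average rates implied by the abstract's figures.
# Assumptions (not from the paper): decimal units, a 30-day month,
# a 365-day year, and data arriving evenly around the clock.

PB = 10**15  # bytes in a decimal petabyte
TB = 10**12  # bytes in a decimal terabyte
MB = 10**6   # bytes in a decimal megabyte

SECONDS_PER_DAY = 86_400


def average_rate(total_bytes, days):
    """Return the mean rate in bytes per second over the given number of days."""
    return total_bytes / (days * SECONDS_PER_DAY)


# "approximately half a petabyte of archive images every month"
image_rate = average_rate(0.5 * PB, 30)

# "the catalog is expected to grow about three hundred terabytes per year"
catalog_rate = average_rate(300 * TB, 365)

print(f"archive images: about {image_rate / MB:.0f} MB/s on average")
print(f"catalog growth: about {catalog_rate / MB:.1f} MB/s on average")
```

Under these assumptions the image archive averages a little under 200 MB/s and the catalog a little under 10 MB/s, which suggests that the catalog's difficulty lies less in raw ingest bandwidth than in the cumulative size and the access patterns the abstract mentions.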
Taxonomy
Topics: Advanced Data Storage Technologies · Parallel Computing and Optimization Techniques · Distributed and Parallel Computing Systems
