Batch is back: CasJobs, serving multi-TB data on the Web

William O'Mullane; Nolan Li; Maria Nieto-Santisteban; Alex Szalay; Ani Thakar; Jim Gray

arXiv:cs/0502072·cs.DC·May 23, 2007·6 cites

TL;DR

The paper presents CasJobs, a multi-server batch processing system for SDSS data that handles large, complex queries efficiently and provides users with personal databases for local analysis and collaboration.

Contribution

Introduction of CasJobs, a novel multi-queue batch system with personal databases to improve large query handling and data transfer in astronomical databases.

Findings

01 Supports multi-TB data queries efficiently

02 Enables users to store and analyze results locally

03 Improves system responsiveness during complex queries

Abstract

The Sloan Digital Sky Survey (SDSS) science database describes over 140 million objects and is over 1.5 TB in size. The SDSS Catalog Archive Server (CAS) provides several levels of query interface to the SDSS data via the SkyServer website. Most queries execute in seconds or minutes. However, some queries can take hours or days, either because they require non-index scans of the largest tables, or because they request very large result sets, or because they represent very complex aggregations of the data. These "monster queries" not only take a long time, they also affect response times for everyone else - one or more of them can clog the entire system. To ameliorate this problem, we developed a multi-server multi-queue batch job submission and tracking system for the CAS called CasJobs. The transfer of very large result sets from queries over the network is another serious problem.…

Taxonomy

Topics: Scientific Computing and Data Management