Formal Representation of the SS-DB Benchmark and Experimental Evaluation in EXTASCID
Yu Cheng, Florin Rusu

TL;DR
This paper formalizes the SS-DB benchmark using ArrayQL algebra, provides a reference implementation in EXTASCID, and demonstrates significant performance improvements over SciDB for scientific data processing tasks.
Contribution
It introduces the first formal ArrayQL-based representation of the SS-DB benchmark and evaluates a reference implementation in EXTASCID, which outperforms SciDB on scientific data processing tasks.
Findings
EXTASCID outperforms SciDB by an order of magnitude in key operations.
Formal ArrayQL representation clarifies the SS-DB benchmark specifications.
EXTASCID supports native array and relational data processing with extensible user code.
Abstract
Evaluating the performance of scientific data processing systems is a difficult task considering the plethora of application-specific solutions available in this landscape and the lack of a generally accepted benchmark. The dual structure of scientific data, coupled with the complex nature of processing, complicates the evaluation procedure further. SS-DB is the first attempt to define a general benchmark for complex scientific processing over raw and derived data. It has failed to draw sufficient attention, though, because of the ambiguous plain-language specification and the extraordinary SciDB results. In this paper, we remedy the shortcomings of the original SS-DB specification by providing a formal representation in terms of ArrayQL algebra operators and ArrayQL/SciQL constructs. These are the first formal representations of the SS-DB benchmark. Starting from the formal representation, we…
Taxonomy
Topics: Advanced Data Storage Technologies · Advanced Database Systems and Queries · Distributed and Parallel Computing Systems
