Robust and Efficient Sorting with Offset-Value Coding

Thanh Do; Goetz Graefe

arXiv:2209.08420·cs.DB·September 20, 2022

Open Access

TL;DR

This paper presents optimized implementations of offset-value coding for sorting and searching, demonstrating significant performance benefits in database systems through experimental evaluation.

Contribution

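It presents simple, efficient implementations of decades-old techniques for fast comparisons and fast sorting, in particular offset-value coding, and highlights their relationships with compression (prefix truncation, run-length encoding) and with consumers of sorted streams, such as stream merging, in database systems.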

Findings

01 Improved sorting performance in database systems

02 Effective use of offset-value coding with prefix truncation

03 Implementation reported in the context of Google's Napa and F1 systems

Abstract

Sorting and searching are large parts of database query processing, e.g., in the forms of index creation, index maintenance, and index lookup; and comparing pairs of keys is a substantial part of the effort in sorting and searching. We have worked on simple, efficient implementations of decades-old, neglected, effective techniques for fast comparisons and fast sorting, in particular offset-value coding. In the process, we happened upon its mutually beneficial relationship with prefix truncation in run files as well as the duality of compression techniques in row- and column-format storage structures, namely prefix truncation and run-length encoding of leading key columns. We also found a beneficial relationship with consumers of sorted streams, e.g., merging parallel streams, in-stream aggregation, and merge join. We report on our implementation in the context of Google's Napa and F1…
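
To make the idea of offset-value coding concrete, the following is a minimal sketch, not the authors' implementation. It assumes fixed-arity keys of integer columns, sorted ascending, and it represents a code as a Python tuple (arity - offset, value) rather than a packed integer. The function names ovc_ascending and compare_with_codes are hypothetical. The sketch relies on the property that when two keys are both coded relative to the same base key that sorts no later than either, differing codes decide the comparison without looking at any key columns.

```python
from itertools import product


def ovc_ascending(base, key):
    """Code of `key` relative to `base` (assumes base <= key, same arity).

    offset = number of leading columns shared with base,
    value  = key's column at that offset.
    Returned as (arity - offset, value) so that a smaller code means a
    smaller key in an ascending sort. A duplicate of base gets (0, 0).
    """
    arity = len(key)
    for i in range(arity):
        if base[i] != key[i]:
            return (arity - i, key[i])
    return (0, 0)


def compare_with_codes(a, code_a, b, code_b):
    """Compare keys a and b whose codes are relative to the SAME base.

    Returns -1, 0, or 1. Column values are inspected only when the two
    codes are equal, and then only after the shared offset.
    """
    if code_a != code_b:
        return -1 if code_a < code_b else 1
    arity = len(a)
    start = arity - code_a[0] + 1  # first column after the coded one
    for i in range(start, arity):
        if a[i] != b[i]:
            return -1 if a[i] < b[i] else 1
    return 0


base = (1, 2, 3)
a, b = (1, 2, 5), (1, 4, 0)
code_a = ovc_ascending(base, a)  # shares 2 columns with base
code_b = ovc_ascending(base, b)  # shares 1 column with base
result = compare_with_codes(a, code_a, b, code_b)
```

Here a shares a longer prefix with the base than b does, so its code is smaller and the comparison is settled by the codes alone. Offsets computed relative to the preceding key in a sorted run are also what prefix truncation stores, which is one way to read the relationship between the two techniques that the abstract mentions.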

Taxonomy

Topics: Advanced Database Systems and Queries · Data Management and Algorithms · Algorithms and Data Compression