Robust and Efficient Sorting with Offset-Value Coding
Thanh Do, Goetz Graefe

TL;DR
This paper presents optimized implementations of offset-value coding for sorting and searching, demonstrating significant performance benefits in database systems through experimental evaluation.
Contribution
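It presents simple, efficient implementations of long-established techniques for fast comparisons and sorting, in particular offset-value coding, and highlights their relationships with compression (prefix truncation, run-length encoding) and with consumers of sorted streams such as merging in database systems.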
Findings
Improved sorting performance in database systems
Offset-value coding and prefix truncation in run files are mutually beneficial
Implementation reported in the context of Google's Napa and F1 systems
Abstract
Sorting and searching are large parts of database query processing, e.g., in the forms of index creation, index maintenance, and index lookup; and comparing pairs of keys is a substantial part of the effort in sorting and searching. We have worked on simple, efficient implementations of decades-old, neglected, effective techniques for fast comparisons and fast sorting, in particular offset-value coding. In the process, we happened upon its mutually beneficial relationship with prefix truncation in run files as well as the duality of compression techniques in row- and column-format storage structures, namely prefix truncation and run-length encoding of leading key columns. We also found a beneficial relationship with consumers of sorted streams, e.g., merging parallel streams, in-stream aggregation, and merge join. We report on our implementation in the context of Google's Napa and F1…
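To illustrate the fast-comparison idea named in the abstract, here is a minimal sketch of ascending offset-value coding, assuming keys are equal-length tuples of integers and both keys are coded relative to a shared base key that sorts no higher than either. The names `ovc` and `compare_with_codes` are hypothetical and are not taken from the paper's implementation.

```python
from typing import Tuple

Key = Tuple[int, ...]
Code = Tuple[int, int]  # (arity - offset, value at offset); a smaller code means a smaller key


def ovc(key: Key, base: Key) -> Code:
    """Ascending offset-value code of `key` relative to `base` (assumes base <= key)."""
    arity = len(key)
    offset = 0
    while offset < arity and key[offset] == base[offset]:
        offset += 1
    if offset == arity:
        return (0, 0)  # key is a duplicate of base
    return (arity - offset, key[offset])


def compare_with_codes(a: Key, code_a: Code, b: Key, code_b: Code) -> Tuple[bool, Code]:
    """Compare two keys whose codes are relative to the same base.

    Returns (a_wins, loser_code), where loser_code is the loser's code
    relative to the winner. Ties count as a win for `a`.
    """
    if code_a != code_b:
        # The codes alone decide the comparison, and the loser's old code
        # is already its correct code relative to the winner.
        if code_a < code_b:
            return True, code_b
        return False, code_a
    # Equal codes: the keys agree up to and including column `offset`,
    # so only the remaining columns need to be compared.
    arity = len(a)
    offset = arity - code_a[0]
    i = min(offset + 1, arity)
    while i < arity and a[i] == b[i]:
        i += 1
    if i == arity:
        return True, (0, 0)  # the keys are equal
    if a[i] < b[i]:
        return True, (arity - i, b[i])
    return False, (arity - i, a[i])
```

In this sketch, column values are inspected only when the two codes are equal; otherwise a single comparison of the codes decides the order, and the loser needs no recomputation before its next comparison against a key coded relative to the winner.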
Taxonomy
Topics: Advanced Database Systems and Queries · Data Management and Algorithms · Algorithms and Data Compression
