Columnar Formats for Schemaless LSM-based Document Stores
Wail Y. Alkowaileet, Michael J. Carey

TL;DR
This paper introduces novel columnar storage techniques for LSM-based document stores, significantly enhancing query performance while maintaining ingestion efficiency.
Contribution
It extends the Dremel format to support the flexible data model of document stores and proposes two columnar layouts tailored to LSM-based storage, enabling efficient analytical queries.
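For readers unfamiliar with Dremel, the following sketch illustrates standard Dremel-style repetition and definition levels for a single array path. It is not the paper's extended format. The field name `tags`, the helpers `shred_tags` and `assemble_tags`, and the assumption that array elements are non-null scalars are all illustrative.

```python
from typing import Any, List, Optional, Tuple

# One shredded entry: (value, repetition level, definition level).
Entry = Tuple[Optional[Any], int, int]

# Hypothetical path doc["tags"][*]: level 1 = "tags" present, level 2 = an element present.
MAX_DEF = 2


def shred_tags(doc: dict) -> List[Entry]:
    """Shred doc["tags"] (assumed: a list of non-null scalars) into Dremel-style entries."""
    if "tags" not in doc:
        return [(None, 0, 0)]          # field missing: definition level 0
    tags = doc["tags"]
    if not tags:
        return [(None, 0, 1)]          # empty array: "tags" defined, but no element
    entries: List[Entry] = []
    for i, value in enumerate(tags):
        rep = 0 if i == 0 else 1       # 0 starts a new document, 1 repeats within the array
        entries.append((value, rep, MAX_DEF))
    return entries


def assemble_tags(column: List[Entry]) -> List[dict]:
    """Rebuild documents from the shredded column (inverse of shred_tags)."""
    docs: List[dict] = []
    for value, rep, dfn in column:
        if rep == 0:
            docs.append({})            # repetition level 0 means a new document begins
        doc = docs[-1]
        if dfn == 0:
            continue                   # "tags" was missing in this document
        doc.setdefault("tags", [])
        if dfn == MAX_DEF:
            doc["tags"].append(value)  # a real array element
    return docs


records = [{"tags": ["a", "b"]}, {}, {"tags": []}, {"tags": ["c"]}]
column = [entry for doc in records for entry in shred_tags(doc)]
print(column)
# [('a', 0, 2), ('b', 1, 2), (None, 0, 0), (None, 0, 1), ('c', 0, 2)]
print(assemble_tags(column) == records)  # True
```

Here the definition level distinguishes a missing `tags` field (0) from an empty array (1), and the repetition level marks document boundaries. Schema changes and heterogeneous value types at the same path, which a flexible document data model permits, are not modeled in this sketch.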
Findings
Query execution time improved by orders of magnitude
Minimal impact on data ingestion performance
Effective storage and query performance improvements
Abstract
In the last decade, document store database systems have gained more traction for storing and querying large volumes of semi-structured data. However, the flexibility of the document stores' data models has limited their ability to store data in a columnar-major layout, making them less performant for analytical workloads than column store relational databases. In this paper, we propose several techniques based on piggy-backing on Log-Structured Merge (LSM) tree events and tailored to document stores to store document data in a columnar layout. We first extend the Dremel format, a popular on-disk columnar format for semi-structured data, to comply with document stores' flexible data model. We then introduce two columnar layouts for organizing and storing data in LSM-based storage. We also highlight the potential of using query compilation techniques for document stores, where values'…
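To make "piggy-backing on LSM tree events" concrete, here is a minimal sketch under stated assumptions, not the paper's algorithm. A hypothetical `ToyLSM` appends incoming documents to an in-memory buffer, and only when a flush or merge happens does it infer a schema (each top-level field mapped to the set of value types seen, a simple stand-in for a union type) and write one column per (field, type). The names `ToyLSM`, `infer_schema`, `build_component`, and `read_component` are hypothetical, and only top-level fields are handled.

```python
from typing import Any, Dict, List, Set, Tuple

Doc = Dict[str, Any]
Schema = Dict[str, Set[str]]


def infer_schema(docs: List[Doc]) -> Schema:
    """Map each top-level field to the set of value types seen (a union type)."""
    schema: Schema = {}
    for doc in docs:
        for field, value in doc.items():
            schema.setdefault(field, set()).add(type(value).__name__)
    return schema


def build_component(docs: List[Doc]) -> Dict[str, Any]:
    """Lay out a batch as one column per (field, type); entries are (row, value)."""
    schema = infer_schema(docs)
    columns: Dict[Tuple[str, str], List[Tuple[int, Any]]] = {
        (field, t): [] for field, types in schema.items() for t in types
    }
    for row, doc in enumerate(docs):
        for field, value in doc.items():
            columns[(field, type(value).__name__)].append((row, value))
    return {"schema": schema, "columns": columns, "count": len(docs)}


def read_component(comp: Dict[str, Any]) -> List[Doc]:
    """Reassemble a component's documents from its columns."""
    docs: List[Doc] = [{} for _ in range(comp["count"])]
    for (field, _type), entries in comp["columns"].items():
        for row, value in entries:
            docs[row][field] = value
    return docs


class ToyLSM:
    """Hypothetical toy LSM store: rows buffered in memory, columnar on flush."""

    def __init__(self, flush_threshold: int = 2) -> None:
        self.memtable: List[Doc] = []
        self.components: List[Dict[str, Any]] = []  # immutable components, newest last
        self.flush_threshold = flush_threshold

    def insert(self, doc: Doc) -> None:
        # Ingestion is a plain append; no schema work happens here.
        self.memtable.append(doc)
        if len(self.memtable) >= self.flush_threshold:
            self.flush()

    def flush(self) -> None:
        # Piggy-back on the flush event: infer the schema and build columns.
        if self.memtable:
            self.components.append(build_component(self.memtable))
            self.memtable = []

    def merge(self) -> None:
        # Piggy-back on the merge event: the merged schema is the union of the inputs'.
        rows = [doc for comp in self.components for doc in read_component(comp)]
        self.components = [build_component(rows)] if rows else []


batch = [{"id": 1, "x": 1}, {"id": 2, "x": "a"}, {"id": 3}, {"id": 4, "y": True}]
store = ToyLSM(flush_threshold=2)
for d in batch:
    store.insert(d)
store.merge()
print({f: sorted(t) for f, t in store.components[0]["schema"].items()})
# {'id': ['int'], 'x': ['int', 'str'], 'y': ['bool']}
```

The design choice being illustrated is that schema inference and columnar layout cost nothing on the insert path and are paid only when the LSM tree already rewrites data, which is one plausible reading of why ingestion performance is barely affected.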
Taxonomy
Topics: Data Management and Algorithms · Advanced Database Systems and Queries · Advanced Data Storage Technologies
