Efficient Data Ingestion and Query Processing for LSM-Based Storage Systems
Chen Luo, Michael J. Carey

TL;DR
This paper introduces new techniques for improving data ingestion and query processing in LSM-based storage systems, especially for secondary indexes, demonstrated through implementation in Apache AsterixDB.
Contribution
It proposes optimizations for batched point lookups and new maintenance strategies for LSM-based storage systems, widening the range of queries for which LSM-based secondary indexes are useful.
Findings
Significant improvement in batched point lookup performance.
Wider range of applicability for LSM-based secondary indexes as a result of the lookup optimizations.
Techniques implemented and experimentally evaluated in Apache AsterixDB.
Abstract
In recent years, the Log Structured Merge (LSM) tree has been widely adopted by NoSQL and NewSQL systems for its superior write performance. Despite its popularity, however, most existing work has focused on LSM-based key-value stores with only a primary LSM-tree index; auxiliary structures, which are critical for supporting ad-hoc queries, have received much less attention. In this paper, we focus on efficient data ingestion and query processing for general-purpose LSM-based storage systems. We first propose and evaluate a series of optimizations for efficient batched point lookups, significantly improving the range of applicability of LSM-based secondary indexes. We then present several new and efficient maintenance strategies for LSM-based storage systems. Finally, we have implemented and experimentally evaluated the proposed techniques in the context of the Apache AsterixDB system,…
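To make the idea of a batched point lookup concrete, here is a minimal sketch. It is an illustration under stated assumptions, not the paper's algorithm or AsterixDB code. It assumes a primary index made of immutable sorted components ordered from newest to oldest, and a batch of primary keys such as a secondary index search might return. The function name `batched_point_lookup` and the list-of-tuples component layout are hypothetical. The batch is sorted once, so each component is probed in key order rather than at random positions.

```python
from bisect import bisect_left


def batched_point_lookup(components, keys):
    """Look up a batch of keys in a toy LSM-style index.

    components: list of components, newest first; each component is a
                list of (key, value) pairs sorted by key (assumed layout).
    keys:       iterable of keys to fetch, in any order.
    Returns a dict mapping each found key to its newest value.
    """
    # Sort and de-duplicate the batch so every component is probed in key order.
    pending = sorted(set(keys))
    found = {}
    for component in components:
        if not pending:
            break  # every key was already resolved by a newer component
        component_keys = [k for k, _ in component]
        still_pending = []
        pos = 0
        for key in pending:
            # Keys are ascending, so the search can resume from the last position.
            pos = bisect_left(component_keys, key, pos)
            if pos < len(component_keys) and component_keys[pos] == key:
                found[key] = component[pos][1]
            else:
                still_pending.append(key)
        # Only keys missing from this component are probed in older ones.
        pending = still_pending
    return found


if __name__ == "__main__":
    newest = [(2, "b-new"), (5, "e")]
    oldest = [(1, "a"), (2, "b-old"), (9, "z")]
    print(batched_point_lookup([newest, oldest], [9, 2, 1, 7]))
```

The sketch leaves out deleted-record markers, Bloom filters, and on-disk pages, all of which a real LSM-based system would have to handle.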
Taxonomy
Topics: Advanced Database Systems and Queries · Data Management and Algorithms · Advanced Data Storage Technologies
