OSM-tree: A Sortedness-Aware Index
Aneesh Raman, Subhadeep Sarkar, Matthaios Olma, Manos Athanassoulis

TL;DR
The paper introduces the OSM-tree, a new index structure optimized for data ingestion efficiency by leveraging partial sortedness, achieving up to 8.8x higher ingestion performance than existing indexes while maintaining competitive query performance.
Contribution
The paper proposes the OSM-tree, a novel index that combines multiple techniques to optimize ingestion of near-sorted data, outperforming state-of-the-art indexes in ingestion and in mixed workloads while remaining competitive for queries.
Findings
OSM-tree outperforms existing indexes by up to 8.8x in ingestion performance.
OSM-tree maintains competitive query performance, with benefits up to 5x for mixed workloads.
The design effectively leverages partial data sortedness to optimize index construction.
Abstract
Indexes facilitate efficient querying when the selection predicate is on an indexed key. As a result, when loading data, if we anticipate future selective (point or range) queries, we typically maintain an index that is gradually populated as new data is ingested. In that respect, indexing can be perceived as the process of adding structure to an incoming, otherwise unsorted, data collection. The process of adding structure comes at a cost, as instead of simply appending incoming data, every new entry is inserted into the index. If the data ingestion order matches the indexed attribute order, the ingestion cost is entirely redundant and can be avoided (e.g., via bulk loading in a B+-tree). However, state-of-the-art index designs do not benefit when data is ingested in an order that is close to being sorted but not fully sorted. In this paper, we study how indexes can benefit from…
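The abstract's central observation, that ingesting keys in sorted order makes per-entry insertion redundant, can be illustrated with a toy Python sketch. This is a hypothetical illustration, not the OSM-tree design (the truncated abstract does not describe it); the class name `SortednessAwareBuffer` and its counters are invented for this example. Keys that extend the current sorted run are appended directly, and only out-of-order keys pay for a binary search and an insertion.

```python
import bisect
from typing import List


class SortednessAwareBuffer:
    """Hypothetical toy, NOT the OSM-tree: keeps keys sorted, but takes a
    cheap append path whenever the incoming key arrives in order."""

    def __init__(self) -> None:
        self.keys: List[int] = []
        self.appends = 0  # keys that arrived in order (cheap path)
        self.inserts = 0  # keys that arrived out of order (costly path)

    def ingest(self, key: int) -> None:
        if not self.keys or key >= self.keys[-1]:
            # Key extends the sorted run: append without any search.
            self.keys.append(key)
            self.appends += 1
        else:
            # Out-of-order key: binary-search its position and insert it
            # (the list insertion itself shifts elements, O(n)).
            bisect.insort(self.keys, key)
            self.inserts += 1


if __name__ == "__main__":
    buf = SortednessAwareBuffer()
    for k in [1, 2, 3, 5, 4, 6, 7]:  # near-sorted: only 4 is out of place
        buf.ingest(k)
    print(buf.keys)                  # [1, 2, 3, 4, 5, 6, 7]
    print(buf.appends, buf.inserts)  # 6 1
```

With a fully sorted stream every key takes the append path, mirroring the bulk-loading case the abstract mentions; the more out-of-order keys a stream contains, the more work shifts to the insertion path.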
Taxonomy
Topics: Data Management and Algorithms · Advanced Database Systems and Queries · Caching and Content Delivery
