Coconut: a scalable bottom-up approach for building data series indexes
Haridimos Kondylakis, Niv Dayan, Kostas Zoumpatianos, Themis Palpanas

TL;DR
Coconut introduces a scalable bottom-up indexing method for massive data series datasets, leveraging a novel sortable summarization and median-based splitting to improve performance and storage efficiency.
Contribution
It presents a new sortable summarization technique and median-based splitting policies that enable efficient bulk-loading of data series indexes at scale.
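One generic way to make a segment-wise summarization sortable is to interleave the bits of its per-segment symbols into a z-order key, so that the most significant bit of every segment leads the key. The sketch below illustrates that idea only. Whether it matches Coconut's exact construction is an assumption, and the names `interleave_bits` and `concat_bits` and the toy 2-segment, 2-bit words are hypothetical.

```python
from typing import List, Tuple


def interleave_bits(symbols: List[int], bits: int) -> int:
    """Z-order key: the top bit of every symbol first, then the next bit, and so on."""
    key = 0
    for b in range(bits - 1, -1, -1):      # from most to least significant bit
        for s in symbols:                  # one bit from each segment per round
            key = (key << 1) | ((s >> b) & 1)
    return key


def concat_bits(symbols: List[int], bits: int) -> int:
    """Baseline key: symbols concatenated segment by segment (lexicographic order)."""
    key = 0
    for s in symbols:
        key = (key << bits) | s
    return key


# Toy summaries: two segments, each quantized to a 2-bit symbol (0..3).
words: List[Tuple[int, int]] = [(3, 3), (0, 3), (1, 1), (0, 0), (3, 0)]

by_concat = sorted(words, key=lambda w: concat_bits(list(w), 2))
by_interleaved = sorted(words, key=lambda w: interleave_bits(list(w), 2))
```

In this toy input, the concatenated order places (0, 3) directly after (0, 0) because only the first segment drives the sort, while the interleaved order places (1, 1) next to (0, 0), since both are low in every segment.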
Findings
Coconut achieves faster index construction than existing methods.
It reduces storage costs significantly compared to state-of-the-art indexes.
Coconut improves query speed through better data organization.
Abstract
Many modern applications produce massive amounts of data series that need to be analyzed, requiring efficient similarity search operations. However, the state-of-the-art data series indexes that are used for this purpose do not scale well for massive datasets in terms of performance, or storage costs. We pinpoint the problem to the fact that existing summarizations of data series used for indexing cannot be sorted while keeping similar data series close to each other in the sorted order. This leads to two design problems. First, traditional bulk-loading algorithms based on sorting cannot be used. Instead, index construction takes place through slow top-down insertions, which create a non-contiguous index that results in many random I/Os. Second, data series cannot be sorted and split across nodes evenly based on their median value; thus, most leaf nodes are in practice nearly empty.…
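To illustrate why sortable keys matter for the second problem, here is a minimal sketch of bottom-up leaf construction: sort the keys once, then split each oversized range at its median position. It is a generic illustration under stated assumptions, not the paper's algorithm. The function name `bulk_load_leaves` and the `capacity` parameter are hypothetical.

```python
from typing import List, Sequence


def bulk_load_leaves(keys: Sequence[int], capacity: int) -> List[List[int]]:
    """Sort keys once, then recursively split at the median position
    until every range fits in a leaf of the given capacity."""
    if capacity < 1:
        raise ValueError("capacity must be at least 1")
    ordered = sorted(keys)

    def split(lo: int, hi: int) -> List[List[int]]:
        if hi - lo <= capacity:
            return [ordered[lo:hi]]        # range fits: emit one contiguous leaf
        mid = (lo + hi) // 2               # median position of the sorted range
        return split(lo, mid) + split(mid, hi)

    return split(0, len(ordered)) if ordered else []


leaves = bulk_load_leaves([7, 2, 9, 0, 5, 3, 8, 1, 6, 4], capacity=4)
```

Because a range is split only when it exceeds `capacity`, and each split halves it, every leaf produced by a split holds at least `capacity // 2` keys. Leaves are also emitted in key order, so they can be written contiguously.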
Taxonomy
Topics: Time Series Analysis and Forecasting · Data Management and Algorithms · Spectroscopy and Chemometric Analyses
