Fast Data Series Indexing for In-Memory Data

Botao Peng; Panagiota Fatourou; Themis Palpanas

arXiv:2110.07519·cs.DB·October 15, 2021

Open Access

TL;DR

MESSI is an in-memory data series index that exploits SIMD instructions and multi-socket, multi-core parallelism to enable real-time similarity search on 100 GB datasets, with up to 4x faster index construction and up to 11x faster query answering than previous methods.

Contribution

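The paper introduces MESSI, the first data series index designed for in-memory operation on modern hardware. It uses SIMD instructions and multi-socket, multi-core architectures to accelerate both index construction and similarity search.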

Findings

01 Up to 4x faster index construction

02 Up to 11x faster query answering

03 Enables real-time similarity search on 100 GB datasets

Abstract

Data series similarity search is a core operation for several data series analysis applications across many different domains. However, the state-of-the-art techniques fail to deliver the time performance required for interactive exploration, or analysis of large data series collections. In this work, we propose MESSI, the first data series index designed for in-memory operation on modern hardware. Our index takes advantage of the modern hardware parallelization opportunities (i.e., SIMD instructions, multi-socket and multi-core architectures), in order to accelerate both index construction and similarity search processing times. Moreover, it benefits from a careful design in the setup and coordination of the parallel workers and data structures, so that it maximizes its performance for in-memory operations. MESSI supports similarity search using both the Euclidean and Dynamic Time…
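
As a point of reference for the operation the abstract describes, the following is a minimal sketch of exact nearest-neighbor similarity search under Euclidean distance, written as a sequential scan with early abandoning. It is not MESSI and does not use an index, SIMD, or multiple threads; it only illustrates the baseline computation that an index of this kind aims to accelerate. The function names `squared_euclidean` and `nearest_neighbor` are hypothetical and chosen for illustration.

```python
import math
from typing import Sequence, Tuple


def squared_euclidean(
    query: Sequence[float],
    candidate: Sequence[float],
    best_so_far: float = math.inf,
) -> float:
    """Squared Euclidean distance, abandoning early once it cannot beat best_so_far."""
    total = 0.0
    for q, c in zip(query, candidate):
        diff = q - c
        total += diff * diff
        if total >= best_so_far:
            # The running sum only grows, so this candidate cannot win.
            return math.inf
    return total


def nearest_neighbor(
    query: Sequence[float],
    collection: Sequence[Sequence[float]],
) -> Tuple[int, float]:
    """Return (position, Euclidean distance) of the series closest to the query."""
    if not collection:
        raise ValueError("collection must contain at least one series")
    best_index = -1
    best_squared = math.inf
    for i, series in enumerate(collection):
        if len(series) != len(query):
            raise ValueError("all series must have the same length as the query")
        d = squared_euclidean(query, series, best_squared)
        if d < best_squared:
            best_squared, best_index = d, i
    return best_index, math.sqrt(best_squared)


if __name__ == "__main__":
    data = [[0.0, 0.0, 0.0], [1.0, 2.0, 3.0], [1.0, 2.0, 4.0]]
    print(nearest_neighbor([1.0, 2.0, 3.4], data))
```

The scan touches every series in the collection, so its cost grows linearly with the collection size. That linear cost is what motivates an index in the first place.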

Taxonomy

Topics: Time Series Analysis and Forecasting · Music and Audio Processing · Advanced Text Analysis Techniques