Fast Data Series Indexing for In-Memory Data
Botao Peng, Panagiota Fatourou, Themis Palpanas

TL;DR
MESSI is a novel in-memory data series index that leverages modern hardware parallelization to enable real-time similarity search on large datasets. It outperforms previous methods by up to 4x in index construction and up to 11x in query answering.
Contribution
The paper introduces MESSI, the first in-memory data series index that exploits SIMD instructions and multi-core architectures for fast similarity search.
Findings
Up to 4x faster index construction than the previous state of the art
Up to 11x faster query answering than the previous state of the art
Enables real-time similarity search on 100GB datasets
Abstract
Data series similarity search is a core operation for several data series analysis applications across many different domains. However, the state-of-the-art techniques fail to deliver the time performance required for interactive exploration or analysis of large data series collections. In this work, we propose MESSI, the first data series index designed for in-memory operation on modern hardware. Our index takes advantage of modern hardware parallelization opportunities (i.e., SIMD instructions, multi-socket and multi-core architectures) in order to accelerate both index construction and similarity search processing times. Moreover, it benefits from a careful design in the setup and coordination of the parallel workers and data structures, so that it maximizes its performance for in-memory operations. MESSI supports similarity search using both the Euclidean and Dynamic Time…
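The abstract describes parallel workers that cooperate during both index construction and query answering. As a rough illustration of that general idea only, here is a minimal Python sketch of exact 1-nearest-neighbor search under Euclidean distance. It is not MESSI's actual algorithm or data structures. It makes several assumptions. Series are plain lists of equal length, divisible by the number of segments. PAA (Piecewise Aggregate Approximation) summaries are used for lower-bound pruning. Workers share a lock-protected best-so-far answer. All names (`paa`, `build_summaries`, `BestSoFar`, `parallel_exact_1nn`) are hypothetical. SIMD is not modeled, and because of CPython's global interpreter lock, the threads here demonstrate worker coordination rather than real CPU speedup.

```python
import math
import threading
from concurrent.futures import ThreadPoolExecutor


def paa(series, segments):
    """PAA summary: mean of each equal-length segment (len must divide evenly)."""
    seg_len = len(series) // segments
    return [sum(series[i * seg_len:(i + 1) * seg_len]) / seg_len
            for i in range(segments)]


def paa_lower_bound_sq(q_paa, s_paa, n):
    """Squared lower bound on the Euclidean distance of two length-n series."""
    w = len(q_paa)
    return (n / w) * sum((a - b) ** 2 for a, b in zip(q_paa, s_paa))


def euclidean_sq_early_abandon(q, s, threshold_sq):
    """Squared Euclidean distance; returns inf once the partial sum reaches threshold_sq."""
    total = 0.0
    for a, b in zip(q, s):
        total += (a - b) ** 2
        if total >= threshold_sq:
            return math.inf
    return total


class BestSoFar:
    """Current best answer, shared by all workers and guarded by a lock."""

    def __init__(self):
        self.dist_sq = math.inf
        self.index = -1
        self._lock = threading.Lock()

    def offer(self, dist_sq, index):
        with self._lock:
            if dist_sq < self.dist_sq:
                self.dist_sq, self.index = dist_sq, index


def build_summaries(data, segments, workers=4):
    """Compute the PAA summary of every series in parallel (order is preserved)."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda s: paa(s, segments), data))


def parallel_exact_1nn(query, data, summaries, segments, workers=4):
    """Exact 1-NN: workers scan disjoint chunks and prune with a shared bound."""
    n = len(query)
    q_paa = paa(query, segments)
    bsf = BestSoFar()

    def scan(start, stop):
        for i in range(start, stop):
            # An unlocked read may be stale, i.e. larger than the true bound,
            # which weakens pruning but never discards the true answer.
            if paa_lower_bound_sq(q_paa, summaries[i], n) >= bsf.dist_sq:
                continue  # lower bound shows this series cannot win
            d = euclidean_sq_early_abandon(query, data[i], bsf.dist_sq)
            if d < math.inf:
                bsf.offer(d, i)

    chunk = max(1, math.ceil(len(data) / workers))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(scan, s, min(s + chunk, len(data)))
                   for s in range(0, len(data), chunk)]
        for f in futures:
            f.result()  # re-raise any exception from a worker
    return bsf.index, math.sqrt(bsf.dist_sq)
```

For example, `parallel_exact_1nn(query, data, build_summaries(data, 4), 4)` returns the index of the series nearest to `query` and its Euclidean distance. Because the shared bound only ever decreases, a good answer found by one worker immediately tightens pruning for all the others.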
Taxonomy
Topics: Time Series Analysis and Forecasting · Music and Audio Processing · Advanced Text Analysis Techniques
