MESSI: In-Memory Data Series Indexing

Botao Peng; Panagiota Fatourou; Themis Palpanas

arXiv:2009.00786 · cs.DB · September 3, 2020 · 1 citation

TL;DR

MESSI is a novel in-memory data series index that leverages modern hardware parallelization to enable real-time similarity search on large datasets, significantly outperforming previous methods in speed.

Contribution

This paper introduces MESSI, the first in-memory data series index optimized for modern hardware, achieving faster construction and query times for large datasets.

Findings

1. Up to 4x faster index construction
2. Up to 11x faster query answering
3. Real-time similarity search on 100GB datasets in under 75ms

Abstract

Data series similarity search is a core operation for several data series analysis applications across many different domains. However, state-of-the-art techniques fail to deliver the time performance required for interactive exploration or analysis of large data series collections. In this work, we propose MESSI, the first data series index designed for in-memory operation on modern hardware. Our index takes advantage of the parallelization opportunities of modern hardware (i.e., SIMD instructions, multi-core and multi-socket architectures) in order to accelerate both index construction and similarity search processing times. Moreover, it benefits from a careful design in the setup and coordination of the parallel workers and data structures, so that it maximizes its performance for in-memory operations. Our experiments with synthetic and real datasets demonstrate that overall MESSI…
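
To make the operation in the abstract concrete, the sketch below shows the simplest form of data series similarity search: an exact nearest-neighbor scan under Euclidean distance, split across parallel workers. It is not MESSI's index and makes no claim about how MESSI is implemented. The function names (`squared_distance`, `nearest_in_chunk`, `parallel_nn_search`) and the `num_workers` parameter are hypothetical, chosen only for illustration. It also uses plain Python threads, which will not show the SIMD or multi-core gains the paper targets.

```python
from concurrent.futures import ThreadPoolExecutor
import math


def squared_distance(a, b):
    # Squared Euclidean distance between two equal-length series.
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_in_chunk(collection, start, stop, query):
    # Scan collection[start:stop] and return (best squared distance, index).
    best_dist, best_idx = math.inf, -1
    for i in range(start, stop):
        d = squared_distance(collection[i], query)
        if d < best_dist:
            best_dist, best_idx = d, i
    return best_dist, best_idx


def parallel_nn_search(collection, query, num_workers=4):
    # Hypothetical helper: brute-force 1-NN search, one chunk per worker.
    # Assumption: this is a baseline scan, not the index-based search
    # that the paper proposes.
    n = len(collection)
    if n == 0:
        raise ValueError("collection is empty")
    step = math.ceil(n / num_workers)
    ranges = [(s, min(s + step, n)) for s in range(0, n, step)]
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        futures = [
            pool.submit(nearest_in_chunk, collection, s, e, query)
            for s, e in ranges
        ]
        results = [f.result() for f in futures]
    # Merge the per-worker answers into the global nearest neighbor.
    best_dist, best_idx = min(results)
    return best_idx, math.sqrt(best_dist)


if __name__ == "__main__":
    series = [[0, 0, 0], [1, 1, 1], [5, 5, 5], [2, 2, 2], [9, 9, 9]]
    print(parallel_nn_search(series, [2, 2, 1.9], num_workers=2))
```

Each worker returns only its local best, so the final merge compares one candidate per worker. A scan like this touches every series for every query, and an index aims to avoid exactly that cost.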

Taxonomy

Topics: Time Series Analysis and Forecasting · Data Management and Algorithms · Music and Audio Processing