Hercules Against Data Series Similarity Search
Karima Echihabi, Panagiota Fatourou, Kostas Zoumpatianos, Themis Palpanas, Houda Benbrahim

TL;DR
Hercules is a novel parallel tree-based indexing method for exact similarity search in large disk-based data series collections. It exploits the multi-threading and SIMD capabilities of modern hardware to speed up CPU-intensive calculations.
Contribution
The paper introduces Hercules, with new index construction and query answering algorithms that outperform existing methods in speed and robustness for data series similarity search.
Findings
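Hercules is up to one order of magnitude faster than the best competitor, which is not always the same method.
It is the only index that outperforms the optimized scan in all tested scenarios.
It remains robust across many synthetic and real datasets and across query workloads of varying difficulty.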
Abstract
We propose Hercules, a parallel tree-based technique for exact similarity search on massive disk-based data series collections. We present novel index construction and query answering algorithms that leverage different summarization techniques, carefully schedule costly operations, optimize memory and disk accesses, and exploit the multi-threading and SIMD capabilities of modern hardware to perform CPU-intensive calculations. We demonstrate the superiority and robustness of Hercules with an extensive experimental evaluation against state-of-the-art techniques, using many synthetic and real datasets, and query workloads of varying difficulty. The results show that Hercules performs up to one order of magnitude faster than the best competitor (which is not always the same). Moreover, Hercules is the only index that outperforms the optimized scan on all scenarios, including the hard query…
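To make the idea of exact search over summarized series concrete, here is a minimal sketch. It is not the Hercules algorithm: it swaps in Piecewise Aggregate Approximation (PAA) as a stand-in summarization, runs single-threaded in memory, and uses hypothetical names (`paa`, `paa_lower_bound`, `exact_nn`). It only illustrates the general principle that a summary-based lower bound on the Euclidean distance lets a search skip full distance computations while still returning the exact nearest neighbor.

```python
import math


def paa(series, segments):
    """Summarize a series by the mean of each equal-length segment (PAA)."""
    n = len(series)
    if n % segments != 0:
        raise ValueError("series length must be divisible by segments")
    size = n // segments
    return [sum(series[i * size:(i + 1) * size]) / size for i in range(segments)]


def euclidean(a, b):
    """Full Euclidean distance between two equal-length series."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def paa_lower_bound(paa_q, paa_c, length):
    """Lower bound on the Euclidean distance, computed from PAA summaries only."""
    seg_len = length / len(paa_q)
    return math.sqrt(seg_len * sum((x - y) ** 2 for x, y in zip(paa_q, paa_c)))


def exact_nn(query, collection, segments=4):
    """Exact 1-NN search that prunes candidates using the PAA lower bound.

    Returns (best_index, best_distance, number_of_full_distance_computations).
    Assumption: all series have the same length as the query.
    """
    q_sum = paa(query, segments)
    bounds = [
        (paa_lower_bound(q_sum, paa(s, segments), len(query)), i)
        for i, s in enumerate(collection)
    ]
    bounds.sort()  # visit the most promising candidates first

    best_idx, best_dist, full = -1, math.inf, 0
    for lb, i in bounds:
        if lb >= best_dist:
            # Every remaining candidate has a lower bound at least this large,
            # so none of them can beat the best distance found so far.
            break
        full += 1
        d = euclidean(query, collection[i])
        if d < best_dist:
            best_idx, best_dist = i, d
    return best_idx, best_dist, full


query = [0, 0, 1, 1, 2, 2, 3, 3]
collection = [
    [5, 5, 5, 5, 5, 5, 5, 5],
    [0, 0, 1, 1, 2, 2, 3, 4],
    [-3, -3, -2, -2, -1, -1, 0, 0],
    [0, 1, 0, 1, 2, 3, 2, 3],
]
idx, dist, full = exact_nn(query, collection)
print(idx, dist, full)
```

In this toy collection the search returns the same answer as a brute-force scan while computing only one full distance out of four. A disk-based index would additionally organize the summaries in a tree and schedule I/O, which this sketch does not attempt.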
