Fast and Exact Similarity Search in less than a Blink of an Eye
Patrick Schäfer, Jakob Brand, Ulf Leser, Botao Peng, Themis Palpanas

TL;DR
This paper introduces SOFA, a novel index for fast, exact similarity search in data series, leveraging Fourier-based summarization and a tree structure to outperform existing methods, especially on high-frequency signals.
Contribution
The paper presents SOFA, combining Symbolic Fourier Approximation with a tree index for efficient, exact similarity queries, significantly improving performance on high-frequency data.
Findings
SOFA is up to 10 times faster than sequential scan.
It outperforms FAISS and MESSI in speed.
Achieves 38-fold improvement on high-frequency datasets.
Abstract
Similarity search is a fundamental operation for analyzing data series (DS), which are ordered sequences of real values. To enhance efficiency, summarization techniques are employed that reduce the dimensionality of DS. SAX-based approaches are the state-of-the-art for exact similarity queries, but their performance degrades for high-frequency signals, such as noisy data, or for high-frequency DS. In this work, we present the SymbOlic Fourier Approximation index (SOFA), which implements fast, exact similarity queries. SOFA is based on two building blocks: a tree index (inspired by MESSI) and the SFA symbolic summarization. It makes use of a learned summarization method called Symbolic Fourier Approximation (SFA), which is based on the Fourier transform and utilizes a data-adaptive quantization of the frequency domain. To better capture relevant information in high-frequency signals, SFA…
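The abstract describes SFA as a Fourier transform followed by a data-adaptive quantization of the frequency domain. The following is a minimal sketch of that general idea, not the authors' implementation. The function names (`dft_features`, `learn_bins`, `sfa_word`), the choice of the first few non-constant coefficients, and the equal-count (equi-depth) binning are all assumptions made for illustration; the abstract above does not specify how SOFA selects coefficients or bins.

```python
import cmath
import math
from bisect import bisect_right


def dft_features(series, n_coeffs):
    """Real and imaginary parts of Fourier coefficients 1..n_coeffs.

    Assumption: the constant (k = 0) term is skipped and the lowest
    frequencies are kept; the paper may select coefficients differently.
    """
    n = len(series)
    feats = []
    for k in range(1, n_coeffs + 1):
        c = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                for t, x in enumerate(series)) / n
        feats.extend([c.real, c.imag])
    return feats


def learn_bins(train_features, alphabet_size):
    """Learn per-dimension breakpoints from training data.

    Assumption: equal-count buckets, so each symbol is roughly equally
    likely on the training set. This is what makes the quantization
    data-adaptive in this sketch.
    """
    n_dims = len(train_features[0])
    bins = []
    for d in range(n_dims):
        column = sorted(f[d] for f in train_features)
        bins.append([column[len(column) * i // alphabet_size]
                     for i in range(1, alphabet_size)])
    return bins


def sfa_word(series, bins, n_coeffs):
    """Map a series to one symbol (an integer) per feature dimension."""
    feats = dft_features(series, n_coeffs)
    return [bisect_right(b, v) for b, v in zip(bins, feats)]


# Hypothetical toy data: sine waves of three frequencies and four phases.
train = [[math.sin(2 * math.pi * f * t / 32 + p) for t in range(32)]
         for f in (1, 2, 3) for p in (0.0, 0.5, 1.0, 1.5)]
bins = learn_bins([dft_features(s, 2) for s in train], alphabet_size=4)
word = sfa_word(train[0], bins, n_coeffs=2)
```

Here each series is reduced to four integer symbols (two coefficients, real and imaginary parts each), which is the kind of compact summary a tree index can be built on. Exact search additionally needs a lower-bounding distance between a query and a symbolic word, which this sketch does not cover.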
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Image Retrieval and Classification Techniques
