ProS: Data Series Progressive k-NN Similarity Search and Classification with Probabilistic Quality Guarantees
Karima Echihabi, Theophanis Tsandilas, Anna Gogolou, Anastasia Bezerianos, Themis Palpanas

TL;DR
ProS introduces a probabilistic, learning-based approach for progressive k-NN similarity search and classification on data series, providing quality guarantees and improving efficiency and accuracy over existing methods.
Contribution
It presents the first practical probabilistic method for progressive NN search and classification on large-scale data series with quality guarantees.
Findings
Significantly outperforms competing approaches in experiments.
Provides initial and progressive estimates that improve over time.
Supports Euclidean and DTW distance measures with effective stopping criteria.
Abstract
Existing systems dealing with the increasing volume of data series cannot guarantee interactive response times, even for fundamental tasks such as similarity search. Therefore, it is necessary to develop analytic approaches that support exploration and decision making by providing progressive results, before the final and exact ones have been computed. Prior works lack both efficiency and accuracy when applied to large-scale data series collections. We present and experimentally evaluate ProS, a new probabilistic learning-based method that provides quality guarantees for progressive Nearest Neighbor (NN) query answering. We develop our method for k-NN queries and demonstrate how it can be applied with the two most popular distance measures, namely, Euclidean and Dynamic Time Warping (DTW). We provide both initial and progressive estimates of the final answer that are getting better…
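To make the idea of progressive k-NN answering concrete, the following is a minimal sketch of a brute-force scan that reports its best-so-far k nearest neighbors after every batch of candidates. It is not the ProS method: it omits the probabilistic quality estimates and the learned models the abstract describes, and it uses Euclidean distance only. The names `progressive_knn`, `euclidean`, and `batch_size` are hypothetical and chosen for illustration.

```python
import heapq
import math
from typing import Iterator, List, Sequence, Tuple


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length series."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def progressive_knn(
    query: Sequence[float],
    collection: List[Sequence[float]],
    k: int,
    batch_size: int,
) -> Iterator[List[Tuple[float, int]]]:
    """Yield the best-so-far k-NN answer after each batch of candidates.

    Each yielded answer is a list of (distance, index) pairs sorted by
    increasing distance. The last answer yielded is the exact k-NN result.
    """
    # Max-heap of the k closest series seen so far, stored as
    # (-distance, index) so the worst current neighbor sits at heap[0].
    heap: List[Tuple[float, int]] = []
    for idx, series in enumerate(collection):
        d = euclidean(query, series)
        if len(heap) < k:
            heapq.heappush(heap, (-d, idx))
        elif d < -heap[0][0]:
            # Strictly closer than the current k-th neighbor: replace it.
            heapq.heapreplace(heap, (-d, idx))
        # Emit a progressive answer at batch boundaries and at the end.
        if (idx + 1) % batch_size == 0 or idx + 1 == len(collection):
            yield sorted((-neg_d, i) for neg_d, i in heap)
```

A caller would iterate over the generator and could stop early once an intermediate answer is judged good enough. Deciding when that is the case, with a quantified probability of being correct, is the part this sketch leaves out and that ProS addresses.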
Taxonomy
Topics: Time Series Analysis and Forecasting · Anomaly Detection Techniques and Applications · Data Management and Algorithms
