TL;DR
This paper introduces two new data-driven symbolic representations for time series that improve upon traditional SAX by reducing information loss and enhancing anomaly detection, verified through theoretical analysis and experiments.
Contribution
It proposes two novel data-driven SAX-based symbolic representations: one combines kernel density estimation with Lloyd-Max quantization, and the other uses Mean-Shift clustering. Both drop the Gaussian-distribution and equiprobable-symbol assumptions that typical SAX methods rely on.
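To make the contrast with classic SAX concrete, here is a minimal Python sketch; it is not the paper's implementation. It compares the equiprobable breakpoints implied by a standard normal distribution with breakpoints obtained by running Lloyd-Max iterations on a Gaussian kernel density estimate. The function names, the bandwidth, the evaluation grid, and the bimodal toy data are all assumptions made for illustration.

```python
import bisect
import math
import random
from statistics import NormalDist


def sax_breakpoints(alphabet_size):
    """Classic SAX: equiprobable breakpoints under a standard normal."""
    nd = NormalDist()
    return [nd.inv_cdf(i / alphabet_size) for i in range(1, alphabet_size)]


def gaussian_kde_on_grid(samples, grid, bandwidth):
    """Evaluate a Gaussian kernel density estimate at each grid point."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))
    return [
        norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)
        for x in grid
    ]


def lloyd_max(grid, density, levels, iters=200, tol=1e-9):
    """Lloyd-Max on a discretised density; returns (breakpoints, centroids)."""
    lo, hi = grid[0], grid[-1]
    # Start from centroids spaced uniformly over the grid range.
    centroids = [lo + (hi - lo) * (k + 0.5) / levels for k in range(levels)]
    for _ in range(iters):
        # Nearest-neighbour condition: breakpoints are centroid midpoints.
        breakpoints = [(a + b) / 2.0 for a, b in zip(centroids, centroids[1:])]
        num = [0.0] * levels
        den = [0.0] * levels
        for x, p in zip(grid, density):
            k = bisect.bisect_right(breakpoints, x)
            num[k] += x * p
            den[k] += p
        # Centroid condition: each level is the density-weighted cell mean.
        updated = [
            num[k] / den[k] if den[k] > 0.0 else centroids[k]
            for k in range(levels)
        ]
        shift = max(abs(u - c) for u, c in zip(updated, centroids))
        centroids = updated
        if shift < tol:
            break
    breakpoints = [(a + b) / 2.0 for a, b in zip(centroids, centroids[1:])]
    return breakpoints, centroids


def symbolize(values, breakpoints, alphabet="abcdefghijklmnopqrstuvwxyz"):
    """Map each value to the letter of the interval it falls in."""
    return "".join(alphabet[bisect.bisect_right(breakpoints, v)] for v in values)


def demo(seed=0):
    """Toy bimodal data (an assumption) where the Gaussian model fits poorly."""
    rng = random.Random(seed)
    samples = [rng.gauss(-2.0, 0.3) for _ in range(150)]
    samples += [rng.gauss(2.0, 0.3) for _ in range(150)]
    lo, hi = min(samples) - 1.0, max(samples) + 1.0
    grid = [lo + (hi - lo) * i / 299 for i in range(300)]
    density = gaussian_kde_on_grid(samples, grid, bandwidth=0.2)
    return sax_breakpoints(4), lloyd_max(grid, density, levels=4)
```

Calling `demo()` returns both sets of breakpoints for a four-letter alphabet. On data like this, the Gaussian breakpoints cluster around zero where few values lie, while the data-driven breakpoints move toward the two modes.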
Findings
Outperforms traditional SAX on real-world datasets
Enhances anomaly detection capabilities
Reduces information loss compared to existing methods
Abstract
Due to the importance of lower-bounding distances and the attractiveness of symbolic representations, the family of symbolic aggregate approximations (SAX) has been used extensively for encoding time series data. However, typical SAX-based methods rely on two restrictive assumptions: the Gaussian distribution and equiprobable symbols. This paper proposes two novel data-driven SAX-based symbolic representations, distinguished by their discretization steps. The first representation, aimed at general data compaction and indexing scenarios, combines kernel density estimation with Lloyd-Max quantization to minimize the information loss and mean squared error in the discretization step. The second, aimed at high-level mining tasks, employs the Mean-Shift clustering method and is shown to enhance anomaly detection in the lower-dimensional space. Besides,…
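The Mean-Shift discretization described in the abstract can be illustrated with a small one-dimensional sketch in Python. This is a generic Gaussian-kernel Mean-Shift, not the authors' code: the function name `mean_shift_1d`, the bandwidth, the merge tolerance of half a bandwidth, and the toy values (standing in for segment means of a reduced series) are all assumptions. The point is that an isolated value converges to its own mode and so receives its own symbol, which is one plausible reading of why such a discretization helps anomaly detection.

```python
import math


def mean_shift_1d(values, bandwidth, iters=200, tol=1e-6):
    """Gaussian-kernel Mean-Shift in 1-D; returns (centers, labels)."""
    modes = []
    for start in values:
        m = start
        for _ in range(iters):
            # Weight every value by its kernel distance to the current estimate.
            weights = [math.exp(-0.5 * ((m - v) / bandwidth) ** 2) for v in values]
            shifted = sum(w * v for w, v in zip(weights, values)) / sum(weights)
            converged = abs(shifted - m) < tol
            m = shifted
            if converged:
                break
        modes.append(m)

    # Merge modes that landed within half a bandwidth of each other
    # (the merge tolerance is an assumption of this sketch).
    centers, labels = [], []
    for m in modes:
        for idx, c in enumerate(centers):
            if abs(m - c) < bandwidth / 2.0:
                labels.append(idx)
                break
        else:
            centers.append(m)
            labels.append(len(centers) - 1)
    return centers, labels


def labels_to_symbols(labels, alphabet="abcdefghijklmnopqrstuvwxyz"):
    """Turn cluster labels into a symbolic word, one letter per value."""
    return "".join(alphabet[i] for i in labels)


# Toy values (an assumption): two dense groups plus one isolated outlier.
toy_values = [0.0, 0.1, -0.1, 0.05, 5.0, 5.1, 4.9, 20.0]
toy_centers, toy_labels = mean_shift_1d(toy_values, bandwidth=0.5)
toy_word = labels_to_symbols(toy_labels)
```

Unlike a fixed alphabet size, the number of symbols here is determined by the number of modes found, so it depends on the chosen bandwidth.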
