On Clustering Time Series Using Euclidean Distance and Pearson Correlation
Michael R. Berthold, Frank Höppner

TL;DR
This paper shows that the z-score normalized, squared Euclidean distance between two time series equals a distance based on Pearson correlation. The result affects distance-based methods such as k-Means, and the paper supports it with both a theoretical argument and experiments.
Contribution
It establishes that the z-score normalized, squared Euclidean distance equals a distance based on Pearson correlation, and shows that k-Means formally needs a modification for the correlation interpretation to remain strictly valid.
Findings
Z-score normalized, squared Euclidean distance equals a distance based on Pearson correlation.
In many cases, standard k-Means produces the same results as the formally modified variant.
Theoretical and experimental validation of the equivalence.
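The equivalence can be checked numerically with a minimal sketch. It assumes z-scores computed with the population standard deviation (ddof=0) and takes 1 - r as the correlation-based distance; under those assumptions the squared Euclidean distance between the normalized series equals 2n(1 - r), where n is the series length. The scaling constant and the exact distance definition used in the paper may differ, and the synthetic data and the names below are illustrative only.

```python
import numpy as np

def zscore(x):
    """Z-score normalize a 1-D series using the population std (ddof=0)."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std()

# Two synthetic, partially correlated series (illustrative data only).
rng = np.random.default_rng(0)
n = 50
x = rng.normal(size=n)
y = 0.6 * x + rng.normal(size=n)

# Squared Euclidean distance between the z-normalized series.
sq_euclid = np.sum((zscore(x) - zscore(y)) ** 2)

# Pearson correlation and the derived distance 1 - r (assumed definition).
r = np.corrcoef(x, y)[0, 1]
pearson_dist = 1.0 - r

# Under the assumptions above, the two printed values agree up to rounding.
print(sq_euclid, 2 * n * pearson_dist)
```

Because the two quantities differ only by the constant factor 2n, any method that depends solely on the ordering of distances (nearest-neighbor search, for example) behaves identically under either one.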
Abstract
For time series comparisons, it has often been observed that z-score normalized Euclidean distances far outperform the unnormalized variant. In this paper we show that a z-score normalized, squared Euclidean Distance is, in fact, equal to a distance based on Pearson Correlation. This has profound impact on many distance-based classification or clustering methods. In addition to this theoretically sound result we also show that the often used k-Means algorithm formally needs a modification to keep the interpretation as Pearson correlation strictly valid. Experimental results demonstrate that in many cases the standard k-Means algorithm generally produces the same results.
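The abstract does not say what the k-Means modification is. One plausible reading, stated here as an assumption rather than as the authors' method, is that the plain mean of several z-normalized series is still zero-mean but no longer has unit variance, so a cluster prototype would need to be re-normalized before its distance to a series can be read as a correlation. The sketch below only illustrates that property of the mean; `zscore_rows` and the random data are hypothetical.

```python
import numpy as np

def zscore_rows(a):
    """Z-score normalize each row of a 2-D array (population std, ddof=0)."""
    a = np.asarray(a, dtype=float)
    return (a - a.mean(axis=1, keepdims=True)) / a.std(axis=1, keepdims=True)

# Five synthetic series of length 40, each z-normalized (illustrative data).
rng = np.random.default_rng(1)
cluster = zscore_rows(rng.normal(size=(5, 40)))

# Standard k-Means prototype: the plain mean of the cluster members.
plain_mean = cluster.mean(axis=0)

# Assumed modification: re-normalize the prototype to unit variance.
renormalized = zscore_rows(plain_mean[None, :])[0]

# The plain mean stays zero-mean, but its std falls below 1 unless all
# members are identical; the re-normalized prototype has std 1 again.
print(plain_mean.mean(), plain_mean.std(), renormalized.std())
```

When the members of a cluster are highly correlated, the plain mean is already close to unit variance, which is consistent with the abstract's observation that standard k-Means often gives the same results.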
Taxonomy
Topics: Time Series Analysis and Forecasting · Neural Networks and Applications · Anomaly Detection Techniques and Applications
