Determining the Optimal Number of Clusters for Time Series Datasets with Symbolic Pattern Forest

Md Nishat Raihan

arXiv:2310.00820·cs.LG·October 3, 2023

TL;DR

This paper extends the Symbolic Pattern Forest algorithm to automatically determine the optimal number of clusters in time series datasets using the Silhouette Coefficient, improving clustering quality without ground truth labels.

Contribution

It introduces a method to select the optimal number of clusters for SPF in time series data by leveraging Silhouette scores on SAX-based feature vectors.
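To make "SAX-based" concrete, here is a minimal sketch of the textbook SAX (Symbolic Aggregate approXimation) transform: z-normalize a series, average it over equal-width segments (PAA), and map each segment mean to a letter using equiprobable Gaussian breakpoints. The function name `sax_word`, its parameters, and the assumption that the series length is divisible by the number of segments are illustrative choices, not taken from the paper, and this is not necessarily how SPF builds its feature vectors.

```python
from bisect import bisect_right
from statistics import NormalDist, fmean, pstdev


def sax_word(series, n_segments, alphabet_size):
    """Return the SAX word of `series` (hypothetical helper, textbook SAX).

    Assumes len(series) is divisible by n_segments and alphabet_size <= 26.
    """
    # 1. z-normalize so that Gaussian breakpoints are meaningful.
    mu, sigma = fmean(series), pstdev(series)
    z = [0.0 if sigma == 0 else (x - mu) / sigma for x in series]

    # 2. Piecewise Aggregate Approximation: mean of each equal-width segment.
    seg_len = len(z) // n_segments
    paa = [fmean(z[i * seg_len:(i + 1) * seg_len]) for i in range(n_segments)]

    # 3. Breakpoints that split the standard normal into equiprobable regions.
    breakpoints = [NormalDist().inv_cdf(i / alphabet_size)
                   for i in range(1, alphabet_size)]

    # 4. Map each segment mean to the letter of the region it falls in.
    letters = "abcdefghijklmnopqrstuvwxyz"
    return "".join(letters[bisect_right(breakpoints, v)] for v in paa)


rising = sax_word([1, 2, 3, 4, 5, 6, 7, 8], n_segments=4, alphabet_size=4)
falling = sax_word([8, 7, 6, 5, 4, 3, 2, 1], n_segments=4, alphabet_size=4)
print(rising, falling)  # abcd dcba
```

A steadily rising series maps to an ascending word and its mirror image to a descending one, which is the kind of shape information a symbolic representation is meant to preserve.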

Findings

1. Significant improvement over baseline clustering methods
2. Effective automatic determination of cluster number
3. Validated on UCR archive datasets

Abstract

Clustering algorithms are among the most widely used data mining methods, both for their exploratory power and as an initial preprocessing step that paves the way for other techniques. However, determining the optimal number of clusters (say k) is one of the significant challenges for such methods. The most widely used clustering algorithms in time series data mining, such as k-means and k-shape, also require the true number of clusters to be supplied in advance. In this work, we extended the Symbolic Pattern Forest (SPF) algorithm, another time series clustering algorithm, to determine the optimal number of clusters for time series datasets. We used SPF to generate the clusters from the datasets and chose the optimal number of clusters based on the Silhouette Coefficient, a metric that measures the quality of a clustering. Silhouette was calculated on…
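The selection procedure described in the abstract (cluster for several candidate values of k, score each result with the Silhouette Coefficient, keep the best) can be sketched as follows. This is an illustration under stated assumptions, not the paper's implementation: `toy_kmeans` is a deliberately simple stand-in for SPF, the 2-D points stand in for whatever feature vectors the scores are computed on, and all function names here are hypothetical.

```python
from math import dist
from statistics import fmean


def silhouette_score(points, labels):
    """Mean silhouette over all points, using Euclidean distance."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:  # convention: singleton clusters score 0
            scores.append(0.0)
            continue
        # a: mean distance to the other members of p's own cluster
        a = sum(dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(fmean([dist(p, q) for q in other])
                for key, other in clusters.items() if key != lab)
        scores.append(0.0 if max(a, b) == 0 else (b - a) / max(a, b))
    return fmean(scores)


def toy_kmeans(points, k, iters=20):
    """Stand-in clusterer (NOT SPF): Lloyd's k-means with evenly spaced seeds."""
    centers = [points[i * len(points) // k] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: dist(p, centers[c]))
                  for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster went empty
                centers[c] = tuple(fmean(axis) for axis in zip(*members))
    return labels


def select_k(points, cluster_fn, k_values):
    """Return the k with the highest silhouette, plus all scores."""
    scores = {k: silhouette_score(points, cluster_fn(points, k))
              for k in k_values}
    return max(scores, key=scores.get), scores


# Toy data: three well-separated groups of three points each.
points = [(0, 0), (0, 1), (1, 0),
          (10, 10), (10, 11), (11, 10),
          (20, 0), (20, 1), (21, 0)]
best_k, scores = select_k(points, toy_kmeans, range(2, 6))
print(best_k)  # 3
```

Because `select_k` takes the clustering routine as an argument, any algorithm that returns one label per item for a given k could be substituted for the toy k-means without changing the selection logic.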

Taxonomy

Topics: Time Series Analysis and Forecasting · Complex Network Analysis Techniques · Advanced Clustering Algorithms Research