State-Space Dynamics Distance for Clustering Sequential Data

Darío García-García; Emilio Parrado-Hernández; Fernando Díaz-de-María

arXiv:1004.1982 · cs.LG · April 13, 2010

TL;DR

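This paper introduces a similarity measure for clustering sequential data. A single shared state-space is built from all the sequences, and sequences are then compared through the transition matrices they induce in that space, which mitigates some of the overfitting and scalability problems of methods that train one model per sequence.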

Contribution

It presents a novel state-space based distance measure that reduces overfitting and enhances scalability in sequence clustering tasks.

Findings

01 Effective on synthetic datasets

02 Shows advantages over existing methods on real-world data

03 Reduces overfitting and improves scalability

Abstract

This paper proposes a novel similarity measure for clustering sequential data. We first construct a common state-space by training a single probabilistic model on all the sequences, in order to obtain a unified representation of the dataset. Distances are then computed from the transition matrices that each sequence induces in that state-space. This approach solves some of the usual overfitting and scalability issues of existing semi-parametric techniques, which rely on training a separate model for each sequence. Empirical studies on both synthetic and real-world datasets illustrate the advantages of the proposed similarity measure for clustering sequences.
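The two-step idea in the abstract (one shared state-space, then per-sequence transition matrices) can be illustrated with a minimal sketch. Several things here are assumptions made for illustration and are not taken from the abstract: the shared state-space is a fixed list of hypothetical 1-D state centers with hard nearest-center assignment (a stand-in for the single trained probabilistic model), the transition counts use additive smoothing, and the distance is a mean row-wise Bhattacharyya distance, chosen as one plausible way to compare transition matrices. All function names are hypothetical.

```python
import math


def assign_states(sequence, centers):
    """Map each 1-D observation to the index of its nearest shared state center.

    Assumption: hard assignment to fixed centers stands in for the shared
    probabilistic model described in the abstract.
    """
    return [min(range(len(centers)), key=lambda k: abs(x - centers[k]))
            for x in sequence]


def transition_matrix(states, n_states, smoothing=1e-3):
    """Row-normalized transition counts; smoothing keeps every entry positive."""
    counts = [[smoothing] * n_states for _ in range(n_states)]
    for a, b in zip(states, states[1:]):
        counts[a][b] += 1.0
    return [[c / sum(row) for c in row] for row in counts]


def bhattacharyya(p, q):
    """Bhattacharyya distance between two discrete distributions."""
    bc = sum(math.sqrt(pi * qi) for pi, qi in zip(p, q))
    return max(0.0, -math.log(bc))  # clamp tiny negative values from rounding


def sequence_distance(mat_a, mat_b):
    """Assumed distance: mean Bhattacharyya distance between matching rows."""
    rows = zip(mat_a, mat_b)
    return sum(bhattacharyya(ra, rb) for ra, rb in rows) / len(mat_a)


# Toy data: two sequences that alternate between states, one that stays put.
centers = [0.0, 1.0]  # hypothetical shared state-space
alternating_1 = [0.1, 0.9, 0.0, 1.1, 0.05, 0.95]
alternating_2 = [0.0, 1.0, 0.1, 0.9, 0.0, 1.05]
sticky = [0.0, 0.1, 0.05, 0.0, 1.0, 0.9, 1.1, 1.0]

mat_alt_1, mat_alt_2, mat_sticky = (
    transition_matrix(assign_states(s, centers), len(centers))
    for s in (alternating_1, alternating_2, sticky)
)

d_similar = sequence_distance(mat_alt_1, mat_alt_2)
d_different = sequence_distance(mat_alt_1, mat_sticky)
print(d_similar, d_different)
```

In this toy setup the two alternating sequences induce the same transition matrix, so their distance is essentially zero, while the sticky sequence lies far from both. A pairwise matrix of such distances could then be passed to any distance-based clustering routine. Note that every sequence only requires counting transitions in the shared space, rather than fitting its own model.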

Taxonomy

Topics: Time Series Analysis and Forecasting · Gaussian Processes and Bayesian Inference · Anomaly Detection Techniques and Applications