Probabilistic $K$-mean with local alignment for clustering and motif discovery in functional data
Marzia A. Cremona, Francesca Chiaromonte

TL;DR
This paper introduces a probabilistic $K$-mean clustering method with local alignment for functional data, enabling the discovery of recurring local motifs and shapes within curves, with applications across biological and epidemiological datasets.
Contribution
It presents a novel clustering and motif discovery approach that combines functional data analysis, bioinformatics, and fuzzy clustering, allowing for flexible, local shape-based analysis of curves.
Findings
Effective in identifying local motifs in simulated data
Generalizes existing functional data clustering methods
Successfully applied to real biological and epidemiological data
Abstract
We develop a new method to locally cluster curves and discover functional motifs, i.e. typical "shapes" that may recur several times along and across the curves, capturing important local characteristics. In order to identify these shared curve portions, our method leverages ideas from functional data analysis (joint clustering and alignment of curves), bioinformatics (local alignment through the extension of high similarity seeds) and fuzzy clustering (curves belonging to more than one cluster, if they contain more than one typical "shape"). It can employ various dissimilarity measures and incorporate derivatives in the discovery process, thus exploiting complex facets of shapes. We demonstrate the performance of our method with an extensive simulation study, and show how it generalizes other clustering methods for functional data. Finally, we provide real data applications to…
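As a rough illustration of two ingredients named in the abstract, the sketch below locally aligns a short candidate motif to longer curves and then turns the resulting dissimilarities into fuzzy memberships. It is a toy under stated assumptions, not the authors' implementation: the function names `best_local_alignment` and `fuzzy_memberships` are hypothetical, the dissimilarity is a plain mean squared difference on sampled values, and the membership formula is the standard fuzzy c-means update, which may differ from the one used in the paper. The motifs are held fixed here, whereas a full method would iterate between alignment, membership and motif updates.

```python
import numpy as np


def best_local_alignment(curve, motif):
    """Return (shift, dissimilarity) of the window of `curve` closest to `motif`."""
    c = len(motif)
    # All contiguous windows of length c: shape (len(curve) - c + 1, c).
    windows = np.lib.stride_tricks.sliding_window_view(curve, c)
    # Assumed dissimilarity: mean squared difference between window and motif.
    d = np.mean((windows - motif) ** 2, axis=1)
    s = int(np.argmin(d))
    return s, float(d[s])


def fuzzy_memberships(D, m=2.0, eps=1e-12):
    """Standard fuzzy c-means memberships from a (K, N) dissimilarity matrix.

    Each column (one curve) sums to 1 across the K motifs.
    """
    w = (D + eps) ** (-1.0 / (m - 1.0))
    return w / w.sum(axis=0, keepdims=True)


# Toy data: two flat curves, each hiding one of two "shapes".
t = np.linspace(0.0, 1.0, 20)
bump = np.sin(np.pi * t)   # candidate motif 0
ramp = t.copy()            # candidate motif 1

curve_a = np.zeros(100)
curve_a[30:50] = bump      # bump embedded at position 30
curve_b = np.zeros(100)
curve_b[60:80] = ramp      # ramp embedded at position 60

motifs = [bump, ramp]
curves = [curve_a, curve_b]

shifts = np.zeros((len(motifs), len(curves)), dtype=int)
D = np.zeros((len(motifs), len(curves)))
for k, motif in enumerate(motifs):
    for i, curve in enumerate(curves):
        shifts[k, i], D[k, i] = best_local_alignment(curve, motif)

P = fuzzy_memberships(D)   # P[k, i]: membership of curve i in motif k
```

In this toy setting each curve is assigned almost entirely to the motif it contains, and the recovered shift is the position where that motif was embedded. A curve containing both shapes would receive comparable memberships in both clusters, which is the fuzzy-clustering behaviour the abstract describes.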
Taxonomy
Topics: Bioinformatics and Genomic Networks · Gene expression and cancer classification · Advanced Clustering Algorithms Research
