Similarity measure for sparse time course data based on Gaussian processes
Zijing Liu, Mauricio Barahona

TL;DR
This paper introduces a Gaussian process-based similarity measure for sparse, noisy time course data, improving clustering performance in biological datasets like gene transcriptomics.
Contribution
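The paper presents a novel GP-based similarity measure, formulated as a log-likelihood ratio akin to a Bayes factor, that enhances robustness to noise and is shown to be equivalent to the Euclidean distance when the noise variance in the GP is negligible compared to the noise variance of the signal.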
Findings
GP similarity outperforms traditional measures in noisy, sparse data
Improved clustering results on synthetic and real biological data
Equivalent to the Euclidean distance when the noise variance in the GP is negligible compared to the noise variance of the signal
Abstract
We propose a similarity measure for sparsely sampled time course data in the form of a log-likelihood ratio of Gaussian processes (GP). The proposed GP similarity is similar to a Bayes factor and provides enhanced robustness to noise in sparse time series, such as those found in various biological settings, e.g., gene transcriptomics. We show that the GP measure is equivalent to the Euclidean distance when the noise variance in the GP is negligible compared to the noise variance of the signal. Our numerical experiments on both synthetic and real data show improved performance of the GP similarity when used in conjunction with two distance-based clustering methods.
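One way to read the "log-likelihood ratio of Gaussian processes" is sketched below in Python. This is an illustration under stated assumptions, not the authors' implementation: it assumes both series are observed at the same time points, uses a squared-exponential kernel with fixed hyperparameters, and scores a pair by comparing the marginal likelihood of a shared latent function against that of two independent ones. The function names (`rbf_kernel`, `gp_log_marginal`, `gp_similarity`) and the default parameter values are hypothetical.

```python
import numpy as np

def rbf_kernel(t1, t2, signal_var=1.0, length_scale=1.0):
    # Squared-exponential covariance (an assumed kernel choice).
    d = t1[:, None] - t2[None, :]
    return signal_var * np.exp(-0.5 * (d / length_scale) ** 2)

def gp_log_marginal(t, y, noise_var=0.1, signal_var=1.0, length_scale=1.0):
    # Log marginal likelihood of y under a zero-mean GP with i.i.d. Gaussian noise.
    K = rbf_kernel(t, t, signal_var, length_scale) + noise_var * np.eye(len(t))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L, y)  # alpha @ alpha equals y^T K^{-1} y
    return (-0.5 * alpha @ alpha
            - np.log(np.diag(L)).sum()
            - 0.5 * len(t) * np.log(2.0 * np.pi))

def gp_similarity(t, y1, y2, **params):
    # Assumed form: log p(y1, y2 | one shared GP) - log p(y1) - log p(y2).
    # Larger values mean the two series are better explained by a common function.
    t_joint = np.concatenate([t, t])
    y_joint = np.concatenate([y1, y2])
    joint = gp_log_marginal(t_joint, y_joint, **params)
    return joint - gp_log_marginal(t, y1, **params) - gp_log_marginal(t, y2, **params)

# Sparse example: six time points, one near-copy and one sign-flipped series.
t = np.linspace(0.0, 5.0, 6)
y_ref = np.sin(t)
y_close = y_ref + 0.05 * np.cos(t)
y_far = -y_ref

sim_close = gp_similarity(t, y_ref, y_close)
sim_far = gp_similarity(t, y_ref, y_far)
```

In this sketch the difference `y1 - y2` enters the joint likelihood only through the squared Euclidean distance scaled by `1 / (4 * noise_var)`, which is in the spirit of the Euclidean link stated in the abstract, although the paper's own derivation and conditions may differ. To use the score with a distance-based clustering method, it would first need to be converted into a dissimilarity, for example by negating and shifting it.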
Taxonomy
Topics: Time Series Analysis and Forecasting · Gaussian Processes and Bayesian Inference · Anomaly Detection Techniques and Applications
