Modelling-based experiment retrieval: A case study with gene expression clustering
Paul Blomstedt, Ritabrata Dutta, Sohan Seth, Alvis Brazma, Samuel Kaski

TL;DR
This paper introduces a scalable, probabilistic model-based method for retrieving gene expression experiments. Retrieval uses a denoised, clustering-based model of the query dataset instead of the raw noisy data, which improves relevance over traditional profile-based approaches.
Contribution
It proposes a novel retrieval framework using denoised models and clustering, enhancing accuracy and scalability for gene expression experiment retrieval.
Findings
Denoising improves retrieval accuracy in noisy datasets.
Heuristic clustering approximates full probabilistic inference effectively.
The method is scalable and compatible with standard software tools.
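As a rough illustration of model-based retrieval, the sketch below ranks stored experiments by how well their fitted models explain a query, first using the raw query samples and then using a denoised summary of the query. It is a toy under stated assumptions, not the paper's method: the diagonal Gaussian model, the use of the query's per-gene mean as the "denoised" representation (a one-cluster stand-in for clustering), the function names, and the data are all hypothetical.

```python
import math


def fit_diag_gaussian(samples, min_var=1e-3):
    """Fit a per-gene mean and variance to a list of expression vectors.

    Assumption: a diagonal Gaussian stands in for the experiment model.
    """
    n = len(samples)
    dim = len(samples[0])
    means = [sum(s[j] for s in samples) / n for j in range(dim)]
    variances = [
        # Floor the variance so the log-density stays finite.
        max(sum((s[j] - means[j]) ** 2 for s in samples) / n, min_var)
        for j in range(dim)
    ]
    return means, variances


def log_likelihood(model, x):
    """Log-density of one expression vector under a diagonal Gaussian."""
    means, variances = model
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (xj - m) ** 2 / v)
        for xj, m, v in zip(x, means, variances)
    )


def rank_experiments(models, query_points):
    """Rank stored experiments by mean log-likelihood of the query points."""
    scores = {
        name: sum(log_likelihood(model, x) for x in query_points) / len(query_points)
        for name, model in models.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical repository: two experiments, two genes, three samples each.
repository = {
    "exp_A": [[0.1, 0.0], [-0.1, 0.2], [0.0, -0.2]],
    "exp_B": [[3.0, 3.1], [2.9, 3.0], [3.1, 2.9]],
}
models = {name: fit_diag_gaussian(data) for name, data in repository.items()}

# Hypothetical query dataset.
query = [[2.8, 3.2], [3.2, 2.8], [3.0, 3.0]]

# Retrieval with the raw query samples.
raw_ranking = rank_experiments(models, query)

# Retrieval with a denoised query: here simply the query's per-gene mean,
# an assumed one-cluster stand-in for a clustering-based model.
query_mean, _ = fit_diag_gaussian(query)
denoised_ranking = rank_experiments(models, [query_mean])
```

In this toy both rankings agree because the query is clean. The point of scoring a denoised representation is that, for very noisy, high-dimensional queries, the likelihood of the individual raw samples can itself become an unreliable criterion.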
Abstract
Motivation: Public and private repositories of experimental data are growing to sizes that require dedicated methods for finding relevant data. To improve on the state of the art of keyword searches from annotations, methods for content-based retrieval have been proposed. In the context of gene expression experiments, most methods retrieve gene expression profiles, requiring each experiment to be expressed as a single profile, typically of case vs. control. A more general, recently suggested alternative is to retrieve experiments whose models are good for modelling the query dataset. However, for very noisy and high-dimensional query data, this retrieval criterion turns out to be very noisy as well.
Results: We propose doing retrieval using a denoised model of the query dataset, instead of the original noisy dataset itself. To this end, we introduce a general probabilistic framework,…
